
### Packages for the Macintosh

#### Question:

I would like information on the statistical packages available for the Macintosh operating system.

Although there are many statistical packages available for the Macintosh operating system, we supply and support only SPSS. Software Distribution Services provides a listing of all software currently distributed, as well as specific information concerning these statistical packages, at http://www.utexas.edu/its/products.

The following statistical software packages are available for the Macintosh operating system as of 2012 and may be bought directly from the vendor:

EQS 6 for Mac - http://www.mvsoft.com/products.htm
JMP 9.0 for Mac - http://www.jmp.com
R - http://www.r-project.org/
SAS 6.12 - http://www.sas.com/contact/ OR 1-800-727-0025
SPSS - http://www.spss.com/
Stata - http://www.stata.com/

The following packages are not available for the Macintosh as of 2012:

HLM - http://www.ssicentral.com/
LISREL - http://www.ssicentral.com/lisrel/
Minitab - http://www.minitab.com/
MPlus - http://www.statmodel.com/
SAS versions later than 6.12 - http://www.sas.com/
SUDAAN - http://www.rti.org/sudaan/
S-Plus - http://www.insightful.com/products/splus/

### Relationship between F and R-square

#### Question:

How can I express R-square in terms of F?

R^2 = df1*F / (df1*F + df2) where F is distributed as F(df1,df2).

To see this, let SST be the total (corrected) sum of squares, let SSR be the sum of squares from the regression model (which must contain df1 predictors in addition to the mean), and let the error sum of squares be SSE = SST - SSR. Then R^2 = SSR / SST and F = (SSR/df1) / (SSE/df2), and the stated relationship can be obtained with a little algebra.

Similarly, F = (df2/df1) * R^2 / (1-R^2).
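The two formulas above are inverses of one another, which a quick numeric check makes concrete. The sketch below (in Python, with made-up values of df1, df2, and R^2 chosen only for illustration) converts R^2 to F and back:

```python
# Hypothetical example: a regression with df1 = 3 predictors and
# df2 = 96 error degrees of freedom, and R^2 = 0.25.
df1, df2 = 3, 96
R2 = 0.25

# F from R-square: F = (df2/df1) * R^2 / (1 - R^2)
F = (df2 / df1) * R2 / (1 - R2)

# R-square recovered from F: R^2 = df1*F / (df1*F + df2)
R2_back = df1 * F / (df1 * F + df2)

print(F, R2_back)
```

Round-tripping through either formula returns the starting value, confirming the algebraic relationship.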

### Data codebooks

#### Question:

What is a data codebook and why would I want to use one?

A codebook is a key that defines how data will be entered into a computer file. It is also useful later, when you want to analyze the data and need to tell a software package how to read the data file.

There are at least eight things to be concerned with:
1. What will the variable names be on the computer? These must usually be limited to 8 characters, the first being a letter.
2. What are the labels, if any, to be associated with each variable? These clarify variable names that are often too brief to be understandable.
3. Does the variable contain only numeric values, or does it also contain character values?
4. Does the variable contain any missing values? How will these be coded?
5. What labels, if any, will be assigned to values to clarify what those values represent (e.g., 1='male', 2='female')?
6. What is the maximum number of columns needed to represent each variable accurately, including decimal points and negative signs?
7. What is the field or column location to be assigned to each variable?
8. Is more than one row (usually 80 columns) of data required per case?

An Example of a Codebook for a Simple Data File

Imagine that a 20-question survey is given to individuals regarding insurance attitudes. The codebook should contain the following information about how the survey data were entered into a text file.

    Variable  Width  Columns  Variable Label   Value Labels
    ssn       9      1-9      soc sec number   All fields: 9=missing
    age       3      11-13    subject age
    sex       1      15       subject sex      1=male, 2=female
    quest1    1      20       has insurance    (All questions:
    ...
    quest20   1      60       wants more       1=no, 2=yes)

Having completed a codebook, you're ready to enter data into a computer file using this format. The actual data file would then look something like this:
    451335322 29 1 1 9 1 1 2 2 2 1 2 1 2 2 1 2 9 1 1 2 1 1
    354009564 67 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 2
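A codebook like this is exactly the information a program needs to parse a fixed-width record. The sketch below (in Python; the FAQ's examples are SAS/SPSS, so this is only an illustration) uses the codebook's column positions for ssn, age, sex, and quest1; the record itself is made up to match those positions:

```python
# Hypothetical record laid out per the codebook:
# ssn in columns 1-9, age in 11-13, sex in 15, quest1 in 20.
# Python slices are 0-based, so column k is index k-1.
line = "451335322  29 2    1"

ssn = line[0:9]             # columns 1-9
age = int(line[10:13])      # columns 11-13
sex = int(line[14:15])      # column 15
quest1 = int(line[19:20])   # column 20

print(ssn, age, sex, quest1)
```

The same column ranges would go into a SAS INPUT statement or SPSS fixed-format data definition.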

### DF for a correlation test (H0: rho=0)

#### Question:

Why does the test of a correlation (Ho: rho = 0) have N-2 degrees of freedom instead of N-1 degrees of freedom? There's only one correlation being estimated instead of two, so why are two degrees of freedom used?

Remember that estimating the correlation coefficient is a special case of using the simple linear regression model. This regression model takes the form:

y = a + bx + e

where a (the intercept), and b (the slope) are the two parameters in the model to be estimated. Since two values are being estimated, two degrees of freedom are lost.
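The equivalence can be checked numerically: the textbook t statistic for r, t = r*sqrt((N-2)/(1-r^2)), is identical to the t statistic for the slope b from the regression fit, which is computed with N-2 error degrees of freedom. A small Python sketch with simulated data (the data are made up purely for illustration):

```python
import numpy as np

# Simulated data; any x, y pair would show the same identity.
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# t statistic for the correlation, using n - 2 df:
r = np.corrcoef(x, y)[0, 1]
t_from_r = r * np.sqrt((n - 2) / (1 - r**2))

# t statistic for the slope from the least-squares fit:
sxx = np.sum((x - x.mean()) ** 2)
b = np.sum((x - x.mean()) * (y - y.mean())) / sxx
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
se_b = np.sqrt(np.sum(resid**2) / (n - 2)) / np.sqrt(sxx)
t_from_b = b / se_b

print(t_from_r, t_from_b)
```

The two t values agree exactly, which is why the correlation test inherits the regression model's N-2 degrees of freedom.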

### Point biserial correlation

#### Question:

I need to compute point biserial correlations for some data. However, I cannot find a procedure in any of the major stats packages that does this.

The point biserial correlation is just the Pearson correlation with one of the variables being dichotomous. A special formula exists, but its purpose is to ease the burden of those who have to do the calculations by hand.

So, on a computer, just use the Pearson correlation procedure:

In SAS, use: PROC CORR.

In SPSS, use: CORRELATIONS.
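To see that the hand-calculation formula and the Pearson correlation agree, here is a sketch in Python (the data are invented; SAS or SPSS would give the same numbers via PROC CORR or CORRELATIONS):

```python
import numpy as np

# Made-up data: a 0/1 group indicator and a continuous score.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=50)            # dichotomous variable
score = 10 + 2 * group + rng.normal(size=50)   # continuous variable

# Ordinary Pearson correlation on the dichotomous variable:
r_pearson = np.corrcoef(group, score)[0, 1]

# Point biserial hand-calculation formula: (M1 - M0)/s * sqrt(p*q),
# where s is the standard deviation of score with an n denominator.
m1, m0 = score[group == 1].mean(), score[group == 0].mean()
p = group.mean()
r_pb = (m1 - m0) / score.std() * np.sqrt(p * (1 - p))

print(r_pearson, r_pb)
```

The two values are identical, which is why no separate procedure is needed.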

### Estimation methods in structural equation modeling

#### Question:

What are the advantages and disadvantages of using a maximum likelihood estimation method vs. a least squares estimation method in structural equation modeling?

Monte Carlo simulation studies have shown that under ideal sampling conditions the three most common estimation methods (maximum likelihood, generalized least squares, and ordinary least squares) all yield comparable and very good parameter estimates.

However, under less-than-ideal sampling conditions, each method has its own strengths and weaknesses. For example, when the assumption of joint multivariate normality is violated, maximum likelihood estimation tends to yield nonoptimal solutions, especially when the sample size falls below N = 200.

In general, for effective structural equation modeling, the total sample size should be at least 200, and at least three manifest variables should be included for each latent variable.

Each less-than-ideal sampling situation presents a unique set of difficulties. You may want to contact a consultant by email (click HERE for more info) if you believe that your sample is less than ideal. Also, Latent Variable Models, by J. C. Loehlin, 1987, pp. 54-60, contains more information on this topic.

### Finite population correction factor

#### Question:

I'm sampling from a finite population. I've heard that in such cases the usual variance estimate can be too large. Is there some sort of correction factor?

If the sample size, n, is greater than 5% of the population size, N, you will benefit by using the finite population correction factor. For the variance adjustment, multiply the original variance value by (N-n)/N.
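As a numeric illustration of the adjustment (the population and sample sizes below are hypothetical), in Python:

```python
import numpy as np

# Hypothetical numbers: n/N = 20%, well above the 5% threshold.
N, n = 1000, 200   # population size and sample size
s2 = 4.0           # sample variance (estimate of population variance)

# Variance of the sample mean, without and with the correction:
var_mean_naive = s2 / n
var_mean_fpc = (s2 / n) * (N - n) / N   # multiply by (N-n)/N
se_fpc = np.sqrt(var_mean_fpc)

print(var_mean_naive, var_mean_fpc, se_fpc)
```

With 20% of the population sampled, the corrected variance of the mean is 80% of the uncorrected value.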

For additional information, see Sampling Techniques by W. G. Cochran.

### Comments/Codebook in an external data file

#### Question:

I have an external data file that I would like to read into a statistical software package, preferably SAS or SPSS. I've included a codebook at the top of the data file. How can I tell SAS or SPSS to start reading the data after skipping the first n lines of the data file?

SPSS can perform this task with either an Excel or text file.

For Excel, open the external data file. Uncheck the box that says “Read variable names from the first row of data.” In the box labeled “Range”, specify the first and last cell of the Excel spreadsheet to be read.

For example, A5:H35 tells SPSS to begin reading data in the first column, fifth row, continuing to read data by row until reaching the cell in the eighth column, thirty-fifth row. This will eliminate the first four rows from the SPSS dataset.

Unfortunately, variable names cannot be read into SPSS using this method; they must be manually entered in the SPSS dataset.

For a text file, open the external data file. A Text Import Wizard dialog box will appear. Follow the prompts as explained below:

Step 1 – No action is necessary; click next.

Step 2 – Specify the delimiter(s) that separate the data into columns and indicate if variable names are in the first row of data. If variable names are in the top row of the text file, SPSS will use these names in the new dataset and still allow you to begin reading data from a specified line.

Step 3 - Designate the line number corresponding to the first case. The “Data preview” box at the bottom of the dialog box shows the first few lines of the dataset as a check that SPSS is reading the data correctly.

Step 4 – Indicate how the data are separated.

Step 5 – Name and format variables.

Step 6 – The file and syntax can be saved.

In SAS, use the FIRSTOBS= option in the INFILE statement. This option tells SAS which line of the infile to start reading data from.

For example, use the following syntax to begin reading data on line 21 of the external data file RAW.DAT located in the TEMP subdirectory of your C: disk drive:

INFILE 'c:\temp\raw.dat' FIRSTOBS = 21 ;

For more information on the infile statement in SAS, use the online SAS manual at http://support.sas.com/documentation/onlinedoc/base/index.html. Go to the SAS OnlineDoc under Base SAS 9.1.3 Procedures Guide and click the Index tab. You can then search for infile.

### Standard error of the measurement

#### Question:

What is the standard error of the measurement?

The standard error is the standard deviation of the sampling distribution of a statistic.

For example, suppose you are estimating the mean height of the population of eastern white pines. You select a sample of 100 trees, measure their heights, and calculate a mean. Any given sample mean is a function of the population mean and of the random, unique characteristics of the individual trees in the sample. Thus, if you took another sample of 100 trees, its mean would be a little different, and so would the mean of a third sample, and so on. If you calculated means for a very large number of samples of the same size, this collection of sample means would itself have a mean value and a standard deviation. The mean of this "sampling distribution" would be the population mean, and its standard deviation is the standard error of the measurement. In this case the measurement is the mean, but it can be any sample statistic. The standard error tells us how much we can expect any given sample statistic to deviate from the population parameter we are estimating.

Just as the sample standard deviation from our tree example tells us how much we can expect each tree to deviate from the mean of its sample, the standard error tells us how much we can expect any given statistic to deviate from its sampling-distribution mean, and remember, the mean of the sampling distribution is the actual population parameter value. The standard error thus allows us to create confidence intervals and test hypotheses at a specified level of uncertainty (e.g., 95% confidence, alpha = 0.05).

The problem is that we never actually collect a large number of samples, but often only one. So we have to estimate the standard error. The formulas for the estimate of the standard error can be simple or complex, but fortunately, there are computer programs to do this for us.
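The sampling-distribution idea is easy to verify by simulation. The sketch below (in Python, with a made-up tree-height population) draws many samples of 100, and the standard deviation of the resulting sample means matches the theoretical standard error sigma/sqrt(n):

```python
import numpy as np

# Hypothetical population of tree heights: mean 25, SD 4.
rng = np.random.default_rng(42)
mu, sigma, n = 25.0, 4.0, 100

# Draw 10,000 samples of size 100 and compute each sample's mean.
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

# The SD of the sample means approximates the standard error.
empirical_se = sample_means.std()
theoretical_se = sigma / np.sqrt(n)

print(empirical_se, theoretical_se)
```

In practice we skip the simulation and estimate the standard error from a single sample, which is what the software formulas do.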

A good reference for this topic is Hays, W. L. (1981). Statistics (3rd ed.). New York: Holt, Rinehart & Winston. See Chapter 5, Sampling Distributions and Point Estimation.

### Negative factor loadings

#### Question:

What does a negative factor loading indicate?

A factor loading is the standardized regression coefficient for a factor in the multiple regression equation regressing the variable on the factors. Thus if the factor structure is orthogonal, then the loading is just the correlation between a variable and a factor.

So for an orthogonal set of factors, a negative loading (for a variable on a factor) indicates that scores on the factor tend to be associated with variable scores of the opposite sign.

### When covariates are not helpful

#### Question:

I am doing a repeated measures analysis with covariates. I tested the significance of association between the dependent variables and covariates, and also the homogeneity of regression hyperplanes for the covariates. I believe that I have appropriate covariates to work with, but the problem is that the significance level (both MANOVA Wilks and univariate) is decreased by the covariates. Is this possible?

Including a covariate in a model moves one degree of freedom from the error term to the model term. If the covariate does not increase the model sum of squares enough to compensate, the F-ratio will decrease and the p-value will increase; that is, the result becomes less significant.

You may be seeing the case where the covariate adds little to the model sum of squares because the covariate and the other predictor variables share the same variance that predicts the dependent variable.

### Computing explained variance in factor analysis

#### Question:

I ran a factor analysis on five variables and derived an orthogonal two-factor solution. Now I want to see what proportion of the total variance is explained by these two factors. How can I compute this figure?

This proportion of variance explained by each factor is printed by default by most statistical software packages. Note that if the factor extraction method is not Principal Components, this proportion can be negative.

If you need to compute this value yourself, you can do so by summing the eigenvalues of the (two) factors of interest and dividing this number by the sum of all (five) eigenvalues.
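The hand computation is a one-liner once you have the eigenvalues. A sketch in Python (the correlation matrix below is invented; for five standardized variables the eigenvalues always sum to 5):

```python
import numpy as np

# Made-up 5x5 correlation matrix for illustration.
R = np.array([
    [1.0, 0.6, 0.5, 0.1, 0.2],
    [0.6, 1.0, 0.4, 0.2, 0.1],
    [0.5, 0.4, 1.0, 0.1, 0.3],
    [0.1, 0.2, 0.1, 1.0, 0.5],
    [0.2, 0.1, 0.3, 0.5, 1.0],
])

# Eigenvalues, largest first.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Proportion of total variance explained by the first two factors:
prop_two_factors = eigvals[:2].sum() / eigvals.sum()

print(eigvals.sum(), prop_two_factors)
```

The denominator equals the number of variables (here 5), since each standardized variable contributes one unit of variance.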

For more information about factor analysis using SAS, use the online SAS manual at http://support.sas.com/documentation/onlinedoc/base/index.html. Go to the SAS OnlineDoc under Base SAS 9.1.3 Procedures Guide, Second Edition and click the Index tab. Jump to factor procedure.

### Eigenvalues less than 1.00

#### Question:

I read that the reason an eigenvalue greater than 1.0 is used as a criterion in factor analysis extractions is that if the eigenvalue is less than 1.0, then the factor is explaining less variance than a single variable.

My question is this: in the course of a higher-order factor analysis, does the same rationale for using the 1.0 criterion pertain, i.e., if the eigenvalue is less than one, is less variance explained by it than by a single lower-order factor?

The short answer to your question is "Yes". That is, the rationale for only retaining factors with eigenvalues larger than one holds for a higher order factor analysis just as for a lower order one.

One way of thinking about this rule of thumb is to realize that your p variables form a p-dimensional space. You want to rotate the axes of this space so that the new axes maximize the variance of the data points as they are projected onto the axes. The (normalized) eigenvectors of a matrix give the direction cosines determining the rotation, while the eigenvalues give the variance associated with each new axis.

When the matrix is a pxp correlation matrix, the variance of each variable is already standardized to 1, so things are particularly simple. An eigenvalue less than one represents a shrinking of an axis' importance in the new universe.

Similarly, the Spectral Decomposition Theory says that any matrix of rank p can be broken down into the sum of p component matrices. These component matrices are just the outer product of each eigenvector (xx'), weighted by its eigenvalue. Again, a pxp correlation matrix has rank p, and p eigenvalues summing to p. So the factor associated with an eigenvalue of less than 1.0 is not pulling its own weight.
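The spectral decomposition is easy to verify numerically. The sketch below (in Python, with a made-up 3x3 correlation matrix) rebuilds the matrix from its eigenvalue-weighted outer products, and the eigenvalues sum to p:

```python
import numpy as np

# Made-up 3x3 correlation matrix.
R = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

eigvals, eigvecs = np.linalg.eigh(R)   # columns of eigvecs are eigenvectors

# Sum of component matrices: lambda_i * (v_i v_i') over all i.
R_rebuilt = sum(lam * np.outer(v, v) for lam, v in zip(eigvals, eigvecs.T))

print(np.allclose(R, R_rebuilt), eigvals.sum())
```

The reconstruction is exact, and since the eigenvalues sum to p = 3, any factor with an eigenvalue below 1.0 contributes less than its "share" of one variable's variance.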

### Number of factors from a factor analysis

#### Question:

How can I decide how many factors I should extract from a factor analysis solution?

There are a number of methods you can use, either individually or in concert to aid you in selecting the number of factors to retain from a factor analysis. Among them are:

1. The eigenvalue greater than or equal to 1.00 rule

Only factors with eigenvalues greater than or equal to 1.00 are retained, since one way to view this situation is that only factors with eigenvalues greater than or equal to 1.00 "pull their own weight" in explaining the common variance shared among your measures.

2. The scree plot

You can request that this be output from SPSS or SAS. You would retain the number of factors up to the "elbow" - 1. For example, consider the following scree plot:

    Eigenvalue
    |*
    |
    |  *
    |     *  *  *
    |
    |____________________
      1  2  3  4  5    Factor Number

Here the "elbow" or bend is at factor 3, but you would retain 3 - 1 factors, or the first 2 factors.

3. Proportion of variance accounted for by factors

Decide a priori on how you wish to define the phrase 'a sufficient proportion of variance is accounted for', and retain only enough factors to cross that threshold.

4. The low error approach

Continue extracting factors until all residual values are 0.10 or lower.

5. Use a chi-square test

SAS and SPSS provide tests of overall goodness-of-fit of the factor analysis model to the data when you choose maximum-likelihood (ML) or generalized least-squares (GLS) factor extraction methods. If you choose to use one of these extraction methods (ML is generally more commonly used than GLS), you also must tell the software package how many factors you expect to be present. It then uses that number of factors as its null hypothesis. That is, the null hypothesis of the chi-square test is that the factor analysis model fits the data. So, a non-significant model test is desirable, whereas a statistically significant chi-square test means that more factors are needed to account for the structure of your data.

You should recognize two important caveats in using the chi-square method to help you decide how many factors to retain. The first caveat is that these test statistics are computed under the assumption of joint multivariate normality. If your data do not meet this assumption, it may not be appropriate to use these chi-square tests. The second caveat is that these tests are very sensitive to sample size. A factor analysis model which otherwise fits the data well may be statistically significant due to a large sample. If you use only the chi-square results to determine the number of factors to retain, you will probably retain too many factors.

You can always use more than one of these methods to help you decide which solution is optimal, but, as always, theory should be your foremost consideration. SAS and SPSS anticipate that theory can guide your decision to extract a given number of factors, so each package provides a method to limit the number of factors extracted to be a specific number (e.g., NFACTORS=2 to extract two factors).

Also, there is the problem that "a person with one watch always knows what time it is; a person with more than one watch never knows the exact time." In other words, using the information from all of these methods may lead to a situation where they conflict, e.g., you retain only factors with eigenvalues greater than 1.00, but you have some residuals with values greater than 0.10.

In this type of situation, theory provides your first guideline, and the other rules of thumb can provide some additional guidance, but it is important not to follow any one of the rules of thumb by rote or too strictly but instead to evaluate the solution as a complete picture, including how it meshes with prior findings, your own theoretical models, etc.

### Testing multivariate skewness and kurtosis

#### Question:

How can I use my sample's skewness and kurtosis to determine whether I have a multivariate normal distribution?

There is a large amount of literature on this topic, although not much is yet implemented in SAS or SPSS.

In Multivariate Analysis, Part 1, Distributions, ordination, and inference, (1994) W.J. Krzanowski and F.H.C. Marriott review (section 3.16, p.58) tests of the null hypothesis that the data come from a multivariate normal distribution.

Mardia (1970) found that if the null hypothesis is true, then a simple function of the sample skewness has an asymptotic chi-squared distribution, and the sample kurtosis has an asymptotic normal distribution. Sample sizes greater than 50 are needed for these approximations to be acceptably accurate.

SAS's PROC CALIS will output several measures of univariate and multivariate skewness and kurtosis.

Also, the PRELIS2 program developed by Joreskog and Sorbom will test both univariate and multivariate normality simultaneously, including separate tests of skewness and kurtosis at both the univariate and multivariate level. See page 24 of the PRELIS2 manual (the section titled, "New features in PRELIS2").

Mardia's test of multivariate normality can also be found in EQS, Structural Equation Modeling Software.

AMOS also includes a test of multivariate normality. For detailed instructions on performing this test in AMOS, see the AMOS FAQ on handling non-normal data.

Macros for both SPSS and SAS can be downloaded. Lawrence DeCarlo, Ph.D., provides an SPSS macro for Mardia's test of multivariate skewness and kurtosis at http://www.columbia.edu/~ld208/. A SAS macro is available in the SAS online manual at http://support.sas.com/. To find the macro, go to the Knowledge Base section and click on Samples and SAS Notes. Click on Search Samples, search for multnorm, and choose Macro to test multivariate normality. This site gives a downloadable version of the macro as well as instructions on how to use the macro.

### Sample size for multiple regression

#### Question:

How many participants, cases, or data points do I need per predictor to ensure a stable solution in a multiple regression analysis?

Unfortunately, there is no clear consensus on the exact answer to this question. We have heard and read answers ranging anywhere from 5 to 50 cases per predictor. Generally, the more cases per predictor you have, the better off you will be in terms of your ability to generalize your results to your population of interest. This becomes particularly true when your sample data violate one or more of the assumptions underlying regression analysis.

That said, James Stevens recommends a rule of thumb of 15 data points per predictor for multiple regression analyses (James Stevens, Applied Multivariate Statistics for the Social Sciences, Third Edition, Lawrence Erlbaum Publishers, p. 72).

### Centering variables prior to computing interaction terms for a multiple regression analysis

#### Question:

I am predicting my dependent variable y from independent variables a and b. How can I calculate the interaction term a*b for use in my regression analysis?

There are differing opinions about how to compute the interaction term for use in an analysis. The steps below will show you how to compute non-centered and centered interaction terms. Some researchers compute the product of a and b (without centering or altering the variables in any way) and enter this product into their regression model, like so:

For SPSS, use the dialog boxes to compute the new interaction variable:

In the Data View window, click Transform and then Compute.
In the Target Variable box, type the name of the new interaction variable, e.g. ab.
In the Numeric Expression box, enter a*b. Click OK.
This computes the interaction term, ab, and adds it to the dataset.
Enter the variables a, b, and ab as independent variables in the regression model.

For SAS:

DATA origdata; SET origdata; ab = a*b ; RUN;
PROC REG DATA = origdata; MODEL y = a b ab ; RUN ;

Other researchers advocate "centering" the a and b predictors before computing the interaction term. Centering the term means subtracting the variable's mean from each case's value on that variable. The result is known as a "deviation score." The SPSS and SAS code shown below can be used to create centered variables.

For SPSS:

In the Data View window, click Transform and then Compute.

Type the variable name, breakvar, in the Target Variable box. Enter a value of 1 in the Numeric Expression box. Click OK.

This creates a new variable, breakvar, with a value equal to 1. This variable is necessary for calculating the means of variables a and b.

Click Data, then Aggregate.

Click breakvar into the box labeled Break Variables.

Click on a and b to put them into the box labeled Aggregated Variables.

Make sure the function specified in the Summaries of Variables box is the mean of the variable. Make sure the default option of add aggregated variables to active dataset is checked. Click OK.

This will add the mean of a and the mean of b as two new columns in the dataset, a_mean and b_mean, respectively.

Click Transform, then Compute.

Type the centered variable name, acen, in the Target Variable box.

Enter a - a_mean in the Numeric Expression box. Click OK. This creates the centered variable of a.

Create the centered variable, bcen, by entering b - b_mean in the Numeric Expression box. Click OK.

Click Transform, then Compute.

Type the variable name, abcen, in the Target Variable box.

Enter acen * bcen in the Numeric Expression box. Click OK.

This creates the interaction term, abcen, based on the centered variables of a and b.

Use the centered terms acen, bcen, and abcen in the regression model instead of a, b, and ab.

For SAS:

PROC STANDARD DATA = origdata OUT = centdata MEAN = 0 PRINT;
VAR a b;
RUN;

DATA centdata;
SET centdata;
ab = a*b;
RUN;

PROC REG DATA = centdata;
MODEL y = a b ab;
RUN;

The centered and non-centered approaches yield identical overall regression model statistics and tests for the interaction effect (assuming that the interaction effect is the last entered into the regression model, as is generally the case in this type of analysis).

Which approach should you use to compute your interaction term? The chief advantages of centering are that it (1) reduces multicollinearity (a high correlation) between the a and b predictors and the a*b interaction term and (2) can render more meaningful interpretations of the regression coefficients for a and b.

The regression coefficient for a*b will be the same for both approaches, but the coefficients for a and b will differ depending on which method you use. This is because in the non-centering method, the coefficient for a estimates the relationship between a and y where b equals zero. In the centering method, the coefficient for a estimates the relationship between a and y where b equals its average. In many situations, the predictors will not have a meaningful zero point, so a centering approach may be warranted.
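This behavior can be demonstrated with a short simulation. The sketch below (in Python, with invented data; the FAQ's own examples use SAS/SPSS) fits the interaction model with raw and centered predictors and compares the coefficients:

```python
import numpy as np

# Made-up data loosely following the height/weight example.
rng = np.random.default_rng(7)
n = 200
a = rng.normal(60, 5, n)     # e.g., heights
b = rng.normal(150, 20, n)   # e.g., weights
y = 1.0 + 0.5 * a + 0.2 * b + 0.01 * a * b + rng.normal(0, 1, n)

def fit(a, b, y):
    """Least-squares fit of y = b0 + b1*a + b2*b + b3*a*b."""
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef   # [intercept, a, b, a*b]

raw = fit(a, b, y)
cen = fit(a - a.mean(), b - b.mean(), y)

print(raw[3], cen[3])   # interaction coefficients: identical
print(raw[1], cen[1])   # coefficients for a: different
```

The interaction coefficient is unchanged by centering, while the coefficient for a shifts because it now estimates the a-y relationship at the mean of b rather than at b = 0.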

Leona Aiken and Stephen West provide an example of this type of situation in their text titled Multiple regression: Testing and interpreting interactions (1991, Sage Publications, Newbury Park, Chapter 3).

As an example, suppose you are predicting athletes' strength levels (y) from height (a) and weight (b) measurements. Under the non-centering approach, the measure of the relationship between height (a) and strength (y) as estimated by the regression coefficient for height (a) occurs where b = 0, or weight equals zero pounds. No athlete we know of has a weight of zero pounds!

Centering provides one remedy to this situation: In the centered model, the regression coefficient for height (a) estimates the relationship between height (a) and strength (y) where weight (b) is equal to the mean weight in the data set instead of zero.

Aiken and West devote an entire chapter of their book to the topic of centering (chapter 3). This book is available from the Physics-Math-Astronomy library on campus. See http://catalog.lib.utexas.edu/.

### Simple main effects tests

#### Question:

How can I carry out a simple main effects test using either SAS or SPSS?

You can use either SAS or SPSS to conduct these tests. For example, let's say that you had a fairly straightforward completely between-subjects design, with a dependent variable called Y and two categorical predictors, A and B, each with two levels. Suppose that you wanted to test the effect of A for each level of B.

In SAS release 6.11 or higher, you can use the SLICE option in PROC GLM, like this:

PROC GLM ;
CLASS a b ;
MODEL y = a b a*b ;
LSMEANS a*b / SLICE=b ;

If you are using SPSS, you can use the MANOVA command to test the same hypotheses as the SAS program shown above:

MANOVA y BY a(1,2) b(1,2) /DESIGN = b, a WITHIN b(1), a WITHIN b(2).

More complex designs are testable with the recent SPSS GLM and SAS MIXED procedures, including within-subjects simple main effects tests.

### Multilevel models

#### Question:

I'm analyzing a dataset with dyads (couples) and another dataset with families who have different numbers of children per family. Someone suggested analyzing my data using something called a multilevel model. What's a multilevel model and why should I use it?

The data you describe are often referred to as "hierarchical" or "clustered" because subjects (individuals) are nested within clusters or units such as families or couples. Many commonly-used statistical procedures such as ordinary least-squares linear regression assume that every observation is independent of every other observation in the dataset. Obviously, when clusters are present, this assumption is violated.
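A short simulation makes the consequence concrete: when observations share a cluster effect, the true variability of the overall mean exceeds what the independent-observations formula sigma^2/n predicts. This sketch (in Python, with invented variance components, not tied to any particular package) compares the two:

```python
import numpy as np

# Hypothetical clustered design: 20 families, 5 members each,
# with equal cluster-level and individual-level variance (both 1).
rng = np.random.default_rng(3)
n_sims, n_clusters, per_cluster = 2000, 20, 5

means = np.empty(n_sims)
for i in range(n_sims):
    cluster_effect = rng.normal(0, 1, n_clusters)   # shared within a family
    obs = cluster_effect[:, None] + rng.normal(0, 1, (n_clusters, per_cluster))
    means[i] = obs.mean()

# SE assuming all 100 observations are independent:
total_var = 2.0   # 1 (cluster) + 1 (individual)
naive_se = np.sqrt(total_var / (n_clusters * per_cluster))

# Actual SE of the mean across simulations:
true_se = means.std()

print(naive_se, true_se)
```

The naive standard error substantially understates the real one, which is why standard errors (and p-values) from ordinary regression are too optimistic for clustered data.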

To address this problem, researchers developed special statistical models to take into account the hierarchical nature of such datasets. As a class, these models are known as multilevel models. Other investigators developed special software programs designed specifically for the analysis of multilevel models.

Among the general purpose software packages that we support, SAS is one of the most commonly used in handling multilevel models. The MIXED procedure can be used to analyze data with continuous distributions whereas the GENMOD procedure can be used for repeated measures with non-normally distributed variables. An additional feature of MIXED that GENMOD lacks is the ability to estimate variances for the cluster level; this feature is useful for descriptive purposes as well as the computation of proportions of variance due to clusters versus individuals in the dataset.

To learn more about using PROC MIXED to fit multilevel models to normally distributed outcome variables, you can download a copy of the paper "Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models" written by Judith Singer at Harvard University at http://gseweb.harvard.edu/~faculty/singer/. If your outcome variables are non-normally distributed, consider using the GLIMMIX macro available from SAS Institute. GLIMMIX uses PROC MIXED as part of its syntax, so you can use GLIMMIX to obtain variance component estimates for clusters.

Other software packages that provide multilevel modeling analysis are SPSS, LISREL, HLM, MPlus, MLwiN, and AMOS. Details on these packages can be found under the FAQ section of the statistical website of David Garson, Ph.D.:

http://www2.chass.ncsu.edu/garson/pa765/multilevel.htm.


Multilevel model analysis is complex and a rapidly growing and changing field. To keep up to date, consider joining a Multilevel Internet E-mail list. Also, you can visit the HLM website at http://www.ssicentral.com/ or the Centre for Multilevel Modelling home pages at http://www.cmm.bristol.ac.uk/ to get the latest information about multilevel model workshops, software, and related resources.

### Contrast coding

#### Question:

I've run a balanced-data ANOVA with one between-subjects factor (GROUP) and one within-subjects factor (TIME). Group has three levels; time has three levels. My dependent variable is anxiety, measured at three equally-spaced intervals.

I now want to run a contrast analysis. I want to compare group 1 to group 2 across all three measurement occasions of anxiety. How can I determine what my contrast weights should be?

One widely used method is first to specify your hypothesis in terms of your design's cell means. You then re-express the hypothesis in terms of the model parameters used by your software. Finally, you match the weights found in this expression to the syntax required by your software.

To identify the population cell means in your GROUP by TIME study, it's helpful to use a table, like this:

| Population Means | Time 1 | Time 2 | Time 3 |
|------------------|--------|--------|--------|
| Group 1          | Mu11   | Mu12   | Mu13   |
| Group 2          | Mu21   | Mu22   | Mu23   |
| Group 3          | Mu31   | Mu32   | Mu33   |

1. Specify the hypothesis in terms of the cell means.

Once you identify the cell means, the next step is to specify your null hypothesis as an equality among various combinations of these means.

Your hypothesis, stated in null hypothesis form, reads: "The population mean for group 1 equals the population mean for group 2, when this mean is taken across all measurement occasions of anxiety." Translating this natural language hypothesis into a statement about equality of population means, you get:

Mu11+Mu12+Mu13 = Mu21+Mu22+Mu23

2. Re-express the hypothesis in terms of the model parameters.

For the standard balanced-data two-way ANOVA model, the relationship between a cell mean and the model parameters is widely known. In our case, this relationship for Mu11 is:

Mu11 = I+G1+T1+GT11

That is, each individual population mean is composed of an intercept term (abbreviated I), a main effect term due to group (abbreviated G), another main effect term due to Time (abbreviated T), and a group by time interaction term (abbreviated GT).

When we substitute these expressions into the null hypothesis formula shown above, we get:

(I+G1+T1+GT11)+(I+G1+T2+GT12)+(I+G1+T3+GT13)=(I+G2+T1+GT21)+(I+G2+T2+GT22)+(I+G2+T3+GT23)

While this formula may seem intimidating, it is easily simplified. There are three intercept terms (I) before the equals sign and three intercept terms after it, so all intercept terms drop out of the formula. The T1, T2, and T3 terms also drop out. This leaves us with:

(G1+GT11)+(G1+GT12)+(G1+GT13)=(G2+GT21)+(G2+GT22)+(G2+GT23)

We now continue simplifying by collecting terms. We have three G1 terms and three G2 terms, giving us:

3G1 + GT11 + GT12 + GT13 = 3G2 + GT21 + GT22 + GT23

Notice that what we have left is an equality between each group's main effect and group by time interaction terms. We're collapsing across our time variable, which agrees with our hypothesis. However, you may be surprised by the interaction terms, since our hypothesis doesn't explicitly mention them. We'll say more about this later.

3. Translate this expression into the form required by the software's syntax.

Most software requires the expression to equate to a constant (usually zero), so we subtract one side from each side to get:

3G1 + GT11 + GT12 + GT13 - 3G2 - GT21 - GT22 - GT23 = 0

Then we need to arrange our terms in the order used by our software. For SAS or SPSS, the order in this case would be:

3G1 - 3G2 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 = 0

Finally, we need to include a term for every parameter in a variable, unless each parameter in the variable has a weight of zero. Our equation becomes:

3G1 - 3G2 + 0G3 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 + 0GT31 + 0GT32 + 0GT33 = 0

We can now read off the contrast weights, which are just the coefficients of the effect terms.

Although the exact specification of the contrast statement varies from package to package, its general form is as follows:

"contrast-name" variable-name weights

where "contrast-name" is a quoted string that identifies the contrast on the software's output, "variable-name" is the name of the variable (e.g., GROUP), and "weights" are the contrast weights you've generated.

Let's put our contrast weights into this framework:

"my contrast" group 3 -3 0 group*time 1 1 1 -1 -1 -1 0 0 0
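As a quick numeric sanity check of the derived weights, the hypothesis can be evaluated directly against a table of cell means. The following Python sketch is purely illustrative (made-up cell means, not SAS or SPSS syntax): it applies the cell-mean weights implied by the null hypothesis and confirms that the contrast is zero exactly when groups 1 and 2 have equal means collapsed across time.

```python
import numpy as np

# Hypothetical cell means for the 3 (group) x 3 (time) design; values are
# illustrative only, chosen so groups 1 and 2 have equal sums across time.
mu = np.array([[10.0, 12.0, 14.0],   # group 1 at times 1-3
               [11.0, 12.0, 13.0],   # group 2 at times 1-3
               [20.0, 21.0, 22.0]])  # group 3 at times 1-3

# Cell-mean weights implied by the null hypothesis
# Mu11 + Mu12 + Mu13 = Mu21 + Mu22 + Mu23:
w = np.array([[ 1,  1,  1],
              [-1, -1, -1],
              [ 0,  0,  0]])

contrast_value = float(np.sum(w * mu))
print(contrast_value)  # 0.0 here, since both groups sum to 36 across time
```

If the groups' collapsed means differ, the contrast value departs from zero, which is what the test statistic evaluates against sampling error.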

Contrast coding can be a challenging exercise. It is easy to produce contrast weights that do not test your hypothesis unless you follow a systematic method such as the one described here. Be sure to check the contrast results produced by your software carefully. These results should be consistent with the usual descriptive statistics (e.g., cell means, standard deviations, and standard errors) that you should examine before performing a contrast analysis. If you are uncertain about the validity of your contrast results, contact a consultant for assistance.

Finally, a note promised earlier: you might have expected the interaction terms to drop out of the contrast. This would be appropriate in a model employing the usual side conditions, under which the interaction terms within a level sum to zero. However, both SAS and SPSS use the "overparameterized" ANOVA model, which does not impose such restrictions.

For more information on generating and specifying contrast codes, see the online SAS manual at http://support.sas.com/documentation/. Under SAS Product Documentation, click on SAS/STAT. Click on SAS OnlineDoc under SAS/STAT 9.1.3; scroll down and click on SAS/STAT and then click on SAS/STAT User's Guide. Scroll down to The GLM Procedure; the Syntax section discusses the CONTRAST command.

### Handling missing or incomplete data

#### Question:

I have a database that contains records with incomplete data; some research participants did not complete all of the available questions on my survey. How should I handle this problem?

Missing or incomplete data are a serious problem in many fields of research. An added complication is that the more data are missing in a database, the more pressing the need to address the incomplete cases, yet those are precisely the situations in which imputing values for the missing data points is most questionable, because the proportion of valid data points relative to the size of the data matrix is small. This FAQ highlights commonly used methods of handling incomplete data problems and discusses a number of their known strengths and weaknesses. At the end of the FAQ, a software table compares and contrasts some commonly used software options for handling missing data and details their availability to UT faculty, students, and staff.

When you choose a missing data handling approach, keep in mind that one of the desired outcomes is maintaining (or approximating as closely as possible) the shape of the original distribution of responses. Some incomplete data handling methods do a better job of maintaining the distributional shape than others. For instance, one popular method of imputation, mean substitution, can result in a distribution with truncated variance.

If you have questions about the advisability of applying a particular method to your own database, we recommend you schedule an appointment with a Statistical Services consultant to discuss these issues as they pertain to your own unique circumstances (note: This service is available to University of Texas faculty, staff, and students only). Missing data imputation and handling is a rapidly evolving field with many methods, each applicable in some circumstances but not others.

Types of missing data

The most appropriate way to handle missing or incomplete data will depend upon how data points became missing. Little and Rubin (1987) define three unique types of missing data mechanisms.

Missing Completely at Random (MCAR):

Cases with complete data are indistinguishable from cases with incomplete data. Heitjan (1997) provides an example of MCAR missing data: imagine a research associate shuffling raw data sheets and arbitrarily discarding some of them. Another example of MCAR missing data arises when investigators randomly assign research participants to complete two-thirds of a survey instrument. Graham, Hofer, and MacKinnon (1996) illustrate the use of planned missing data patterns of this type to gather responses to more survey items from fewer research participants than would be possible under the standard paradigm, in which every participant receives and answers every survey question.

Missing at Random (MAR): Cases with incomplete data differ from cases with complete data, but the pattern of data missingness is traceable or predictable from other variables in the database rather than being due to the specific variable on which the data are missing. For example, if research participants with low self-esteem are less likely to return for follow-up sessions in a study that examines anxiety level over time as a function of self-esteem, and the researcher measures self-esteem at the initial session, self-esteem can then be used to predict the missingness pattern of the incomplete data. Another example is reading comprehension: Investigators can administer a reading comprehension test at the beginning of a survey administration session; research participants with lower reading comprehension scores may be less likely to complete the entire survey. In both of these examples, the actual variables where data are missing are not the cause of the incomplete data. Instead, the cause of the missing data is due to some other external influence.

Nonignorable: The pattern of data missingness is non-random and it is not predictable from other variables in the database. If a participant in a weight-loss study does not attend a weigh-in due to concerns about his weight loss, his data are missing due to nonignorable factors. In contrast to the MAR situation outlined above where data missingness is explainable by other measured variables in a study, nonignorable missing data arise due to the data missingness pattern being explainable --- and only explainable --- by the very variable(s) on which the data are missing.

In practice it is usually difficult to meet the MCAR assumption. MAR is an assumption that is more often, but not always, tenable. The more relevant and related predictors one can include in statistical models, the more likely it is that the MAR assumption will be met.

Methods of handling missing data

Some of the more popular methods for handling missing data appear below. This list is not exhaustive, but it covers some of the more widely recognized approaches to handling databases with incomplete cases.

Listwise or casewise data deletion: If a record has missing data for any one variable used in a particular analysis, omit that entire record from the analysis. This approach is implemented as the default method of handling incomplete data by many statistical procedures in commonly-used statistical software packages such as SAS and SPSS.

Pairwise data deletion: For bivariate correlations or covariances, compute statistics based upon the available pairwise data. Pairwise data deletion is available in a number of SAS and SPSS statistical procedures.
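Both deletion strategies are easy to see in a small sketch. The following is illustrative Python/pandas code, not the SAS or SPSS syntax the packages themselves use:

```python
import numpy as np
import pandas as pd

# Toy records; NaN marks a missing response.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [2.0, np.nan, 6.0, 8.0, 10.0],
})

# Listwise deletion: drop every record with any missing value.
complete = df.dropna()
print(len(complete))  # 3 (rows 0, 2, and 3 survive)

# Pairwise deletion: pandas computes each correlation from whatever cases
# are jointly observed for that pair, so different cells of a correlation
# matrix can rest on different numbers of cases.
r = df.corr().loc["a", "b"]
print(r)  # 1.0 here, since b = 2a on the three jointly observed cases
```

Note the hazard this illustrates: a pairwise correlation matrix can mix correlations computed on different subsamples, which is one reason the literature reviewed below treats these methods as inferior.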

Mean substitution: Substitute a variable’s mean value computed from available cases to fill in missing data values on the remaining cases. This option appears in several SPSS procedures. The Base module of SPSS also allows easy computation of new variables that contain mean substitution data values. In the Data Editor spreadsheet: Select Transform, then Replace Missing Values (Note: This function is not the same as that offered by the SPSS Missing Values Analysis add-in module; the MVA module uses the EM approach described below). SAS allows mean substitution using the STANDARD procedure; see the SAS FAQs for details.
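A brief Python/pandas sketch (illustrative only; the SPSS menu path above is the supported route) shows both the substitution and the variance truncation mentioned earlier:

```python
import pandas as pd

# Toy variable with two missing responses.
x = pd.Series([2.0, 4.0, None, 6.0, None, 8.0])

# Substitute the mean of the available cases (5.0) for the missing cells.
filled = x.fillna(x.mean())

print(filled.tolist())        # [2.0, 4.0, 5.0, 6.0, 5.0, 8.0]
print(x.var(), filled.var())  # variance shrinks after substitution
```

Because every imputed case sits exactly at the mean, the filled series has a smaller variance than the observed data, which is the distribution-distorting effect noted above.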

Regression methods: Develop a regression equation based on complete-case data for a given variable Y, treating Y as the outcome and using all other relevant variables as predictors. Then, for cases where Y is missing, plug the available predictor data into the regression equation and substitute the equation's predicted Y value into the database for use in other analyses. An improvement to this method adds a random residual to the imputation of Y so that the conditional mean value is not always imputed.
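Both the deterministic and the stochastic variants can be sketched in a few lines of Python/numpy (illustrative toy data; production work would use a package's built-in routines):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: x fully observed, y missing (NaN) for two cases.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, np.nan, 8.1, np.nan])
obs = ~np.isnan(y)

# Fit y = b0 + b1*x on the complete cases only.
b1, b0 = np.polyfit(x[obs], y[obs], 1)

# Deterministic regression imputation: plug in the predicted value.
y_det = y.copy()
y_det[~obs] = b0 + b1 * x[~obs]

# Stochastic variant: add a residual draw so the conditional mean is not
# always imputed (the "uncertainty" improvement noted above).
resid_sd = np.std(y[obs] - (b0 + b1 * x[obs]), ddof=2)
y_stoch = y.copy()
y_stoch[~obs] = b0 + b1 * x[~obs] + rng.normal(0.0, resid_sd, size=(~obs).sum())
```

The deterministic version suffers the same variance-truncation problem as mean substitution, only along the regression line; the stochastic version restores scatter around the line.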

Hot deck imputation: Identify the most similar case to the case with a missing value and substitute the most similar case’s Y value for the missing case’s Y value.

Expectation Maximization (EM) approach: An iterative procedure that proceeds in two discrete steps. First, in the expectation (E) step you compute the expected value of the complete data log likelihood. In the maximization (M) step you substitute the expected values for the missing data obtained from the E step and then maximize the likelihood function as if no data were missing to obtain new parameter estimates. The procedure iterates through these two steps until convergence is obtained. The SPSS Missing Values Analysis (MVA) module employs the EM approach to missing data handling.

Raw maximum likelihood methods: Use all available data to generate maximum likelihood-based sufficient statistics. Usually these consist of a covariance matrix of the variables and a vector of means. This technique is also known as Full Information Maximum Likelihood (FIML).

Multiple imputation: Similar to the maximum likelihood method, except that multiple imputation generates actual raw data values suitable for filling in gaps in an existing database. Typically, five to ten databases are created in this fashion. The investigator then analyzes these data matrices using an appropriate statistical analysis method, treating these databases as if they were based on complete case data. The results from these analyses are then combined into a single summary finding.

Roth (1994) reviews these methods and concludes, as did Little & Rubin (1987) and Wothke (1998), that listwise, pairwise, and mean substitution missing data handling methods are inferior when compared with maximum likelihood based methods such as raw maximum likelihood or multiple imputation. Regression methods are somewhat better, but not as good as hot deck imputation or maximum likelihood approaches. The EM method falls somewhere in between: It is generally superior to listwise, pairwise, and mean substitution approaches, but it lacks the uncertainty component contained in the raw maximum likelihood and multiple imputation methods.

It is important to understand that these missing data handling methods and the discussion that follows deal with incomplete data primarily from the perspective of estimation of parameters and computation of test statistics rather than prediction of values for specific cases. Warren Sarle at SAS Institute has put together a helpful paper on the topic of missing data in the contexts of prediction and data mining. The paper can be found online in postscript form at ftp://ftp.sas.com/pub/neural/JCIS98.ps and in an html version.

Hot deck and maximum likelihood-based approaches to handling missing data

Hot deck

Hot deck imputation fills in missing cells in a data matrix with the next most similar case's values. Consider the following example database.

Illustration of Hot Deck Imputation: Data Matrix with Incomplete Data

| Case | Item 1 | Item 2 | Item 3 | Item 4 |
|------|--------|--------|--------|--------|
| 1    | 4      | 1      | 2      | 3      |
| 2    | 5      | 4      | 2      | 5      |
| 3    | 3      | 4      | 2      |        |

Case three has a missing data cell for item four. Hot deck imputation examines the cases with complete records (cases one and two in this example) and substitutes the value of the most similar case for the missing data point. Case two and case three have the same values for items two and three, whereas case one and case three share a value only for item three. Therefore, case two is more similar to case three than is case one. Note: there are different strategies for judging similarity.

Once the hot deck imputation determines which case among the observations with complete data is the most similar to the record with incomplete data, it substitutes the most similar complete case's value for the missing variable into the data matrix.

Illustration of Hot Deck Imputation: Data Matrix with Imputed Data

| Case | Item 1 | Item 2 | Item 3 | Item 4 |
|------|--------|--------|--------|--------|
| 1    | 4      | 1      | 2      | 3      |
| 2    | 5      | 4      | 2      | 5      |
| 3    | 3      | 4      | 2      | 5      |

Since case two had the value of five for item four, the hot deck procedure imputes a value of five for case three to replace the missing data cell. Data analysis may then proceed using the new complete database.
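Since hot deck requires custom programming, a minimal Python sketch of the example above may help. Similarity here is simply the number of matching items; as noted, a real application needs a more careful definition:

```python
# The three cases from the illustration; None marks the missing cell.
records = [
    {"item1": 4, "item2": 1, "item3": 2, "item4": 3},     # case 1
    {"item1": 5, "item2": 4, "item3": 2, "item4": 5},     # case 2
    {"item1": 3, "item2": 4, "item3": 2, "item4": None},  # case 3 (incomplete)
]

def hot_deck(target, donors, missing_key):
    def similarity(donor):
        # Count matching items, ignoring the missing one; this crude rule
        # stands in for a real similarity metric.
        return sum(1 for k, v in target.items()
                   if k != missing_key and donor[k] == v)
    return max(donors, key=similarity)[missing_key]

complete = [r for r in records if None not in r.values()]
records[2]["item4"] = hot_deck(records[2], complete, "item4")
print(records[2]["item4"])  # 5, donated by case 2 (matches on items 2 and 3)
```

A more sophisticated version would collect all donors tied at the maximum similarity and draw one at random, as described above.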

Hot deck imputation has a long history of use, including years of use by the United States Census Bureau. It can be superior to listwise deletion, pairwise deletion, and mean substitution approaches to handling missing data. Among hot deck's advantages are its conceptual simplicity, its maintenance of the proper measurement level of variables (categorical variables remain categorical and continuous variables remain continuous), and the availability of a complete data matrix at the end of the imputation process that can be analyzed like any complete data matrix. One of hot deck's disadvantages is the difficulty in defining "similarity"; there may be any number of ways to define what similarity is in this context. Thus, the hot deck procedure is not an "out of the box" approach to handling incomplete data. Instead it requires that you develop custom software syntax to perform the selection of donor cases and the subsequent imputation of missing values in your database. More sophisticated hot deck algorithms would identify more than one similar record and then randomly select one of those available donor records to impute the missing value or use an average value if that were appropriate.

Two examples of SAS macros used to perform hot deck imputation can be found online. John Stiller and Donald R. Dalzell (1998) wrote a paper titled "Hot-deck Imputation with SAS® Arrays and Macros for Large Surveys" which can be found at http://www2.sas.com/proceedings/sugi23/Stats/p246.pdf. Lawrence Altmayer from the U.S. Bureau of the Census wrote a paper "Hot-Deck Imputation: A Simple DATA Step Approach" which can be found at http://www8.sas.com/scholars/05/PREVIOUS/1999/pdf/075.pdf.

Expectation maximization (EM)

The expectation maximization (EM) approach to missing data handling is documented extensively in Little & Rubin (1987). The EM approach is an iterative procedure that proceeds in two discrete steps. First, in the expectation (E) step the procedure computes the expected value of the complete data log likelihood based upon the complete data cases and the algorithm's "best guess" as to what the sufficient statistical functions are for the missing data based upon the model specified and the existing data points; actual imputed values for the missing data points need not be generated. In the maximization (M) step it substitutes the expected values (typically means and covariances) for the missing data obtained from the E step and then maximizes the likelihood function as if no data were missing to obtain new parameter estimates. The new parameter estimates are substituted back into the E step and a new M step is performed. The procedure iterates through these two steps until convergence is obtained. Convergence occurs when the change of the parameter estimates from iteration to iteration becomes negligible.

The SPSS Missing Values Analysis (MVA) module employs the EM approach to missing data handling. The strength of the approach is that it has well-known statistical properties, and it generally outperforms popular ad hoc methods of incomplete data handling such as listwise and pairwise data deletion and mean substitution, because it requires only that incomplete cases have data missing at random (MAR) rather than missing completely at random (MCAR). The primary disadvantage of the EM approach is that it adds no uncertainty component to the estimated data. Practically speaking, this means that while parameter estimates based upon the EM approach are reliable, standard errors and associated test statistics (e.g., t-tests) are not. This shortcoming led statisticians to develop two newer likelihood-based methods for handling missing data: the raw maximum likelihood approach and multiple imputation.
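For intuition about the two steps, here is a minimal, illustrative EM sketch in Python for a bivariate normal sample in which one variable is partly missing. The data and starting values are made up, and production work should use a package such as SPSS MVA; the point is only to show the E step (conditional expectations for the missing cells) and the M step (re-estimating moments as if the data were complete) alternating to convergence.

```python
import numpy as np

# Toy bivariate sample; y is missing (NaN) for two cases.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, np.nan, 4.2, np.nan, 6.1])
miss = np.isnan(y)

# x is fully observed, so its mean and variance are fixed at their MLEs.
mx, sxx = x.mean(), x.var()

# Start y's moments from the complete cases.
my = y[~miss].mean()
syy = y[~miss].var()
sxy = np.mean((x[~miss] - x[~miss].mean()) * (y[~miss] - my))

for _ in range(200):
    # E step: expected y (and y^2) for the missing cells, given x and the
    # current parameters, via the conditional normal regression of y on x.
    beta = sxy / sxx
    cond_var = max(syy - beta * sxy, 0.0)  # guard against a bad start
    ey = np.where(miss, my + beta * (x - mx), y)
    ey2 = ey**2 + np.where(miss, cond_var, 0.0)

    # M step: re-estimate the moments as if no data were missing.
    my = ey.mean()
    sxy = np.mean((x - mx) * (ey - my))
    syy = ey2.mean() - my**2

print(my, syy, sxy)  # maximum likelihood estimates under MAR
```

Note that the E step never writes imputed values into the data file; it only carries expected sufficient statistics forward, which is why EM by itself yields parameter estimates but no completed data matrix.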

Raw maximum likelihood

Raw maximum likelihood, also known as Full Information Maximum Likelihood (FIML), methods use all available data points in a database to construct the best possible first and second order moment estimates under the MAR assumption. Put less technically, if the missing at random (MAR) assumption can be met, maximum likelihood-based methods can generate a vector of means and a covariance matrix among the variables in a database that is superior to the vector of means and covariance matrix produced by commonly-used missing data handling methods such as listwise deletion, pairwise deletion, and mean substitution. See Wothke (1998) for a convincing demonstration.

Under an unrestricted mean and covariance structure, raw maximum likelihood and EM return identical parameter estimate values. Unlike EM, however, raw maximum likelihood can be employed in the context of fitting user-specified linear models, such as structural equation models, regression models, ANOVA and ANCOVA models, etc. Raw maximum likelihood also produces standard errors and parameter estimates under the assumption that the fitted model is not false, so parameter estimates and standard errors are model-dependent. That is, their values will depend upon the model chosen and fitted by the investigator.

Raw maximum likelihood missing data handling is currently implemented in the AMOS structural equation modeling package currently supported by ITS. The primary advantage of this method from a practical standpoint is that it is built in to the software package: the AMOS user simply clicks on a check box to enable missing data handling. The program then fits the analyst's model using the raw maximum likelihood missing data handling approach. Any general linear model including ANOVA, ANCOVA, MANOVA, MANCOVA, path analysis, confirmatory factor analysis, and numerous time series and longitudinal models can be fit using AMOS.

Other software packages that use the raw maximum likelihood approach to handle incomplete data are the MIXED procedure in SAS and SPSS (see the paper titled "Linear mixed-effects modeling in SPSS”) and Michael Neale's MX. The MIXED procedure can fit ANOVA, ANCOVA, and repeated measures models with time-constant and time-varying covariates. You should strongly consider using a MIXED procedure instead of SAS PROC GLM or the SPSS General Linear Models (GLM) procedures whenever you have repeated measures data with missing data points. The MIXED procedures can also fit hierarchical linear models (HLMs), also known as multilevel or random coefficient models. MX is a freeware structural equation modeling program.

Raw maximum likelihood has the advantage of convenience/ease of use and well-known statistical properties. Unlike EM, it also allows for the direct computation of appropriate standard errors and test statistics. Disadvantages include an assumption of joint multivariate normality of the variables used in the analysis and the lack of a raw data matrix produced by the analysis. Recall that the raw maximum likelihood method only produces a covariance matrix and a vector of means for the variables; the statistical software then uses these as inputs for further analyses.

Raw maximum likelihood methods are also model-based. That is, they are implemented as part of a fitted statistical model. The investigator may want to include relevant variables (e.g., reading comprehension) that will improve the accuracy of parameter estimates, but not include these variables in the statistical model as predictors or outcomes. While it is possible to do this, it is not always easy or convenient, particularly in large or complex models.

Finally, raw maximum likelihood assumes the incomplete data cells are missing at random. Wothke (1998) suggests, however, that raw maximum likelihood can offer superior performance to listwise and pairwise deletion methods even in the nonignorable data situation.

Multiple imputation

Multiple imputation combines the well-known statistical advantages of EM and raw maximum likelihood with the ability of hot deck imputation to provide a raw data matrix to analyze. Multiple imputation works by generating a maximum likelihood-based covariance matrix and vector of means, like EM. Multiple imputation takes the process one step further by introducing statistical uncertainty into the model and using that uncertainty to emulate the natural variability among cases one encounters in a complete database. Multiple imputation then imputes actual data values to fill in the incomplete data points in the data matrix, just as hot deck imputation does.

The primary difference between multiple imputation and hot deck imputation from a practical or procedural standpoint is that multiple imputation requires that the data analyst generate five to ten databases with imputed values. The data analyst then analyzes each database, collects the results from the analyses, and summarizes them into one summary set of findings. For instance, suppose a researcher wishes to perform a multiple regression analysis on a database with incomplete data. The researcher would run multiple imputation, generate ten imputed databases, and run the multiple regression analysis on each of the ten databases. The researcher then combines the results from the ten regression analyses together into one summary for presentation, not necessarily a trivial task.
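The combining step follows Rubin's rules for a scalar estimate: the pooled estimate is the average of the per-imputation estimates, and its variance adds the average within-imputation variance to an inflated between-imputation variance. A Python sketch with hypothetical numbers for one regression coefficient:

```python
import numpy as np

# Hypothetical results for one coefficient estimated on each of m = 5
# imputed data sets: point estimates and their squared standard errors.
est = np.array([2.10, 1.95, 2.20, 2.05, 2.00])
var = np.array([0.04, 0.05, 0.04, 0.06, 0.05])

m = len(est)
pooled_est = est.mean()                      # average of the m estimates
within = var.mean()                          # average within-imputation variance
between = est.var(ddof=1)                    # between-imputation variance
total_var = within + (1 + 1 / m) * between   # Rubin's combining rule
pooled_se = np.sqrt(total_var)

print(pooled_est, pooled_se)
```

The between-imputation term is what carries the imputation uncertainty into the standard error, so the pooled standard error is always at least as large as the average within-imputation standard error.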

Multiple imputation has several advantages: It is fairly well understood and robust to non-normality of the variables used in the analysis. Like hot deck imputation, it outputs complete raw data matrices. It is clearly superior to listwise, pairwise, and mean substitution methods of handling missing data in most cases. Disadvantages include the time involved in imputing five to ten databases, testing models for each database separately, and recombining the model results into one summary. Furthermore, summary methods have been worked out for linear and logistic regression models, but work is still in progress to provide statistically appropriate summarization methods for other models such as factor analysis, structural equation models, multinomial logit regression models, etc.

Schafer (1997) thoroughly documents multiple imputation theory in a textbook. Schafer has also written the freeware PC program NORM to perform multiple imputation analysis. Another freeware program similar to NORM called Amelia may also be downloaded.

SAS users can use the procedures MI and MIANALYZE to perform multiple imputation and combine analyses from the imputed data sets. PROC MI computes the imputed data sets; the data analyst then uses a standard SAS procedure, such as REG, GLM, or MIXED to analyze each imputed data set; finally, MIANALYZE combines the output from the analyses on the imputed data sets and provides the overall results. These procedures are available in SAS version 9.

Pattern-mixture models for non-ignorable missing data

All the methods of missing data handling considered above require that the data meet the Little & Rubin (1987) missing at random (MAR) assumption. There are circumstances, however, when this assumption cannot be met to a satisfactory degree; cases are considered missing due to non-ignorable causes (Heitjan, 1997). In such instances the investigator may want to consider the use of a pattern-mixture model, a term used by Hedeker & Gibbons (1997). Earlier works dealing with pattern-mixture models include Little & Schenker (1995), Little (1993), and Glynn, Laird, & Rubin (1986).

Pattern-mixture models categorize the different patterns of missing values in a dataset into a predictor variable, and this predictor variable is incorporated into the statistical model of interest. The investigator can then determine if the missing data pattern has any predictive power in the model, either by itself (a main effect) or in conjunction with another predictor (an interaction effect).
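The first step, encoding each case's missingness pattern as a predictor, is straightforward. A Python/pandas sketch with hypothetical two-wave data ("O" for observed, "M" for missing), purely as an illustration of the coding idea:

```python
import numpy as np
import pandas as pd

# Hypothetical two-wave outcome data with dropout at wave 2.
df = pd.DataFrame({
    "y1": [10.0, 12.0, 9.0, 11.0],
    "y2": [11.0, np.nan, 10.0, np.nan],
})

# Encode each case's missing-data pattern as a categorical predictor to be
# entered into the substantive model as a main effect and in interactions.
df["pattern"] = [
    "".join("M" if missing else "O" for missing in row)
    for row in df.isna().to_numpy()
]
print(df["pattern"].tolist())  # ['OO', 'OM', 'OO', 'OM']
```

The resulting indicator (here, completers "OO" versus dropouts "OM") then enters the statistical model of interest, so the analyst can test whether the pattern predicts the outcome by itself or in interaction with other predictors.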

The chief advantage of the pattern-mixture model is that it does not assume the incomplete data are missing at random (MAR) or missing completely at random (MCAR). The primary disadvantage of the pattern-mixture model approach is that it requires some custom programming on the part of the data analyst to obtain one part of the pattern-mixture analysis, the pattern-mixture averaged results. It is worth noting, however, that Hedeker & Gibbons (1997, Appendix) demonstrate that some results may be obtained by using the SAS MIXED procedure and they provide sample SAS/IML code to obtain pattern-mixture averaged results on their Web site. If the number of missing data patterns and the number of variables with missing data are large relative to the number of cases in the analysis, the model may not converge due to insufficient data to support the use of many main effect and interaction terms.

Conclusions

Although applied researchers cannot turn to a single "one size fits all" solution for handling incomplete data problems, several trends in the missing data analysis literature are worth noting. First, ad hoc and commonly-used methods of handling incomplete data such as listwise and pairwise deletion and mean substitution are inferior to hot deck imputation, raw maximum likelihood, and multiple imputation methods in most situations. Second, software to perform hot deck, raw maximum likelihood, and multiple imputation is becoming more widely available and easier to use.

Although all of the methods described so far assume the incomplete data are missing at random, new statistical models are being developed to handle data missing due to nonignorable factors. Some of these models can be partially fit using familiar statistical packages and procedures such as the MIXED procedure in either SAS (e.g., Hedeker & Gibbons, 1997) or SPSS (see the paper titled "Linear mixed-effects modeling in SPSS”).

References

Glynn, R., Laird, N.M., & Rubin, D.B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (ed.) Drawing Inferences from Self-Selected Samples, 119-146. New York: Springer-Verlag.

Graham, J.W., Hofer, S.M., & MacKinnon, D.P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31(2), 197-218.

Hedeker, D. & Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2(1), 64-78.

Heitjan, D.F. (1997). Annotation: What can be done about missing data? Approaches to imputation. American Journal of Public Health, 87(4), 548-550.

Iannacchione, V. G. (1982). Weighted sequential hot deck imputation macros. Proceedings of the SAS Users Group International Conference, 7, 759-763.

Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134.

Little, R.J.A., & Schenker, N. (1995). Missing Data. In Arminger, Clogg, & Sobel (eds.) Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum.

Little, R.J.A. & Rubin, D.B. (1987). Statistical analysis with missing data. New York: John Wiley and Sons.

Roth, P. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537-560.

SAS Institute Inc. 2004. SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute Inc.

Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. Book number 72 in the Chapman & Hall series Monographs on Statistics and Applied Probability. London: Chapman & Hall.

Wothke, W. (1998). Longitudinal and multi-group modeling with missing data. In T.D. Little, K.U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple group data: Practical issues, applied approaches and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates.

Software Table

The table below specifies several commonly-used software options for handling missing or incomplete data. The table is not intended to be an exhaustive list of every possible missing data-handling software package. However, if you discover or know of another software option you have used successfully, please let us know by sending an e-mail to stat.admin@austin.utexas.edu

The table lists the name of the software, the method of handling incomplete data, assumptions it makes about the causes of missing data, whether the package is supported at UT Austin, pricing and availability to UT faculty, students, and staff, and miscellaneous comments generally dealing with the perceived ease of use of the package from the perspective of computing novices. Note that in addition to the assumptions about the origins of incomplete data, many of the methods shown below also contain other tacit assumptions (e.g., joint multivariate normality of variables included in the analysis).

Missing Data Handling Software Options

### How to perform pairwise comparisons of sample correlation coefficients

#### Question:

I have two correlation coefficients computed from two different samples. Is there a test that I can perform that will allow me to determine whether the two coefficients are significantly different?

Yes, there is. To perform such a test, one must first transform both of the sample correlation coefficients using the Fisher r-to-Z transformation, given by the rule

Z = ½ ln((1 + rxy) / (1 - rxy))

where rxy = the sample correlation coefficient for variables X and Y. Also, ln = log base e (i.e., the natural logarithm). Hence, the Fisher r-to-Z transformation involves a logarithmic transformation of the sample correlation coefficients.

The test statistic is then

(Z1 – Z2) / SE(Z1 – Z2)

where Z1 equals the transformed value of the first sample's correlation coefficient, Z2 the transformed value of the second, and the standard error of the difference is

SE(Z1 – Z2) = sqrt(1/(N1 – 3) + 1/(N2 – 3))

According to Hays' Statistics (1988, p. 591), "For reasonably large samples (say, 10 in each), this ratio can be referred to the standard normal distribution." If the absolute value of the statistic exceeds 1.96, then one can reject (at the .05 level, two-tailed) the null hypothesis that the two correlation coefficients come from populations with the same "true" level of correlation between X and Y. However, in order for this test to be valid, the samples must be independent and the population represented by each must be approximately normal.

The condition of independence would certainly not hold if the two samples involved the same subjects (e.g., repeated measures) or matched subjects. For data of this kind, consult the General FAQ "How to compare sample correlation coefficients drawn from the same sample".

SAS 9.1 can compute correlation coefficients using Fisher's r-to-Z transformation and provide confidence intervals and p-values for these coefficients. Details are provided in the SAS online manual at http://support.sas.com/onlinedoc/913/; choose the Index tab and jump to CORR.

For older versions of SAS, the SAS macro compcorr will compute the Fisher r-to-Z transformation, giving the test statistic, p-value, and confidence intervals; the macro is available at http://support.sas.com/kb/24/995.html.

SPSS users can use the following syntax.

* Fisher r to Z testing program.
* Compares correlations from two independent samples.
* See Hays (1988), p. 591.
** Begin sample program.
* Enter correlations into an SPSS database.
DATA LIST free
 /corr1 corr2.
BEGIN DATA.
.50 .35
END DATA.
* Define the sample sizes of each group.
COMPUTE n1 = 25.
COMPUTE n2 = 25.
* Convert r values to Z values.
COMPUTE z1 = .5*LN((1+corr1)/(1-corr1)).
COMPUTE z2 = .5*LN((1+corr2)/(1-corr2)).
* Compute the estimated standard error.
COMPUTE stderr = sqrt((1/(n1-3))+(1/(n2-3))).
* Compute the final Z value.
* Evaluate this value against a standard normal.
* distribution for statistical significance.
COMPUTE ztest = (z1-z2)/stderr.
COMPUTE p_1_tail = 1 - CDF.NORMAL(abs(ztest),0,1).
COMPUTE p_2_tail = (1 - CDF.NORMAL(abs(ztest),0,1))*2.
* Print the results.
LIST.
** End sample program.

In the example shown above, the SPSS user inputs two sample correlation coefficients, .50 and .35, and then inputs the sizes of the samples: each has 25 cases. The program then computes the appropriate Z test for the equality of the two correlations. It outputs a one-tailed test of the correlations' equality (the p_1_tail variable) as well as a two-tailed test of the same equality (the p_2_tail variable).
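For readers working outside SPSS, the same computation can be sketched in Python using only the standard library. The numbers mirror the SPSS example above (r1 = .50, r2 = .35, n1 = n2 = 25); this is an illustrative sketch, and the function name is our own, not part of any package.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Compare two correlations from independent samples via Fisher's r-to-Z."""
    z1 = math.atanh(r1)  # atanh(r) equals 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p-value from the standard normal CDF (via the error function).
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return z, p

z, p = fisher_z_test(0.50, 25, 0.35, 25)
# z is about 0.61 and p about 0.54: no evidence the correlations differ.
```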

### What is a good kappa coefficient?

#### Question:

I have computed Cohen's kappa to assess agreement among raters, corrected for chance agreement. What is a reasonable kappa level? What are good and poor values of kappa?

The information that follows was derived from posts by Judith Saebel and Scott McNary on the Structural Equation Modeling LISTSERV e-mail discussion group on August 19, 1999.

Although there are no absolute cutoffs for kappa coefficients, two sources provide some rough guidelines for the interpretation of kappa coefficients. According to J. L. Fleiss (1981), p. 218, values exceeding .75 suggest strong agreement above chance, values in the range of .40 to .75 indicate fair levels of agreement above chance, and values below .40 are indicative of poor agreement above chance levels.

A journal article by Landis and Koch (1977, p. 159) suggests the following kappa interpretation scale may be useful:

| Kappa Value | Interpretation |
|---|---|
| Below 0.00 | Poor |
| 0.00-0.20 | Slight |
| 0.21-0.40 | Fair |
| 0.41-0.60 | Moderate |
| 0.61-0.80 | Substantial |
| 0.81-1.00 | Almost perfect |

In addition, Gardner (1995) recommends that kappa exceed .70 before you proceed with additional data analyses.
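As a convenience, the Landis and Koch scale can be encoded as a small lookup function. The thresholds follow the table above exactly; the function itself is only an illustrative sketch, not part of any statistical package.

```python
def interpret_kappa(kappa):
    """Return the Landis & Koch (1977) verbal label for a kappa value."""
    if kappa < 0.00:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"

# Example: a kappa of .55 falls in the "Moderate" band.
```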

References

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. New York: John Wiley & Sons. (Second Edition).

Gardner, W. (1995). On the reliability of sequential data: Measurement, meaning, and correction. In J. M. Gottman (Ed.), The analysis of change. Mahwah, NJ: Erlbaum.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

If you have further questions, send e-mail to stat.admin@austin.utexas.edu.

### How to compare sample correlation coefficients drawn from the same sample

#### Question:

I would like to compare two sample correlation coefficients, but they are drawn from the same sample. Is there a method of testing for a significant difference between them that takes their dependence into account?

Note: If you have two correlation coefficients computed from two different samples, please consult General FAQ "How to perform pairwise comparisons of sample correlation coefficients".

Yes, there is, and it involves a choice between two different methods. The first method relies on a formula found in Cohen and Cohen (1983) on page 57. The formula yields a t-statistic with n - 3 degrees of freedom. As written below, the formula tests for a significant difference between the correlation of variables X & Y and that of V & Y:

t = (rxy - rvy) * sqrt((n-1)*(1 + rxv)) / sqrt(2*((n-1)/(n-3))*|R| + ((rxy + rvy)/2)^2 * (1-rxv)^3)

where

rxy = correlation coefficient between variables x and y

rxv = correlation coefficient between variables x and v

rvy = correlation coefficient between variables v and y

and |R| = (1 - rxy ^2 - rvy^2 - rxv^2 + (2*rxy*rxv*rvy)), the determinant of the correlation matrix for X, Y, and V.

Unfortunately, the above method is not available as an option in any of the statistical procedures in either SPSS or SAS. However, SPSS users can adapt the following syntax to perform the test:

* Dependent Correlation Comparison Program.
* Compares correlation coefficients from the same sample.
* See Cohen & Cohen (1983), p. 57.
DATA LIST free
 /rxy rvy rxv.
BEGIN DATA.
.50 .32 .65
END DATA.
* Define the sample size.
COMPUTE n = 50.
COMPUTE diffr = rxy - rvy.
COMPUTE detR = (1 - rxy**2 - rvy**2 - rxv**2) + (2*rxy*rxv*rvy).
* Calculate (rxy + rvy)/2.
COMPUTE rbar = (rxy + rvy)/2.
* Calculate numerator and denominator of the t statistic.
COMPUTE tnum = diffr * sqrt((n-1)*(1 + rxv)).
COMPUTE tden = sqrt(2*((n-1)/(n-3))*detR + ((rbar**2) * ((1-rxv)**3))).
COMPUTE t = (tnum/tden).
COMPUTE df = n - 3.
* Evaluate the value of the t statistic.
* against a t distribution with n - 3 degrees of freedom for.
* statistical significance.
COMPUTE p_1_tail = 1 - CDF.T(abs(t),df).
COMPUTE p_2_tail = (1 - CDF.T(abs(t),df))*2.
EXECUTE.

The above syntax will generate an active dataset that will appear in the data editor window. In the Variable View, change the number of decimals to the desired setting to display appropriate results.
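The same Cohen and Cohen (1983, p. 57) computation can be sketched in Python using only the standard library; the values mirror the SPSS example above (rxy = .50, rvy = .32, rxv = .65, n = 50). The function name is our own illustrative choice, not vendor-supplied code.

```python
import math

def dependent_corr_t(rxy, rvy, rxv, n):
    """t test (df = n - 3) comparing rxy with rvy measured on the same sample."""
    # Determinant of the 3 x 3 correlation matrix of X, Y, and V.
    detR = 1 - rxy**2 - rvy**2 - rxv**2 + 2 * rxy * rxv * rvy
    rbar = (rxy + rvy) / 2
    tnum = (rxy - rvy) * math.sqrt((n - 1) * (1 + rxv))
    tden = math.sqrt(2 * ((n - 1) / (n - 3)) * detR
                     + (rbar**2) * ((1 - rxv)**3))
    return tnum / tden, n - 3

t, df = dependent_corr_t(0.50, 0.32, 0.65, 50)
# t is about 1.70 with 47 degrees of freedom.
```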

Notice that the method above is limited to the three-variable case (e.g., X & Y and V & Y). The second method represents a more flexible approach to the problem. With this method, one would use a statistical software package capable of estimating covariance structure models (e.g., SAS, AMOS, LISREL) to compare an observed correlation matrix to an estimated correlation matrix that includes restrictions representing a null hypothesis. Steiger (1980) discusses this approach in more detail.

For example, suppose you have a set of four variables: X, Y, Z, and Q, and you want to test whether the correlation between X and Y is the same as that between Z and Q. The observed correlation matrix would be symmetric and look like the one presented below,

|   | X | Y | Z | Q |
|---|---|---|---|---|
| X | a | b | c | d |
| Y | b | e | f | g |
| Z | c | f | h | i |
| Q | d | g | i | j |

where capital letters represent variables and lower case letters represent correlation coefficients.

In the estimated correlation matrix, one would impose the constraint that b = i in order to test the hypothesis that the correlation between X and Y is the same as that between Z and Q. One could then test how well this estimated matrix fits the data using the standard output of any of the statistical packages mentioned above. If it turns out that the restricted correlation matrix provides a reasonable fit of the data given a previously specified level of statistical significance, then this finding would be equivalent to retaining the null hypothesis that the correlation between X and Y is the same as that between Z and Q. On the other hand, if the restricted correlation matrix does not fit the data well, then this would be equivalent to rejecting the null hypothesis. Provided that one is cognizant of the problem of making multiple inferences and the sample data conform to the assumptions necessary to perform a covariance structure analysis (e.g., sufficient sample size, joint multivariate normality of the population distribution of the input variables, etc.), one could test a series of hypotheses in this fashion given virtually any combination of variables.

The SAS program below demonstrates how PROC CALIS can be used to test the hypothesis that COV(X,Y) = COV(Z,Q). In the example below, the covariance rather than the correlation matrix is used because the maximum likelihood procedure for confirmatory factor analysis is theoretically derived for covariance matrices. However, one may interpret results from this program as applying also to the correlation matrix for a set of variables.

** Example file to demonstrate test of equality of correlations using PROC CALIS **;

** Create example data set**;

data a;

input X Y Z Q;

cards;

9.10 6.17 10.73 13.83
13.60 15.20 3.45 9.30
13.29 6.74 7.93 4.93
4.46 7.07 7.68 3.65
8.30 8.79 10.40 8.90
9.02 8.89 14.03 9.74
9.44 8.46 15.92 12.60
17.13 8.04 14.98 5.16
11.62 12.71 15.76 9.75
8.43 12.03 8.86 12.27
5.67 7.46 5.44 8.91
9.90 10.08 9.03 16.58
4.77 7.79 2.27 14.34
6.16 8.04 12.27 11.40
12.62 11.79 5.88 6.65
12.08 10.97 7.18 9.28
10.39 11.67 10.42 7.27
15.65 11.46 10.22 12.49
12.59 8.09 7.23 13.33
6.40 5.32 12.13 11.92
;
** Run correlation equality test ** ;

proc calis data = a covariance summary ;
var X Y Z Q ;
STD
X = v1,
Y = v2,
Z = v3,
Q = v4;

/* specify hypothesized covariance matrix where the covariance between X and Y
equals the covariance between Z & Q */
COV
X Y = cov_v1v2,
Z Q = cov_v1v2,
X Z = cov_v1v3,
Y Q = cov_v2v4,
X Q = cov_v1v4,
Y Z = cov_v2v3;
run ;
** End sample program ** ;

The key element of the program can be found in the COV statement which specifies the estimated correlation matrix for the null hypothesis. In order to impose the constraint that the COV(X,Y) = COV(Z,Q), the name "cov_v1v2" is given to both the covariance between X and Y and between Z and Q in the COV statement. The other covariances, on the other hand, will be freely estimated because each has been given a unique name. The relevant output from this program can be found below.

The CALIS Procedure
Covariance Structure Analysis: Maximum Likelihood Estimation

Fit Function 0.0638
Goodness of Fit Index (GFI) 0.9710
GFI Adjusted for Degrees of Freedom (AGFI) 0.7104
Root Mean Square Residual (RMR) 1.2634
Parsimonious GFI (Mulaik, 1989) 0.1618
Chi-Square 1.2124
Chi-Square DF 1
Pr > Chi-Square 0.2709
Independence Model Chi-Square 7.0050
Independence Model Chi-Square DF 6
RMSEA Estimate 0.1057
RMSEA 90% Lower Confidence Limit .
RMSEA 90% Upper Confidence Limit 0.6298
ECVI Estimate 1.3495
ECVI 90% Lower Confidence Limit .
ECVI 90% Upper Confidence Limit 1.8356
Probability of Close Fit 0.2822
Bentler's Comparative Fit Index 0.7887
Normal Theory Reweighted LS Chi-Square 1.1334
Akaike's Information Criterion -0.7876
Bozdogan's (1987) CAIC -2.7833
Schwarz's Bayesian Criterion -1.7833
McDonald's (1989) Centrality 0.9947
Bentler & Bonett's (1980) Non-normed Index -0.2680
Bentler & Bonett's (1980) NFI 0.8269
James, Mulaik, & Brett (1982) Parsimonious NFI 0.1378
Z-Test of Wilson & Hilferty (1931) 0.6121
Bollen (1986) Normed Index Rho1 -0.0385
Bollen (1988) Non-normed Index Delta2 0.9646
Hoelter's (1983) Critical N 62

In the table shown above, the rows of interest for the test of the null hypothesis that COV(X,Y) = COV(Z,Q) are Chi-Square, Chi-Square DF, and Pr > Chi-Square. Most of the other results are general goodness of fit measures that do not apply to this limited use of structural covariance models, so they may be ignored. The results of the chi-square test on this particular set of data indicate that we should retain the null hypothesis (p = .27). If we had observed a much lower p-value in this test (e.g., < .05), then we would have rejected the null hypothesis and concluded that COV(X,Y) does not equal COV(Z,Q).

Note that this method allows you to test the equality of multiple sets of correlation coefficients within the same matrix simultaneously. For instance, if you had a correlation matrix consisting of 10 variables, you could easily test the v1-v2 with v3-v4 correlation equality at the same time you tested the v5-v6 and v7-v8 correlation equality. The resulting chi-square test statistic would have two degrees of freedom; it would test the joint hypothesis that the v1-v2 correlation is equal to the v3-v4 correlation and that the v5-v6 correlation is equal to the v7-v8 correlation.

References

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245-251.

### The Bonferroni adjustment for multiple tests

#### Question:

I am performing a number of statistical tests on my dataset. I would like to control the type 1 error, the decision to reject the null hypothesis when it is, in fact, true. I understand that when I perform many hypothesis tests on the same set of data, the probability of making a type 1 error can increase from the conventional .05. I have heard about something called the Bonferroni adjustment that can fix this problem. How does it work?

The Bonferroni adjustment works by making it more difficult for any one test to be statistically significant: it divides your alpha level (usually set to .05 by convention) by the number of tests you are performing. For instance, suppose you performed five tests on the same database. The Bonferroni-adjusted level of significance any one test would need to obtain statistical significance would be:

.05 / 5 = .01

Any test that results in a probability value of less than .01 would be statistically significant. Any test statistic with a probability value greater than .01 (including values that fall between .01 and .05) would be deemed non-significant.
Some authors (e.g., Jaccard & Wan, 1996) have pointed out that this method of controlling type 1 error becomes very conservative, perhaps too conservative, when the number of comparisons grows large. Jaccard and Wan (1996, p. 30) suggest the use of a modified Bonferroni procedure (the sequentially rejective procedure of Holm, 1979) that still retains an overall type 1 error rate of 5% (alpha = .05). The modified Bonferroni procedure works as follows:

1. Rank order the significance values obtained from your multiple tests from smallest to largest. Tied significance values may be ordered by theoretical criteria or arbitrarily.

2. Evaluate the significance of the test with the smallest p-value at alpha / number of tests, just as you would in the Bonferroni procedure discussed above.

3. If that test statistic is statistically significant after this adjustment, move on to the test with the next smallest significance value and evaluate it at alpha / (number of tests - 1).

4. If that test statistic is significant after the adjustment, proceed to the third smallest significance value and evaluate it at alpha / (number of tests - 2). Continue in this fashion until a non-significant result is obtained; all remaining tests are then declared non-significant.

An example may help clarify the procedure. The table below shows for five hypothetical tests the test number, obtained significance, the original alpha, the divisor which you would divide into the original alpha to obtain the new alpha, and the evaluation of the test's statistical significance.

| Test | Obtained Significance | Original Alpha | Divisor | New Alpha | Significant? |
|---|---|---|---|---|---|
| 1 | .001 | .05 | 5 | .010 | Yes |
| 2 | .012 | .05 | 4 | .013 | Yes |
| 3 | .019 | .05 | 3 | .017 | No |
| 4 | .022 | .05 | 2 | .025 | No |
| 5 | .048 | .05 | 1 | .050 | No |

Notice that test 1 would be significant under either Bonferroni adjustment method, but test 2 is significant only under the modified Bonferroni method. Test 3 is not significant under either method. Even though test 4's obtained significance value (.022) is less than its modified Bonferroni alpha (.025), test 4 is also non-significant because the procedure requires that all tests following the first non-significant test be declared non-significant as well.
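The stepwise logic above is easy to automate. The short Python sketch below (an illustration written for this FAQ, not library code) reproduces the table's decisions for the five hypothetical p-values.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a list of True/False significance decisions (Holm, 1979)."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] < alpha / (m - rank):
            significant[i] = True
        else:
            break  # everything after the first failure stays non-significant
    return significant

print(holm_bonferroni([0.001, 0.012, 0.019, 0.022, 0.048]))
# [True, True, False, False, False]
```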

References

Jaccard, J. & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression. Thousand Oaks, CA: Sage Publications.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.

Holland, B. S., and Copenhaver, M. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin 104: 145-149.

### Handling non-normal data in structural equation modeling (SEM)

#### Question:

I am having trouble getting my hypothesized structural equation model to fit my data. Someone told me that non-normal data are a problem for SEM models; this person suggested using the generalized least-squares (GLS) estimator to fit my model instead of the default maximum likelihood (ML) estimator. What is the best way to handle non-normal data when fitting a structural equation model?

The hypothesis tests conducted in the structural equation modeling (SEM) context fall into two broad classes: tests of overall model fit and tests of significance of individual parameter estimate values. Both types of tests assume that the fitted structural equation model is true and that the data used to test the model arise from a joint multivariate normal (JMVN) distribution in the population from which you drew your sample data. If your sample data are not JMVN distributed, the chi-square test statistic of overall model fit will be inflated and the standard errors used to test the significance of individual parameter estimates will be deflated. Practically, this means that if you have non-normal data, you are more likely to reject models that are not actually false and more likely to decide that particular parameter estimates are statistically significantly different from zero when in fact they are not (type 1 error). Note that this type of assumption violation is also a problem for confirmatory factor analysis models, latent growth models (LGMs), path analyses, or any other type of model that is fit using structural equation modeling programs such as LISREL, EQS, AMOS, and PROC CALIS in SAS.

How can you correct for non-normal data in SEM programs? There are three general approaches used to handle non-normal data:

1. Use a different estimator (e.g., GLS) to compute goodness of fit tests, parameter estimates, and standard errors

2. Adjust or scale the obtained chi-square test statistic and standard errors to take into account the non-normality of the sample data

3. Make use of the bootstrap to compute a new critical chi-square value, parameter estimates, and standard errors

Estimators

Most SEM software packages offer the data analyst the opportunity to use generalized least-squares (GLS) instead of the default maximum likelihood (ML) to compute the overall model fit chi-square test, parameter estimates, and standard errors. Under joint multivariate normality, when the fitted model is not false, GLS and ML return identical chi-square model fit values, parameter estimates, and standard errors (Bollen, 1989). Recent research by Ulf H. Olsson and his colleagues (e.g., Olsson, Troye, & Howell, 1999), however, suggests that GLS underperforms relative to ML in the following key areas:

1. GLS accepts incorrect models more often than ML

2. GLS returns inaccurate parameter estimates more often than ML

A consequence of (2) is that modification indices are less reliable when the GLS estimator is used. Thus, we do not recommend the use of the GLS estimator.

A second option is to use Browne's (1984) asymptotically distribution-free (ADF) estimator, available in LISREL. Unfortunately, ADF requires large sample sizes (at least 1,000 cases) and small models due to the computational requirements of the estimation procedure. As Muthén (1993) concludes, "Apparently the asymptotic properties of ADF are not realized for the type of models and finite sample sizes often used in practice. The method is also computationally heavy with many variables. This means that while ADF analysis may be theoretically optimal, it is not a practical method" (p. 227).

For these reasons, the standard recommendation is to use the ML estimator (or one of the variants described below) when fitting a model to data that are drawn from a population with variables that are assumed to be normally and continuously distributed in the population from which you drew your sample. By contrast, if your variables are inherently categorical in nature, consider using a software package designed specifically for this type of data. Mplus is one such product. It uses a variant of the ADF method mentioned previously, weighted-least squares (WLS). WLS as implemented by Mplus for categorical outcomes does not require the same sample sizes as does ADF for continuous, non-normal data. Further discussion of the WLS estimator is beyond the scope of this FAQ; interested readers are encouraged to peruse Muthén, du Toit, and Spisic (1997) and Muthén (1993) for further details.

Robust scaled and adjusted Chi-square tests and parameter estimate standard errors

A variant of the ML estimation approach is to correct the model fit chi-square test statistic and standard errors of individual parameter estimates. This approach was introduced by Satorra and Bentler (1988) and incorporated into the EQS program as the ml,robust option. The ml,robust option in EQS 5.x provides the Satorra-Bentler scaled chi-square statistic, also known as the scaled T statistic that tests overall model fit. Curran, West, and Finch (1996) found that the scaled chi-square statistic outperformed the standard ML estimator under non-normal data conditions. Mplus also offers the scaled chi-square test and accompanying robust standard errors via the estimator option mlm. Mplus also offers a similar test statistic called the Mean and Variance adjusted chi-square statistic via the estimator option mlmv.

An adjusted version of the scaled chi-square statistic is presented in Bentler and Dudgeon (1996). Fouladi (1998) conducted an extensive simulation study that found that this adjusted chi-square test statistic outperformed both the standard ML chi-square and the original scaled chi-square test statistic, particularly in smaller samples. Unfortunately, the adjusted test statistic is not available in EQS 5.x.

The robust approaches work by adjusting, usually downward, the obtained model fit chi-square statistic based on the amount of non-normality in the sample data. The larger the multivariate kurtosis of the input data, the stronger the applied adjustment to the chi-square test statistic. Standard errors for parameter estimates are adjusted upwards in much the same manner to reduce appropriately the type 1 error rate for individual parameter estimate tests. Although the parameter estimate values themselves are the same as those from a standard ML solution, the standard errors are adjusted (typically upward), with the end result being a more appropriate hypothesis test that the parameter estimate is zero in the population from which the sample was drawn.

Bootstrapping

The robust scaling approach described above adjusts the obtained chi-square model fit statistic based on the amount of multivariate kurtosis in the sample data. An alternative method to deal with non-normal input data is to not adjust the obtained chi-square test statistic and instead adjust the critical value of the chi-square test. Under the assumption of JMVN and if the fitted model is not false, the expected value of the chi-square test of model fit is equal to the model's degrees of freedom (DF). For example, if you fit a model that was known to be true and the input data were JMVN and the model had 20 DF, you would expect the chi-square test of model fit to be 20, on average. On the other hand, non-normality in the sample data can inflate the obtained chi-square to a value that exceeds DF, say 30. The robust scaled and adjusted chi-square tests mentioned in the previous section work by lowering the value of the obtained chi-square to correct for non-normality. For instance, in this example a reasonable value for the robust scaled or adjusted chi-square might be 25 instead of 30. Ideally, the adjusted chi-square would be closer to 20, but the adjustments are not perfect.

Bootstrapping works by computing a new critical value of the chi-square test of overall model fit by computing a new critical chi-square value. In our example, instead of the JMVN expected chi-square value of 20, a critical value generated via the bootstrap might be 27. The original obtained chi-square statistic for the fitted model (e.g., 30) is then compared to the bootstrap critical value (e.g., 27) rather than the original model DF value (e.g., 20). A p-value based upon the comparison of the obtained chi-square value to the bootstrap-generated critical chi-square value is then computed.

How is the bootstrap critical chi-square value generated? First, the input data are treated as the total population of responses, and the bootstrap program repeatedly draws samples of size N, with replacement, from this pseudo-population. Before sampling, the input data are transformed so that the fitted model holds exactly in the pseudo-population. This step is necessary because the critical chi-square value must come from a central chi-square distribution, and a central chi-square distribution assumes the null hypothesis is true. The same assumption is made when you use the standard ML chi-square to test model fit: the obtained chi-square is expected to equal the model DF when the null hypothesis is true.

Next, the model is fit to the data and the obtained chi-square is output and saved. This process is repeated across each of the bootstrap samples. At the conclusion of the bootstrap sampling, the bootstrap program collects the chi-square model fit statistics from each sample and computes their mean value. This mean value becomes the critical value for the chi-square test from the original analysis.
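To make the resampling logic concrete, the toy Python sketch below applies the same idea to a much simpler one-degree-of-freedom statistic (n * mean^2 / variance, a test that the population mean is zero) rather than to a full SEM fit. The "transform so the null hypothesis holds" step is simply centering the data; all function names here are hypothetical illustrations, not part of any SEM package.

```python
import random
import statistics

def chi_square_stat(data):
    """Toy 1-df statistic: n * mean^2 / variance (tests that the mean is 0)."""
    n = len(data)
    return n * statistics.mean(data) ** 2 / statistics.variance(data)

def bootstrap_critical_value(data, n_boot=1000, seed=12345):
    """Mean of the statistic over bootstrap samples drawn after the data
    have been transformed (here: centered) so the null hypothesis is true."""
    rng = random.Random(seed)
    m = statistics.mean(data)
    centered = [x - m for x in data]  # the null hypothesis now holds exactly
    boot_stats = []
    for _ in range(n_boot):
        sample = [rng.choice(centered) for _ in centered]
        boot_stats.append(chi_square_stat(sample))
    return statistics.mean(boot_stats)

rng = random.Random(1)
data = [rng.gauss(0.5, 1.0) for _ in range(50)]
crit = bootstrap_critical_value(data)
# crit should land near 1, the toy statistic's degrees of freedom;
# the observed chi_square_stat(data) would be compared against crit.
```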

The procedure detailed above is credited to Bollen and Stine (1993) and is implemented in AMOS. AMOS allows the data analyst to specify the number of bootstrap samples drawn (typically 250 to 2000 bootstrap samples) and it outputs the distribution of the chi-square values from the bootstrap samples as well as the mean chi-square value and a Bollen-Stine p-value based upon a comparison of the original model's obtained chi-square with the mean chi-square from the bootstrap samples.

AMOS also computes individual parameter estimates, standard errors, confidence intervals, and p-values for tests of significance of individual parameter estimates based upon various types of bootstrap methods such as bias-correction and percentile-correction. Mooney and Duval (1993) and Davison and Hinkley (1997) describe these methods and their properties whereas Efron and Tibshirani (1993) provide an introduction to the bootstrap. Fouladi (1998) found in a simulation study that the Bollen-Stine test of overall model fit performed well relative to other methods of testing model fit, particularly in small samples.

Cautions and notes

One of the corollary benefits of the bootstrap is the ability to obtain standard errors, and therefore p-values, for quantities for which normal theory standard errors are not defined, such as r-square statistics. A primary disadvantage of the bootstrap and the robust methods mentioned previously is that they require complete data (i.e., no missing data are allowed). Use of the bootstrap method also requires the data analyst to set the scale of each latent variable by fixing one of its factor loadings to 1.00 rather than by fixing the factor's variance to 1.00, because under the latter scenario, bootstrapped standard error estimates may be artificially inflated by positive and negative factor loadings switching across bootstrap samples (Hancock & Nevitt, 1999).

References

Bentler, P. M., & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory, and directions. Annual Review of Psychology, 47, 563-592.

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley and Sons.

Bollen, K. A., & Stine, R. A. (1993). Bootstrapping goodness-of-fit measures in structural equation models. In K. A. Bollen and J. S. Long (Eds.) Testing structural equation models. Newbury Park, CA: Sage Publications.

Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.

Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.

Davison, A. C., & Hinkley, D. V. (1997). Bootstrapping methods and their application. Cambridge, UK: Cambridge University Press.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall Publishers.

Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality - Modified and bootstrap test statistics. Paper presented at the American Educational Research Association Annual Meeting, April 11-17, 1998, San Diego, CA.

Hancock, G. R., & Nevitt, J. (1999). Bootstrapping and the identification of exogenous latent variables within structural equation models. Structural Equation Modeling, 6(4), 394-399.

Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage Publications.

Muthén, B. O. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen and J. S. Long (Eds.) Testing structural equation models. Newbury Park, CA: Sage Publications.

Muthén, B. O., du Toit, S. H. C., & Spisic, D. (In press). Robust inference using weighted-least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Psychometrika.

Olsson, U. H., Troye, S. V., & Howell, R. D. (1999). Theoretic fit and empirical fit: The performance of maximum likelihood versus generalized least squares estimation in structural equation models. Multivariate Behavioral Research, 34(1), 31-59.

Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. 1988 Proceedings of the Business and Economics Statistics Section of the American Statistical Association, 308-313.

Seaman, M. A., Levin, K. R., and Serlin, R. C. (1991). New developments in pairwise multiple comparisons: Some powerful and practicable procedures. Psychological Bulletin 110: 577-586.

### Connecting to published statistical and mathematical applications on the ITS Windows Terminal Server

#### Question:

Someone told me that I could gain access to a number of different statistical and mathematical packages on one of the ITS server computers. Is this true? If so, how much does it cost to use the server and how do I set up my computer to connect to the server?

Click HERE for information on how to access statistical and mathematical applications on the ITS servers.

### Connecting to the Unix Timesharing Server

#### Question:

How do I connect to the Unix Timesharing Server?