Operating System and Stats App Server
 Available Macintosh Statistical Software
 Connecting to published statistical and mathematical applications on the Stat Apps Server
 Connecting to the Unix Timesharing Server
 Transferring files from or to the Stat Apps Server
Sampling
Regression, ANOVA, ANCOVA
 Relationship between F and R-square
 DF for a correlation test (H0: rho=0)
 Point biserial correlation
 Standard error of the measurement
 Negative factor loadings
 When covariates are not helpful
 Testing multivariate skewness and kurtosis
 Centering variables prior to computing interaction terms for a multiple regression analysis
 Simple main effects tests
 Contrast coding
 How to perform pairwise comparisons of sample correlation coefficients
 How to compare correlation coefficients from the same sample
 Adjusted Bonferroni comparisons
Data Management
Advanced Methods (SEM, Multilevel Models, Factor Analysis)
 Estimation methods in structural equation modeling
 Computing explained variance in factor analysis
 Eigenvalues less than 1.0
 Number of factors from a factor analysis
 Multilevel models
 What is a good kappa coefficient?
 Handling nonnormal data in structural equation modeling (SEM)
Packages for the Macintosh
Question:
I would like information on the statistical packages available for the Macintosh operating system.
Answer:
Although there are many statistical packages available for the Macintosh operating system, we supply and support only SPSS. Software Distribution Services provides a listing of all software currently distributed, as well as specific information concerning these statistical packages, at http://www.utexas.edu/its/products.
The following statistical software packages are available for the Macintosh operating system as of 2012 and may be bought directly from the vendor:
EQS 6 for Mac  http://www.mvsoft.com/products.htm
JMP 9.0 for Mac  http://www.jmp.com
R  http://www.r-project.org/
SAS 6.12  http://www.sas.com/contact/ OR 1-800-727-0025
SPSS  http://www.spss.com/
Stata  http://www.stata.com/
The following packages are not available for the Macintosh as of 2012:
HLM  http://www.ssicentral.com/
LISREL  http://www.ssicentral.com/lisrel/
Minitab  http://www.minitab.com/
MPlus  http://www.statmodel.com/
SAS versions later than 6.12  http://www.sas.com/
SUDAAN  http://www.rti.org/sudaan/
S-Plus  http://www.insightful.com/products/splus/
Relationship between F and R-square
Question:
How can I express R-square in terms of F?
Answer:
R^2 = df1*F / (df1*F + df2), where F is distributed as F(df1, df2).
To see this, let SST be the total (corrected) sum of squares, let SSR be the sum of squares from the regression model (which contains df1 predictors in addition to the mean), and let the error sum of squares be SSE = SST - SSR. Then R^2 = SSR / SST and F = (SSR/df1) / (SSE/df2), and the stated relationship can be obtained with a little algebra.
Similarly, F = (df2/df1) * R^2 / (1 - R^2).
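As a quick numerical check, the identity can be verified with a small least-squares fit. This Python sketch uses simulated data; the coefficients and seed are illustrative, not from any real study:

```python
import numpy as np

# Illustrative check of R^2 = df1*F / (df1*F + df2) on simulated data.
rng = np.random.default_rng(0)
n, k = 50, 3                               # n cases, k predictors (plus intercept)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
sst = np.sum((y - y.mean()) ** 2)          # total (corrected) sum of squares
sse = np.sum((y - X @ beta) ** 2)          # error sum of squares
ssr = sst - sse                            # model sum of squares
df1, df2 = k, n - k - 1

r2 = ssr / sst
F = (ssr / df1) / (sse / df2)
print(abs(r2 - df1 * F / (df1 * F + df2)) < 1e-12)   # True: the identity holds
```

Because the identity is algebraic, it holds for any data set, up to floating-point error.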
Data codebooks
Question:
What is a data codebook and why would I want to use one?
Answer:
A codebook is a key that defines how data will be entered into a computer file. It is also useful later, when you want to analyze the data and need to tell a software package how to read the data file.
There are at least eight things to be concerned with:
1. What will the variable names be on the computer? These must usually be limited to 8 characters, the first being a letter.
2. What are the labels, if any, to be associated with each variable? These clarify variable names that are often too brief to be understandable.
3. Does the variable contain only numeric values, or does it contain any character values?
4. Does the variable contain any missing values? How will these be coded?
5. What labels, if any, will be assigned to values to clarify what those values represent (e.g., 1='male', 2='female')?
6. What is the maximum number of columns needed to accurately represent a variable, including decimal places and negative signs?
7. What is the field or column location to be assigned to each variable?
8. Is more than one row (usually 80 columns) of data required per case?
An Example of a Codebook for a Simple Data File
Imagine that a 20-question survey about insurance attitudes is given to individuals. The codebook should contain the following information about how the survey data were entered into a text file.
Variable   Width   Columns   Variable Label    Value Labels
--------   -----   -------   --------------    ---------------------
ssn        9       1-9       soc sec number    All fields: 9=missing
age        3       11-13     subject age
sex        1       15        subject sex       1=male, 2=female
quest1     1       20        has insurance     (All questions:
...
quest20    1       60        wants more        1=no, 2=yes)
Having completed a codebook, you're ready to enter data into a computer file using this format. The actual data file would then look something like this:
451335322 29 1 1 9 1 1 2 2 2 1 2 1 2 2 1 2 9 1 1 2 1 1
354009564 67 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 2
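Outside SAS or SPSS, the same codebook can drive a small reader. This Python sketch is illustrative: it assumes the whitespace-separated layout shown above and, following the codebook, treats 9 as the missing-value code on the question items:

```python
# Read the two cases shown above, using the variable names from the codebook.
raw = """\
451335322 29 1 1 9 1 1 2 2 2 1 2 1 2 2 1 2 9 1 1 2 1 1
354009564 67 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 1 2 1 1 2 2
"""

names = ["ssn", "age", "sex"] + [f"quest{i}" for i in range(1, 21)]
cases = []
for line in raw.splitlines():
    case = dict(zip(names, line.split()))
    case["age"] = int(case["age"])
    case["sex"] = int(case["sex"])
    for key in names[3:]:                  # the 20 question items
        # Apply the codebook's missing-value code: 9 means missing.
        case[key] = None if case[key] == "9" else int(case[key])
    cases.append(case)

print(cases[0]["age"], cases[0]["quest2"], cases[1]["quest20"])
```

The point is that every decision the reader makes (names, widths, missing codes, value meanings) comes straight from the codebook.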
DF for a correlation test (H0: rho=0)
Question:
Why does the test of a correlation (H0: rho = 0) have N - 2 degrees of freedom instead of N - 1? Only one correlation is being estimated, so why are two degrees of freedom lost?
Answer:
Remember that estimating the correlation coefficient is a special case of using the simple linear regression model. This regression model takes the form:
y = a + bx + e
where a (the intercept), and b (the slope) are the two parameters in the model to be estimated. Since two values are being estimated, two degrees of freedom are lost.
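One way to see the N - 2 in action is to check numerically that the correlation test statistic, t = r*sqrt(N - 2)/sqrt(1 - r^2), equals the t statistic for the slope b, whose standard error uses N - 2 error degrees of freedom. This Python sketch uses simulated data (the values are illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
N = 30
x = rng.normal(size=N)
y = 0.5 * x + rng.normal(size=N)

r = np.corrcoef(x, y)[0, 1]
t_corr = r * math.sqrt(N - 2) / math.sqrt(1 - r ** 2)

# Slope, intercept, and the slope's standard error from simple regression;
# the residual variance is divided by N - 2 because a and b are estimated.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
se_b = math.sqrt(np.sum(resid ** 2) / (N - 2) / np.sum((x - x.mean()) ** 2))
t_slope = b / se_b

print(abs(t_corr - t_slope) < 1e-10)   # True: the two tests are the same test
```

Both statistics are referred to a t distribution with N - 2 degrees of freedom.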
Point biserial correlation
Question:
I need to compute point biserial correlations for some data. However, I cannot find a procedure in any of the major stats packages that does this.
Answer:
The point biserial correlation is just the Pearson correlation with one of the variables being dichotomous. A special formula exists, but its purpose is to ease the burden of those who have to do the calculations by hand.
So, on a computer, just use the Pearson correlation procedure:
In SAS, use: PROC CORR.
In SPSS, use: CORRELATIONS.
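For example, in Python (with made-up illustrative data), the Pearson correlation of a 0/1 variable with a continuous variable reproduces the hand-computation shortcut formula exactly:

```python
import numpy as np

group = np.array([0, 0, 0, 0, 1, 1, 1, 1])           # dichotomous variable
score = np.array([3.0, 4.0, 2.0, 5.0, 7.0, 6.0, 8.0, 5.0])

# Point biserial correlation computed as an ordinary Pearson correlation.
r_pearson = np.corrcoef(group, score)[0, 1]

# The shortcut formula: r_pb = (M1 - M0) / s * sqrt(p * q), where s is the
# n-denominator standard deviation and p, q are the group proportions.
m1, m0 = score[group == 1].mean(), score[group == 0].mean()
s = score.std()                                       # ddof=0
p = group.mean()
r_pb = (m1 - m0) / s * np.sqrt(p * (1 - p))

print(abs(r_pearson - r_pb) < 1e-12)                  # True: same quantity
```

This is why no separate procedure is needed: the shortcut formula and the Pearson formula are algebraically identical when one variable is dichotomous.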
Estimation methods in structural equation modeling
Question:
What are the advantages and disadvantages of using a maximum likelihood estimation method vs. a least squares estimation method in structural equation modeling?
Answer:
Monte Carlo simulation studies have shown that under ideal sampling conditions the three most common estimation methods (maximum likelihood, generalized least squares, and ordinary least squares) all yield comparable and very good parameter estimates.
However, under less-than-ideal sampling conditions, each method has its own strengths and weaknesses. For example, when the assumption of joint multivariate normality is violated, maximum likelihood estimation tends to yield non-optimal solutions, especially when the sample size falls below N = 200.
In general, for effective structural equation modeling, the total sample size should be at least 200, and at least three manifest variables should be included for each latent variable.
Each less-than-ideal sampling situation presents a unique set of difficulties. You may want to contact a consultant by email if you believe that your sample is less than ideal. Also, Latent Variable Models, by J. C. Loehlin, 1987, pp. 54-60, contains more information on this topic.
Finite population correction factor
Question:
I'm sampling from a finite population. I've heard that in such cases the usual variance estimate can be too large. Is there some sort of correction factor?
Answer:
If the sample size, n, is greater than 5% of the population size, N, you will benefit by using the finite population correction factor. For the variance adjustment, multiply the original variance value by (N - n)/N.
For additional information, see Sampling Techniques by W. G. Cochran.
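A worked example with hypothetical numbers:

```python
# Apply the finite population correction when the sample exceeds 5% of the
# population. All values here are made up for illustration.
N = 1000          # population size
n = 100           # sample size (10% of N, so the correction is worthwhile)
var = 25.0        # ordinary variance estimate of the sample mean

fpc = (N - n) / N                 # the correction factor from the answer above
var_corrected = var * fpc

print(n / N > 0.05, var_corrected)   # True 22.5
```

Here the corrected variance (22.5) is 10% smaller than the uncorrected estimate, reflecting that a 10% sample leaves less of the population unseen.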
Comments/Codebook in an external data file
Question:
I have an external data file that I would like to read into a statistical software package, preferably SAS or SPSS. I've included a codebook at the top of the data file. How can I tell SAS or SPSS to start reading the data after skipping the first n lines of the data file?
Answer:
SPSS can perform this task with either an Excel or text file.
For Excel, open the external data file. Uncheck the box that says “Read variable names from the first row of data.” In the box labeled “Range”, specify the first and last cell of the Excel spreadsheet to be read.
For example, A5:H35 tells SPSS to begin reading data in the first column, fifth row, continuing to read data by row until reaching the cell in the eighth column, thirty-fifth row. This will eliminate the first four rows from the SPSS dataset.
Unfortunately, variable names cannot be read into SPSS using this method; they must be manually entered in the SPSS dataset.
For a text file, open the external data file. A Text Import Wizard dialog box will appear. Follow the prompts as explained below:
Step 1 – No action is necessary; click next.
Step 2 – Specify the delimiter(s) that separate the data into columns and indicate if variable names are in the first row of data. If variable names are in the top row of the text file, SPSS will use these names in the new dataset and still allow you to begin reading data from a specified line.
Step 3 – Designate the line number corresponding to the first case. The “Data preview” box at the bottom of the dialog box shows the first few lines of the dataset as a check that SPSS is reading the data correctly.
Step 4 – Indicate how the data are separated.
Step 5 – Name and format variables.
Step 6 – The file and syntax can be saved.
In SAS, use the FIRSTOBS= option in the INFILE statement. This option tells SAS which line of the infile to start reading data from.
For example, use the following syntax to begin reading data on line 21 of the external data file RAW.DAT located in the TEMP subdirectory of your C: disk drive:
INFILE 'c:\temp\raw.dat' FIRSTOBS = 21 ;
For more information on the infile statement in SAS, use the online SAS manual at http://support.sas.com/documentation/onlinedoc/base/index.html. Go to the SAS OnlineDoc under Base SAS 9.1.3 Procedures Guide and click the Index tab. You can then search for infile.
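The same skip-the-codebook logic is easy to reproduce in other environments. This Python sketch mimics FIRSTOBS= with toy text standing in for the external file:

```python
import io

# Skip the first n lines of a file before parsing data, analogous to
# SAS's FIRSTOBS= option. The file contents here are illustrative.
text = "codebook line 1\ncodebook line 2\n1 2 3\n4 5 6\n"
firstobs = 3                      # data begin on line 3 of this toy file

with io.StringIO(text) as fh:     # stands in for open('raw.dat')
    for _ in range(firstobs - 1): # discard the codebook lines
        next(fh)
    data = [[int(v) for v in line.split()] for line in fh]

print(data)                       # [[1, 2, 3], [4, 5, 6]]
```

Only the two data lines survive; the codebook header at the top of the file is consumed and ignored.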
Standard error of the measurement
Question:
What is the standard error of the measurement?
Answer:
The standard error is the standard deviation of the sampling distribution of a statistic.
For example, suppose you are estimating the mean height of the population of eastern white pines. You select a sample of 100 trees, measure their heights, and calculate a mean. Any given sample mean will be a function of the population mean AND the random, unique characteristics of the individual trees in the sample. Thus, if you were to take another sample of 100 trees, that mean would be a little different, and so would the mean of a third sample, and so on. If you calculated means for a very large number of samples of the same size, this collection of sample means would itself have a mean value and a standard deviation. The mean of this "sampling distribution" would be the population mean, and its standard deviation is the standard error of the measurement. In this case the measurement is the mean, but it can be any sample statistic. The standard error tells us how much we can expect any given sample statistic to deviate from the population parameter we are estimating.
Just as the sample standard deviation in our tree example tells us how much we can expect each tree to deviate from the mean of its sample, the standard error tells us how much we can expect any given statistic to deviate from its sampling mean, and remember, the mean of the sampling distribution is the actual population parameter value. The standard error thus allows us to create confidence intervals and test hypotheses at a specified level of uncertainty (e.g., 95% confidence, alpha = 0.05).
The problem is that we never actually collect a large number of samples, but often only one. So we have to estimate the standard error. The formulas for the estimate of the standard error can be simple or complex, but fortunately, there are computer programs to do this for us.
A good reference for this topic is Hays, W. L. (1981). Statistics (3rd ed.). New York: Holt, Rinehart & Winston. See Chapter 5, Sampling Distributions and Point Estimation.
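A small simulation makes the idea concrete. In this Python sketch the population values are hypothetical; the standard deviation of many sample means closely matches the theoretical standard error, sigma/sqrt(n):

```python
import numpy as np

# Draw many samples of size 100 from a hypothetical "tree height" population
# and compare the spread of the sample means with sigma / sqrt(n).
rng = np.random.default_rng(42)
pop_mean, pop_sd, n = 20.0, 4.0, 100

means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

empirical_se = means.std()             # spread of the sampling distribution
theoretical_se = pop_sd / np.sqrt(n)   # 4 / 10 = 0.4

print(theoretical_se, empirical_se)
```

In practice we have one sample, not ten thousand, which is exactly why the standard error must be estimated from formulas rather than observed directly.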
Negative factor loadings
Question:
What do negative factor loadings signify?
Answer:
A factor loading is the standardized regression coefficient for a factor in the multiple regression equation regressing the variable on the factors. Thus if the factor structure is orthogonal, then the loading is just the correlation between a variable and a factor.
So for an orthogonal set of factors, a negative loading (for a variable on a factor) indicates that scores on the factor tend to be associated with variable scores of the opposite sign.
When covariates are not helpful
Question:
I am doing a repeated measures analysis with covariates. I tested the significance of association between the dependent variables and covariates, and also the homogeneity of regression hyperplanes for the covariates. I believe that I have appropriate covariates to work with, but the problem is that the significance level (both MANOVA Wilks and univariate) is decreased by the covariates. Is this possible?
Answer:
Including a covariate in a model moves one degree of freedom from the error term to the model term. If the covariate does not increase the model sum of squares enough to compensate, then the F-ratio will decrease and the p-value will increase, so the result becomes less significant.
Your data may be a case in which the covariate adds little to the model sum of squares because the covariate and the other predictor variables share the same variance that predicts the dependent variable.
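A worked arithmetic example (the sums of squares are hypothetical) shows the mechanism:

```python
# n = 20 cases, SST = 100. Without the covariate: SSR = 40 with df1 = 1,
# so SSE = 60 with df2 = 18. Adding the covariate raises SSR by only 1
# but moves a degree of freedom from error to model.
sst, n = 100.0, 20

f_without = (40.0 / 1) / ((sst - 40.0) / (n - 2))   # 12.0
f_with = (41.0 / 2) / ((sst - 41.0) / (n - 3))      # about 5.91

print(f_without, round(f_with, 2))                  # 12.0 5.91
```

Even though the covariate slightly improved the fit, the F-ratio fell by half, so the test result is weaker with the covariate than without it.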
Computing explained variance in factor analysis
Question:
I ran a factor analysis on five variables and derived an orthogonal twofactor solution. Now I want to see what proportion of the total variance is explained by these two factors. How can I compute this figure?
Answer:
The proportion of variance explained by each factor is printed by default by most statistical software packages. Note that if the factor extraction method is not principal components, this proportion can be negative, because some eigenvalues of the reduced correlation matrix can be negative.
If you need to compute this value yourself, you can do so by summing the eigenvalues of the (two) factors of interest and dividing this number by the sum of all (five) eigenvalues.
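The computation can be sketched in Python with a toy correlation matrix (the numbers are illustrative):

```python
import numpy as np

# A made-up 5 x 5 correlation matrix for five variables.
R = np.array([[1.0, 0.6, 0.5, 0.1, 0.1],
              [0.6, 1.0, 0.5, 0.1, 0.1],
              [0.5, 0.5, 1.0, 0.1, 0.1],
              [0.1, 0.1, 0.1, 1.0, 0.4],
              [0.1, 0.1, 0.1, 0.4, 1.0]])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order
# Sum of the first two eigenvalues over the sum of all five.
proportion = eigenvalues[:2].sum() / eigenvalues.sum()

print(eigenvalues.sum(), round(proportion, 3))       # the sum is 5 (= p)
```

Note that the eigenvalues of a p x p correlation matrix always sum to p, so the denominator equals the number of variables.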
For more information about factor analysis using SAS, use the online SAS manual at http://support.sas.com/documentation/onlinedoc/base/index.html. Go to the SAS OnlineDoc under Base SAS 9.1.3 Procedures Guide, Second Edition and click the Index tab. Jump to factor procedure.
For more information about factor analysis using SPSS, use the online Help  Case studies section in SPSS.
Eigenvalues less than 1.00
Question:
I read that the reason an eigenvalue greater than 1.0 is used as a criterion in factor analysis extractions is that if the eigenvalue is less than 1.0, then the variable is explaining less variance than a single item.
My question is this: in the course of a higherorder factor analysis, does the same rationale for using the 1.0 criterion pertain, i.e., if the eigenvalue is less than one, is less variance explained by it than by a single lowerorder factor?
Answer:
The short answer to your question is "Yes". That is, the rationale for only retaining factors with eigenvalues larger than one holds for a higher order factor analysis just as for a lower order one.
One way of thinking about this rule of thumb is to realize that your p variables form a pdimensional space. You want to rotate the axes of this space so that the new axes maximize the variance of the data points as they are projected onto the axes. The (normalized) eigenvectors of a matrix give the direction cosines determining the rotation, while the eigenvalues give the variance associated with each new axis.
When the matrix is a p x p correlation matrix, the variance of each variable is already standardized to 1, so things are particularly simple. An eigenvalue less than one represents a shrinking of an axis' importance in the new universe.
Similarly, the spectral decomposition theorem says that any matrix of rank p can be broken down into the sum of p component matrices. These component matrices are just the outer products of the eigenvectors (xx'), each weighted by its eigenvalue. Again, a p x p correlation matrix has rank p, and p eigenvalues summing to p. So a factor associated with an eigenvalue of less than 1.0 is not pulling its own weight.
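Both facts are easy to verify numerically. This Python sketch uses a toy 3 x 3 correlation matrix (the values are illustrative):

```python
import numpy as np

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

vals, vecs = np.linalg.eigh(R)
# Spectral decomposition: R equals the eigenvalue-weighted sum of the
# outer products of its normalized eigenvectors.
reconstructed = sum(v * np.outer(vecs[:, i], vecs[:, i])
                    for i, v in enumerate(vals))

print(vals.sum(), np.allclose(R, reconstructed))   # eigenvalues sum to p = 3
```

An eigenvalue below 1.0 therefore contributes a component matrix carrying less variance than any single standardized variable.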
Number of factors from a factor analysis
Question:
How can I decide how many factors I should extract from a factor analysis solution?
Answer:
There are a number of methods you can use, either individually or in concert to aid you in selecting the number of factors to retain from a factor analysis. Among them are:
1. The eigenvalue greater than or equal to 1.00 rule
Only factors with eigenvalues greater than or equal to 1.00 are retained, since one way to view this situation is that only factors with eigenvalues greater than or equal to 1.00 "pull their own weight" in explaining the common variance shared among your measures.
2. The scree plot
You can request that this be output from SPSS or SAS. You would retain the number of factors up to the "elbow" minus 1. For example, consider the following scree plot:

Eigenvalue
 |  *
 |
 |     *
 |        *   *   *
 |
 +-------------------------
    1   2   3   4   5   Factor Number

Here the "elbow" or bend is at factor 3, but you would retain 3 - 1 factors, or the first 2 factors.
3. Proportion of variance accounted for by factors
Decide a priori on how you wish to define the phrase 'a sufficient proportion of variance is accounted for', and retain only enough factors to cross that threshold.
4. The low error approach
Continue extracting factors until all residual values are 0.10 or lower.
5. Use a chisquare test
SAS and SPSS provide tests of overall goodness-of-fit of the factor analysis model to the data when you choose maximum-likelihood (ML) or generalized least-squares (GLS) factor extraction methods. If you choose to use one of these extraction methods (ML is more commonly used than GLS), you also must tell the software package how many factors you expect to be present. It then uses that number of factors as its null hypothesis. That is, the null hypothesis of the chi-square test is that the factor analysis model fits the data. So, a nonsignificant model test is desirable, whereas a statistically significant chi-square test means that more factors are needed to account for the structure of your data.
You should recognize two important caveats in using the chi-square method to help you decide how many factors to retain. The first caveat is that these test statistics are computed under the assumption of joint multivariate normality. If your data do not meet this assumption, it may not be appropriate to use these chi-square tests. The second caveat is that these tests are very sensitive to sample size. A factor analysis model which otherwise fits the data well may yield a statistically significant test purely because of a large sample. If you use only the chi-square results to determine the number of factors to retain, you will probably retain too many factors.
You can always use more than one of these methods to help you decide which solution is optimal, but, as always, theory should be your foremost consideration. SAS and SPSS anticipate that theory can guide your decision to extract a given number of factors, so each package provides a method to limit the number of factors extracted to be a specific number (e.g., NFACTORS=2 to extract two factors).
Also, there is the problem that "a person with one watch always knows what time it is; a person with more than one watch never knows the exact time." In other words, using the information from all of these methods may lead to a situation where they conflict, e.g., you retain only factors with eigenvalues greater than 1.00, but you have some residuals with values greater than 0.10.
In this type of situation, theory provides your first guideline, and the other rules of thumb can provide some additional guidance, but it is important not to follow any one of the rules of thumb by rote or too strictly but instead to evaluate the solution as a complete picture, including how it meshes with prior findings, your own theoretical models, etc.
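As an illustration of combining rules, this Python sketch applies the eigenvalue >= 1.00 rule and a proportion-of-variance threshold (say, 65%) to hypothetical eigenvalues from a six-variable correlation matrix:

```python
import numpy as np

# Hypothetical eigenvalues (illustrative); they sum to 6, the number of variables.
eigenvalues = np.array([2.8, 1.4, 0.7, 0.5, 0.35, 0.25])

kaiser_count = int((eigenvalues >= 1.0).sum())            # rule 1
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()   # rule 3
factors_for_65 = int(np.argmax(cumulative >= 0.65)) + 1   # first factor count
                                                          # crossing the threshold
print(kaiser_count, factors_for_65)                       # 2 2
```

Here the two rules happen to agree on a two-factor solution; with real data they often will not, which is exactly why theory should carry the most weight.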
Testing multivariate skewness and kurtosis
Question:
How can I use my sample's skewness and kurtosis to determine whether I have a multivariate normal distribution?
Answer:
There is a large amount of literature on this topic, although not much is yet implemented in SAS or SPSS.
In Multivariate Analysis, Part 1, Distributions, ordination, and inference, (1994) W.J. Krzanowski and F.H.C. Marriott review (section 3.16, p.58) tests of the null hypothesis that the data come from a multivariate normal distribution.
Mardia (1970) found that if the null hypothesis is true, then a simple function of the sample skewness has an asymptotic chi-squared distribution, and the sample kurtosis has an asymptotic normal distribution. Sample sizes greater than 50 are needed for these approximations to be acceptably accurate.
SAS's PROC CALIS will output several measures of univariate and multivariate skewness and kurtosis.
Also, the PRELIS 2 program developed by Jöreskog and Sörbom will test both univariate and multivariate normality simultaneously, including separate tests of skewness and kurtosis at both the univariate and multivariate level. See page 24 of the PRELIS 2 manual (the section titled "New features in PRELIS 2").
Mardia's test of multivariate normality can also be found in EQS, Structural Equation Modeling Software.
AMOS also includes a test of multivariate normality. For detailed instructions on performing this test in AMOS, see the AMOS FAQ on handling nonnormal data.
Macros for both SPSS and SAS can be downloaded. Lawrence DeCarlo, Ph.D., provides an SPSS macro for Mardia's test of multivariate skewness and kurtosis at http://www.columbia.edu/~ld208/. A SAS macro is available in the SAS online manual at http://support.sas.com/. To find the macro, go to the Knowledge Base section and click on Samples and SAS Notes. Click on Search Samples, search for multnorm, and choose Macro to test multivariate normality. This site gives a downloadable version of the macro as well as instructions on how to use the macro.
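For readers who want to see the mechanics, Mardia's skewness and kurtosis measures can be computed directly from their definitions. This Python sketch (on simulated multivariate normal data, purely illustrative) is not a substitute for the vetted macros above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))                 # multivariate normal by construction

Xc = X - X.mean(axis=0)
S = (Xc.T @ Xc) / n                         # ML covariance estimate
D = Xc @ np.linalg.inv(S) @ Xc.T            # Mahalanobis cross-products

b1 = (D ** 3).sum() / n ** 2                # Mardia's multivariate skewness
b2 = (np.diag(D) ** 2).mean()               # Mardia's multivariate kurtosis

chi2_stat = n * b1 / 6                      # ~ chi2 with p(p+1)(p+2)/6 df
z_kurt = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)   # ~ N(0, 1)

# For normal data, b2 should be near p*(p+2) = 15 and z_kurt near 0.
print(b2, z_kurt)
```

Under the null hypothesis the skewness statistic is referred to a chi-squared distribution and the kurtosis statistic to a standard normal, matching Mardia's asymptotic results described above.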
Sample size for multiple regression
Question:
How many participants, cases, or data points do I need per predictor to ensure a stable solution in a multiple regression analysis?
Answer:
Unfortunately, there is no clear consensus on the exact answer to this question. We have heard and read answers ranging anywhere from 5 to 50 cases per predictor. Generally, the more cases per predictor you have, the better off you will be in terms of your ability to generalize your results to your population of interest. This becomes particularly true when your sample data violate one or more of the assumptions underlying regression analysis.
That said, James Stevens recommends a minimum of 15 data points per predictor for multiple regression analyses (James Stevens, Applied Multivariate Statistics for the Social Sciences, Third Edition, Lawrence Erlbaum Publishers, p. 72).
Centering variables prior to computing interaction terms for a multiple regression analysis
Question:
I am predicting my dependent variable y from independent variables a and b. How can I calculate the interaction term a*b for use in my regression analysis?
Answer:
There are differing opinions about how to compute the interaction term for use in an analysis. The steps below show how to compute both non-centered and centered interaction terms. Some researchers compute the product of a and b (without centering or otherwise altering the variables) and enter this product into their regression model, like so:
For SPSS, use the dialog boxes to compute the new interaction variable:
In the Data View window, click Transform and then Compute.
In the Target Variable box, type the name of the new interaction variable, e.g. ab.
In the Numeric Expression box, enter a*b. Click OK.
This computes the interaction term, ab, and adds it to the dataset.
Enter the variables a, b, and ab as independent variables in the regression model.
For SAS:
DATA origdata; SET origdata; ab = a*b ; RUN;
PROC REG DATA = origdata; MODEL y = a b ab ; RUN ;
Other researchers advocate "centering" the a and b predictors before computing the interaction term. Centering the term means subtracting the variable's mean from each case's value on that variable. The result is known as a "deviation score." The SPSS and SAS code shown below can be used to create centered variables.
For SPSS:
In the Data View window, click Transform and then Compute.
Type the variable name, breakvar, in the Target Variable box. Enter a value of 1 in the Numeric Expression box. Click OK.
This creates a new variable, breakvar, with a value equal to 1. This variable is necessary for calculating the means of variables a and b.
Click Data, then Aggregate.
Click breakvar into the box labeled Break Variables.
Click on a and b to put them into the box labeled Aggregated Variables.
Make sure the function specified in the Summaries of Variables box is the mean of the variable. Make sure the default option of add aggregated variables to active dataset is checked. Click OK.
This will add the mean of a and the mean of b as two new columns in the dataset, a_mean and b_mean, respectively.
Click Transform, then Compute.
Type the centered variable name, acen, in the Target Variable box.
Enter a - a_mean in the Numeric Expression box. Click OK. This creates the centered variable of a.
Create the centered variable, bcen, by entering b - b_mean in the Numeric Expression box. Click OK.
Click Transform, then Compute.
Type the variable name, abcen, in the Target Variable box.
Enter acen * bcen in the Numeric Expression box. Click OK.
This creates the interaction term, abcen, based on the centered variables of a and b.
Use the centered terms acen, bcen, and abcen in the regression model instead of a, b, and ab.
For SAS:
PROC STANDARD DATA = origdata
OUT = centdata
MEAN = 0
PRINT;
VAR a b;
RUN;
DATA centdata;
SET centdata;
ab = a*b; RUN;
PROC REG DATA = centdata;
MODEL y = a b ab; RUN;
The centered and non-centered approaches yield identical overall regression model statistics and tests for the interaction effect (assuming that the interaction effect is entered into the regression model last, as is generally the case in this type of analysis).
Which approach should you use to compute your interaction term? The chief advantages of centering are that it (1) reduces multicollinearity (a high correlation) between the a and b predictors and the a*b interaction term and (2) can render more meaningful interpretations of the regression coefficients for a and b.
The regression coefficient for a*b will be the same for both approaches, but the coefficients for a and b will differ depending on which method you use. This is because in the non-centering method, the coefficient for a estimates the relationship between a and y where b equals zero. In the centering method, the coefficient for a estimates the relationship between a and y where b equals its average. In many situations, the predictors will not have a meaningful zero point, so a centering approach may be warranted.
Leona Aiken and Stephen West provide an example of this type of situation in their text titled Multiple regression: Testing and interpreting interactions (1991, Sage Publications, Newbury Park, Chapter 3).
As an example, suppose you are predicting athletes' strength levels (y) from height (a) and weight (b) measurements. Under the noncentering approach, the measure of the relationship between height (a) and strength (y) as estimated by the regression coefficient for height (a) occurs where b = 0, or weight equals zero pounds. No athlete we know of has a weight of zero pounds!
Centering provides one remedy to this situation: In the centered model, the regression coefficient for height (a) estimates the relationship between height (a) and strength (y) where weight (b) is equal to the mean weight in the data set instead of zero.
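These claims about which coefficients change can be checked numerically. In this Python sketch (simulated height/weight-style data, illustrative only), centering leaves the interaction coefficient and the fitted values unchanged while the coefficients for a and b shift:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
a = rng.normal(50, 10, n)            # a predictor with no meaningful zero
b = rng.normal(30, 5, n)
y = 2 + 0.5 * a + 0.8 * b + 0.05 * a * b + rng.normal(size=n)

def fit(a, b, y):
    # Least-squares fit of y = intercept + a + b + a*b.
    X = np.column_stack([np.ones(len(y)), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef

raw_coef, raw_fit = fit(a, b, y)
cen_coef, cen_fit = fit(a - a.mean(), b - b.mean(), y)

same_interaction = np.isclose(raw_coef[3], cen_coef[3])
same_fit = np.allclose(raw_fit, cen_fit)
print(same_interaction, same_fit)    # True True
```

The centered model is just a reparameterization of the raw model, so the fit is identical; only the interpretation (and value) of the lower-order coefficients changes.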
Aiken and West devote an entire chapter of their book to the topic of centering (chapter 3). This book is available from the PhysicsMathAstronomy library on campus. See http://catalog.lib.utexas.edu/.
Simple main effects tests
Question:
How can I carry out a simple main effects test using either SAS or SPSS?
Answer:
You can use either SAS or SPSS to conduct these tests. For example, let's say that you had a fairly straightforward completely between-subjects design, with a dependent variable called Y and two categorical predictors, A and B, each with two levels. Suppose that you wanted to test the effect of A for each level of B.
In SAS release 6.11 or higher, you can use the SLICE option in PROC GLM, like this:
PROC GLM ;
CLASS a b ;
MODEL y = a b a*b ;
LSMEANS a*b / SLICE=b;
If you are using SPSS, you can use the MANOVA command to test the same hypotheses as the SAS program shown above:
MANOVA y BY a(1,2) b(1,2) /design = B A within b(1) A within b(2).
More complex designs are testable with the recent SPSS GLM and SAS MIXED procedures, including withinsubjects simple main effects tests.
Multilevel models
Question:
I'm analyzing a dataset with dyads (couples) and another dataset with families who have different numbers of children per family. Someone suggested analyzing my data using something called a multilevel model. What's a multilevel model and why should I use it?
Answer:
The data you describe are often referred to as "hierarchical" or "clustered" because subjects (individuals) are nested within clusters or units such as families or couples. Many commonly used statistical procedures such as ordinary least-squares linear regression assume that every observation is independent of every other observation in the dataset. Obviously, when clusters are present, this assumption is violated.
To address this problem, researchers developed special statistical models to take into account the hierarchical nature of such datasets. As a class, these models are known as multilevel models. Other investigators developed special software programs designed specifically for the analysis of multilevel models.
Among the general purpose software packages that we support, SAS is one of the most commonly used in handling multilevel models. The MIXED procedure can be used to analyze data with continuous distributions whereas the GENMOD procedure can be used for repeated measures with nonnormally distributed variables. An additional feature of MIXED that GENMOD lacks is the ability to estimate variances for the cluster level; this feature is useful for descriptive purposes as well as the computation of proportions of variance due to clusters versus individuals in the dataset.
To learn more about using PROC MIXED to fit multilevel models to normally distributed outcome variables, you can download a copy of the paper "Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models" written by Judith Singer at Harvard University at http://gseweb.harvard.edu/~faculty/singer/. If your outcome variables are nonnormally distributed, consider using the GLIMMIX macro available from SAS Institute. GLIMMIX uses PROC MIXED as part of its syntax, so you can use GLIMMIX to obtain variance component estimates for clusters.
Other software packages that provide multilevel modeling analysis are SPSS, LISREL, HLM, MPlus, MLwiN, and AMOS. Details on these packages can be found under the FAQ section of the statistical website of David Garson, Ph.D.:
http://www2.chass.ncsu.edu/garson/pa765/multilevel.htm.
Links for the software are given below:
SPSS: http://www.spss.com/software/statistics/
LISREL and HLM: http://www.ssicentral.com/workshops/index.html
MPlus: http://www.statmodel.com/features4.shtml
MLwiN: http://www.cmm.bristol.ac.uk/
AMOS: http://www.spss.com/amos/
Multilevel modeling is a complex and rapidly evolving field. To keep up to date, consider joining a multilevel modeling email list. You can also visit the HLM website at http://www.ssicentral.com/ or the Centre for Multilevel Modelling home pages at http://www.cmm.bristol.ac.uk/ for the latest information about multilevel modeling workshops, software, and related resources.
Contrast coding
Question:
I've run a balanced-data ANOVA with one between-subjects factor (GROUP) and one within-subjects factor (TIME). Group has three levels; time has three levels. My dependent variable is anxiety, measured at three equally spaced intervals.
I now want to run a contrast analysis. I want to compare group 1 to group 2 across all three measurement occasions of anxiety. How can I determine what my contrast weights should be?
Answer:
One widely used method is first to specify your hypothesis in terms of your design's cell means. You then re-express the hypothesis in terms of the model parameters used by your software. Finally, you match the weights found in this expression to the syntax required by your software.
1. Specify your hypothesis in terms of your design's cell means.
To identify the population cell means in your GROUP by TIME study, it's helpful to use a table, like this:
Population Means

          Time 1   Time 2   Time 3
Group 1   Mu11     Mu12     Mu13
Group 2   Mu21     Mu22     Mu23
Group 3   Mu31     Mu32     Mu33
Once you identify the cell means, the next step is to specify your null hypothesis as an equality among various combinations of these means.
Your hypothesis, stated in null hypothesis form, reads: "The population mean for group 1 equals the population mean for group 2, when this mean is taken across all measurement occasions of anxiety." Translating this natural language hypothesis into a statement about equality of population means, you get:
Mu11+Mu12+Mu13 = Mu21+Mu22+Mu23
2. Re-express the hypothesis in terms of the model parameters.
For the standard balanceddata twoway ANOVA model, the relationship between a cell mean and the model parameters is widely known. In our case, this relationship for Mu11 is:
Mu11 = I+G1+T1+GT11
That is, each individual population mean is composed of an intercept term (abbreviated I), a main effect term due to group (abbreviated G), another main effect term due to Time (abbreviated T), and a group by time interaction term (abbreviated GT).
When we substitute these expressions into the null hypothesis formula shown above, we get:
(I+G1+T1+GT11)+(I+G1+T2+GT12)+(I+G1+T3+GT13)=(I+G2+T1+GT21)+(I+G2+T2+GT22)+(I+G2+T3+GT23)
While this formula may seem intimidating, it is easily simplified. There are three intercept terms (I) before the equals sign and three intercept terms after it, so all intercept terms drop out of the formula. The T1, T2, and T3 terms also drop out. This leaves us with:
(G1+GT11)+(G1+GT12)+(G1+GT13)=(G2+GT21)+(G2+GT22)+(G2+GT23)
We now continue simplifying by collecting terms. We have three G1 terms and three G2 terms, giving us:
3G1 + GT11 + GT12 + GT13 = 3G2 + GT21 + GT22 + GT23
Notice that what we have left is an equality between each group's main effect and group by time interaction terms. We're collapsing across our time variable, which agrees with our hypothesis. However, you may be surprised by the interaction terms, since our hypothesis doesn't explicitly mention them. We'll say more about this later.
3. Translate this expression into the form required by the software's syntax.
Most software requires the expression to equate to a constant (usually zero), so we subtract one side from each side to get:
3G1 + GT11 + GT12 + GT13 - 3G2 - GT21 - GT22 - GT23 = 0
Then we need to arrange our terms in the order used by our software. For SAS or SPSS, the order in this case would be:
3G1 - 3G2 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 = 0
Finally, we need to include a term for every parameter in a variable, unless each parameter in the variable has a weight of zero. Our equation becomes:
3G1 - 3G2 + 0G3 + GT11 + GT12 + GT13 - GT21 - GT22 - GT23 + 0GT31 + 0GT32 + 0GT33 = 0
We can now read off the contrast weights, which are just the coefficients of the effect terms.
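The bookkeeping above can be verified mechanically: expand every cell mean into its overparameterized terms and collect the coefficients. A short sketch (not package syntax, just a check on the algebra):

```python
from collections import Counter

# Each cell mean is Mu_ij = I + G_i + T_j + GT_ij in the overparameterized model.
def terms(i, j):
    return ["I", f"G{i}", f"T{j}", f"GT{i}{j}"]

coeffs = Counter()
for j in (1, 2, 3):
    for t in terms(1, j):   # group 1's cells: left of the equals sign
        coeffs[t] += 1
    for t in terms(2, j):   # group 2's cells: subtracted after moving across
        coeffs[t] -= 1

weights = {p: c for p, c in coeffs.items() if c != 0}
print(weights)
# The I and T terms cancel; G1 keeps +3, G2 keeps -3, and the six
# interaction terms keep +1 or -1, matching the weights derived above.
```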
Although the exact specification of a contrast statement varies from package to package, its general form is as follows:
"contrastname" variablename weights
where "contrastname" is a quoted string that identifies the contrast on the software's output, "variablename" is the name of the variable (e.g., GROUP), and "weights" are the contrast weights you've generated.
Let's put our contrast weights into this framework:
"my contrast" group 3 -3 0 group*time 1 1 1 -1 -1 -1 0 0 0
Contrast coding can be a challenging exercise. Unless you follow a systematic method such as the one described here, it is easy to produce contrast weights that do not test your hypothesis. Be sure to check your software's contrast results carefully: they should be consistent with the usual descriptive information (e.g., cell means, standard deviations, and standard errors) you should examine before performing a contrast analysis. If you are uncertain about the validity of your contrast results, contact a consultant for assistance.
As promised earlier, a note on the interaction terms: you might have expected them to drop out. That would be appropriate in a model employing the usual side conditions, under which the interaction terms within a level sum to zero. However, both SAS and SPSS use the "overparameterized" ANOVA model, which does not impose such restrictions.
For more information on generating and specifying contrast codes, see the online SAS manual at http://support.sas.com/documentation/. Under SAS Product Documentation, click on SAS/STAT. Click on SAS OnlineDoc under SAS/STAT 9.1.3; scroll down and click on SAS/STAT and then click on SAS/STAT User's Guide. Scroll down to The GLM Procedure; the Syntax section discusses the CONTRAST command.
Handling missing or incomplete data
Question:
I have a database that contains records with incomplete data; some research participants did not complete all of the available questions on my survey. How should I handle this problem?
Answer:
Missing or incomplete data are a serious problem in many fields of research. An added complication is that the more data are missing, the more pressing the problem of incomplete cases becomes, yet those are precisely the situations in which imputing values for the missing data points is most questionable, because the proportion of valid data points relative to the size of the data matrix is small. This FAQ highlights commonly used methods of handling incomplete data and discusses a number of their known strengths and weaknesses. At the end of the FAQ, a software table compares and contrasts some commonly used software options for handling missing data and details their availability to UT faculty, students, and staff.
When you choose a missing data handling approach, keep in mind that one of the desired outcomes is maintaining (or approximating as closely as possible) the shape of the original distribution of responses. Some incomplete data handling methods do a better job of maintaining the distributional shape than others. For instance, one popular method of imputation, mean substitution, can result in a distribution with truncated variance.
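The truncated-variance problem is easy to demonstrate with a toy example (made-up values). Imputing the observed mean leaves the mean essentially unchanged but shrinks the variance, because the filled-in points sit exactly at the mean and add nothing to the squared deviations:

```python
import statistics

data = [2, 4, 4, 6, 8, None, None, 10]          # None marks a missing response
observed = [x for x in data if x is not None]
mean = statistics.mean(observed)
imputed = [mean if x is None else x for x in data]

# The variance of the imputed dataset is noticeably smaller than the
# variance among the actually observed values.
print(statistics.pvariance(observed))
print(statistics.pvariance(imputed))
```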
If you have questions about the advisability of applying a particular method to your own database, we recommend you schedule an appointment with a Statistical Services consultant to discuss these issues as they pertain to your own unique circumstances (note: This service is available to University of Texas faculty, staff, and students only). Missing data imputation and handling is a rapidly evolving field with many methods, each applicable in some circumstances but not others.
Types of missing data
The most appropriate way to handle missing or incomplete data depends upon how the data points became missing. Little and Rubin (1987) define three distinct types of missing data mechanisms.
Missing Completely at Random (MCAR):
Cases with complete data are indistinguishable from cases with incomplete data. Heitjan (1997) provides an example of MCAR missing data: imagine a research associate shuffling raw data sheets and arbitrarily discarding some of them. Another example of MCAR missing data arises when investigators randomly assign research participants to complete two-thirds of a survey instrument. Graham, Hofer, and MacKinnon (1996) illustrate the use of planned missing data patterns of this type to gather responses to more survey items from fewer research participants than the standard survey completion paradigm, in which every research participant receives and answers each survey question, ordinarily allows.
Missing at Random (MAR): Cases with incomplete data differ from cases with complete data, but the pattern of data missingness is traceable or predictable from other variables in the database rather than being due to the specific variable on which the data are missing. For example, if research participants with low self-esteem are less likely to return for follow-up sessions in a study that examines anxiety level over time as a function of self-esteem, and the researcher measures self-esteem at the initial session, then self-esteem can be used to predict the missingness pattern of the incomplete data. Another example involves reading comprehension: investigators can administer a reading comprehension test at the beginning of a survey administration session; research participants with lower reading comprehension scores may be less likely to complete the entire survey. In both of these examples, the variables on which data are missing are not themselves the cause of the incomplete data. Instead, the missingness is due to some other, measurable influence.
Nonignorable: The pattern of data missingness is nonrandom and is not predictable from other variables in the database. If a participant in a weight-loss study skips a weigh-in due to concerns about his weight loss, his data are missing due to nonignorable factors. In contrast to the MAR situation outlined above, where data missingness is explainable by other measured variables in a study, nonignorable missing data arise when the missingness pattern is explainable only by the very variable(s) on which the data are missing.
In practice it is usually difficult to meet the MCAR assumption. MAR is an assumption that is more often, but not always, tenable. The more relevant and related predictors one can include in statistical models, the more likely it is that the MAR assumption will be met.
Methods of handling missing data
Some of the more popular methods for handling missing data appear below. This list is not exhaustive, but it covers some of the more widely recognized approaches to handling databases with incomplete cases.
Listwise or casewise data deletion: If a record has missing data on any variable used in a particular analysis, omit that entire record from the analysis. This approach is the default method of handling incomplete data in many statistical procedures of commonly used packages such as SAS and SPSS.
Pairwise data deletion: For bivariate correlations or covariances, compute statistics based upon the available pairwise data. Pairwise data deletion is available in a number of SAS and SPSS statistical procedures.
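The difference between the two deletion strategies can be sketched with a toy dataset (hypothetical values); note that under pairwise deletion, different statistics can be based on different numbers of cases:

```python
rows = [
    {"x": 1.0, "y": 2.0, "z": 3.0},
    {"x": 2.0, "y": None, "z": 1.0},
    {"x": 3.0, "y": 4.0, "z": None},
]

# Listwise deletion: keep only fully complete rows.
listwise = [r for r in rows if all(v is not None for v in r.values())]
print(len(listwise))  # 1 -- only the first row survives

# Pairwise deletion: each statistic uses every row where both variables are present.
def pairs(rows, a, b):
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]

print(len(pairs(rows, "x", "y")))  # 2 rows feed the x-y covariance
print(len(pairs(rows, "x", "z")))  # 2 rows feed the x-z covariance
```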
Mean substitution: Substitute a variable's mean value, computed from available cases, for the missing data values on the remaining cases. This option appears in several SPSS procedures. The Base module of SPSS also allows easy computation of new variables that contain mean-substituted data values: in the Data Editor, select Transform, then Replace Missing Values. (Note: This function is not the same as that offered by the SPSS Missing Values Analysis add-on module; the MVA module uses the EM approach described below.) SAS allows mean substitution using the STANDARD procedure; see the SAS FAQs for details.
Regression methods: Develop a regression equation based on complete case data for a given variable, treating it as the outcome and using all other relevant variables as predictors. Then, for cases where Y is missing, plug the available data into the regression equation as predictors and substitute the equation’s predicted Y value into the database for use in other analyses. An improvement to this method involves adding uncertainty to the imputation of Y so that the mean response value is not always imputed.
Hot deck imputation: Identify the most similar case to the case with a missing value and substitute the most similar case’s Y value for the missing case’s Y value.
Expectation Maximization (EM) approach: An iterative procedure that proceeds in two discrete steps. First, in the expectation (E) step you compute the expected value of the complete data log likelihood. In the maximization (M) step you substitute the expected values for the missing data obtained from the E step and then maximize the likelihood function as if no data were missing to obtain new parameter estimates. The procedure iterates through these two steps until convergence is obtained. The SPSS Missing Values Analysis (MVA) module employs the EM approach to missing data handling.
Raw maximum likelihood methods: Use all available data to generate maximum likelihood-based sufficient statistics. Usually these consist of a covariance matrix of the variables and a vector of means. This technique is also known as Full Information Maximum Likelihood (FIML).
Multiple imputation: Similar to the maximum likelihood method, except that multiple imputation generates actual raw data values suitable for filling in gaps in an existing database. Typically, five to ten databases are created in this fashion. The investigator then analyzes these data matrices using an appropriate statistical analysis method, treating these databases as if they were based on complete case data. The results from these analyses are then combined into a single summary finding.
Roth (1994) reviews these methods and concludes, as did Little & Rubin (1987) and Wothke (1998), that the listwise, pairwise, and mean substitution methods of handling missing data are inferior to maximum likelihood-based methods such as raw maximum likelihood or multiple imputation. Regression methods are somewhat better, but not as good as hot deck imputation or maximum likelihood approaches. The EM method falls somewhere in between: it is generally superior to the listwise, pairwise, and mean substitution approaches, but it lacks the uncertainty component contained in the raw maximum likelihood and multiple imputation methods.
It is important to understand that these missing data handling methods and the discussion that follows deal with incomplete data primarily from the perspective of estimating parameters and computing test statistics rather than predicting values for specific cases. Warren Sarle at SAS Institute has put together a helpful paper on the topic of missing data in the contexts of prediction and data mining. The paper can be found online in PostScript form at ftp://ftp.sas.com/pub/neural/JCIS98.ps and in an HTML version.
Hot deck and maximum likelihood-based approaches to handling missing data
Hot deck
Hot deck imputation fills in missing cells in a data matrix with the next most similar case's values. Consider the following example database.
Illustration of Hot Deck Imputation: Data Matrix with Incomplete Data

Case   Item 1   Item 2   Item 3   Item 4
1      4        1        2        3
2      5        4        2        5
3      3        4        2        .
Case three has a missing data cell for item four. Hot deck imputation examines the cases with complete records (cases one and two in this example) and substitutes the value of the most similar case for the missing data point. Here, case two is more similar to case three, the case with the missing data point, than is case one: cases two and three share the same values for items two and three, whereas cases one and three share the same value for item three only. (Note: There are different strategies for judging similarity.)
Once the hot deck imputation determines which case among the observations with complete data is the most similar to the record with incomplete data, it substitutes the most similar complete case's value for the missing variable into the data matrix.
Illustration of Hot Deck Imputation: Data Matrix with Imputed Data

Case   Item 1   Item 2   Item 3   Item 4
1      4        1        2        3
2      5        4        2        5
3      3        4        2        5
Since case two had the value of five for item four, the hot deck procedure imputes a value of five for case three to replace the missing data cell. Data analysis may then proceed using the new complete database.
Hot deck imputation has a long history of use, including years of use by the United States Census Bureau. It can be superior to listwise deletion, pairwise deletion, and mean substitution approaches to handling missing data. Among hot deck's advantages are its conceptual simplicity, its maintenance of the proper measurement level of variables (categorical variables remain categorical and continuous variables remain continuous), and the availability of a complete data matrix at the end of the imputation process that can be analyzed like any complete data matrix. One of hot deck's disadvantages is the difficulty in defining "similarity"; there may be any number of ways to define what similarity is in this context. Thus, the hot deck procedure is not an "out of the box" approach to handling incomplete data. Instead it requires that you develop custom software syntax to perform the selection of donor cases and the subsequent imputation of missing values in your database. More sophisticated hot deck algorithms would identify more than one similar record and then randomly select one of those available donor records to impute the missing value or use an average value if that were appropriate.
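To make the idea concrete, here is a sketch (in Python rather than SAS, purely for illustration) of the donor selection just described, with similarity defined simply as the number of matching item values:

```python
def hot_deck_impute(records, target, missing_key):
    """Fill records[target][missing_key] with the value from the most similar
    complete donor record. Similarity here = count of matching item values,
    one simple choice among many possible similarity definitions."""
    donors = [r for r in records if all(v is not None for v in r.values())]
    incomplete = records[target]

    def similarity(donor):
        return sum(1 for k, v in incomplete.items()
                   if v is not None and donor[k] == v)

    best = max(donors, key=similarity)
    incomplete[missing_key] = best[missing_key]
    return incomplete

# The three cases from the tables above; case 3's item 4 is missing.
data = [
    {"item1": 4, "item2": 1, "item3": 2, "item4": 3},
    {"item1": 5, "item2": 4, "item3": 2, "item4": 5},
    {"item1": 3, "item2": 4, "item3": 2, "item4": None},
]
print(hot_deck_impute(data, 2, "item4"))  # case 2 donates: item4 becomes 5
```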
Two examples of SAS macros used to perform hot deck imputation can be found online. John Stiller and Donald R. Dalzell (1998) wrote a paper titled "Hot-deck Imputation with SAS® Arrays and Macros for Large Surveys," which can be found at http://www2.sas.com/proceedings/sugi23/Stats/p246.pdf. Lawrence Altmayer of the U.S. Bureau of the Census wrote a paper, "Hot-Deck Imputation: A Simple DATA Step Approach," which can be found at http://www8.sas.com/scholars/05/PREVIOUS/1999/pdf/075.pdf.
Expectation maximization (EM)
The expectation maximization (EM) approach to missing data handling is documented extensively in Little & Rubin (1987). The EM approach is an iterative procedure that proceeds in two discrete steps. First, in the expectation (E) step the procedure computes the expected value of the complete data log likelihood based upon the complete data cases and the algorithm's "best guess" as to what the sufficient statistical functions are for the missing data based upon the model specified and the existing data points; actual imputed values for the missing data points need not be generated. In the maximization (M) step it substitutes the expected values (typically means and covariances) for the missing data obtained from the E step and then maximizes the likelihood function as if no data were missing to obtain new parameter estimates. The new parameter estimates are substituted back into the E step and a new M step is performed. The procedure iterates through these two steps until convergence is obtained. Convergence occurs when the change of the parameter estimates from iteration to iteration becomes negligible.
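The flavor of the iteration can be sketched for a simple case: a regression of y on a fully observed x, with some y values missing completely at random. The sketch below iterates the E step's conditional means only; full EM would also carry the conditional variances into the second moments, so this is a simplification for illustration:

```python
import random

random.seed(0)
# Simulated data: y depends on fully observed x; every fourth y is deleted (MCAR).
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.5) for x in xs]
miss = [i % 4 == 0 for i in range(200)]

def em_slope(xs, ys, miss, iters=25):
    # Start by filling each missing y with the observed-y mean.
    obs_mean = sum(y for y, m in zip(ys, miss) if not m) / sum(not m for m in miss)
    filled = [obs_mean if m else y for y, m in zip(ys, miss)]
    for _ in range(iters):
        # M step: re-estimate the regression from the completed data.
        n = len(xs)
        mx, my = sum(xs) / n, sum(filled) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, filled))
        beta = sxy / sxx
        alpha = my - beta * mx
        # E step: replace each missing y with its conditional expectation E[y | x].
        filled = [alpha + beta * x if m else y for x, y, m in zip(xs, ys, miss)]
    return beta

print(round(em_slope(xs, ys, miss), 2))  # close to the true slope of 2
```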
The SPSS Missing Values Analysis (MVA) module employs the EM approach to missing data handling. The strength of the approach is that it has well-known statistical properties, and it generally outperforms popular ad hoc methods of incomplete data handling such as listwise and pairwise data deletion and mean substitution because it requires only that incomplete cases have data missing at random (MAR) rather than missing completely at random (MCAR). The primary disadvantage of the EM approach is that it adds no uncertainty component to the estimated data. Practically speaking, this means that while parameter estimates based upon the EM approach are reliable, standard errors and associated test statistics (e.g., t-tests) are not. This shortcoming led statisticians to develop two newer likelihood-based methods for handling missing data: the raw maximum likelihood approach and multiple imputation.
Raw maximum likelihood
Raw maximum likelihood, also known as Full Information Maximum Likelihood (FIML), uses all available data points in a database to construct the best possible first- and second-order moment estimates under the MAR assumption. Put less technically, if the missing at random (MAR) assumption can be met, maximum likelihood-based methods can generate a vector of means and a covariance matrix among the variables in a database that are superior to those produced by commonly used missing data handling methods such as listwise deletion, pairwise deletion, and mean substitution. See Wothke (1998) for a convincing demonstration.
Under an unrestricted mean and covariance structure, raw maximum likelihood and EM return identical parameter estimates. Unlike EM, however, raw maximum likelihood can be employed when fitting user-specified linear models, such as structural equation models, regression models, and ANOVA and ANCOVA models. Raw maximum likelihood also produces standard errors and parameter estimates under the assumption that the fitted model is not false, so parameter estimates and standard errors are model-dependent. That is, their values will depend upon the model chosen and fitted by the investigator.
Raw maximum likelihood missing data handling is currently implemented in the AMOS structural equation modeling package supported by ITS. The primary practical advantage of this method is that it is built into the software package: the AMOS user simply clicks a check box to enable missing data handling. The program then fits the analyst's model using the raw maximum likelihood approach. Any general linear model, including ANOVA, ANCOVA, MANOVA, MANCOVA, path analysis, confirmatory factor analysis, and numerous time series and longitudinal models, can be fit using AMOS.
Other software packages that use the raw maximum likelihood approach to handle incomplete data are the MIXED procedures in SAS and SPSS (see the paper titled "Linear mixed-effects modeling in SPSS") and Michael Neale's MX. The MIXED procedures can fit ANOVA, ANCOVA, and repeated measures models with time-constant and time-varying covariates. You should strongly consider using a MIXED procedure instead of SAS PROC GLM or the SPSS General Linear Models (GLM) procedure whenever you have repeated measures data with missing data points. The MIXED procedures can also fit hierarchical linear models (HLMs), also known as multilevel or random coefficient models. MX is a freeware structural equation modeling program.
Raw maximum likelihood has the advantages of convenience and well-known statistical properties. Unlike EM, it also allows the direct computation of appropriate standard errors and test statistics. Disadvantages include the assumption of joint multivariate normality of the variables used in the analysis and the lack of a raw data matrix produced by the analysis. Recall that the raw maximum likelihood method produces only a covariance matrix and a vector of means for the variables; the statistical software then uses these as input for further analyses.
Raw maximum likelihood methods are also modelbased. That is, they are implemented as part of a fitted statistical model. The investigator may want to include relevant variables (e.g., reading comprehension) that will improve the accuracy of parameter estimates, but not include these variables in the statistical model as predictors or outcomes. While it is possible to do this, it is not always easy or convenient, particularly in large or complex models.
Finally, raw maximum likelihood assumes the incomplete data cells are missing at random. Wothke (1998) suggests, however, that raw maximum likelihood can offer superior performance to listwise and pairwise deletion methods even in the nonignorable data situation.
Multiple imputation
Multiple imputation combines the well-known statistical advantages of EM and raw maximum likelihood with hot deck imputation's ability to provide a raw data matrix to analyze. Like EM, multiple imputation generates a maximum likelihood-based covariance matrix and vector of means. It takes the process one step further by introducing statistical uncertainty into the model, using that uncertainty to emulate the natural variability among cases one encounters in a complete database. Multiple imputation then imputes actual data values to fill in the incomplete data points in the data matrix, just as hot deck imputation does.
The primary difference between multiple imputation and hot deck imputation from a practical or procedural standpoint is that multiple imputation requires that the data analyst generate five to ten databases with imputed values. The data analyst then analyzes each database, collects the results from the analyses, and summarizes them into one summary set of findings. For instance, suppose a researcher wishes to perform a multiple regression analysis on a database with incomplete data. The researcher would run multiple imputation, generate ten imputed databases, and run the multiple regression analysis on each of the ten databases. The researcher then combines the results from the ten regression analyses together into one summary for presentation, not necessarily a trivial task.
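The combining step follows what are known as Rubin's rules: average the per-dataset estimates, then add the average within-imputation variance to an inflated between-imputation variance. A sketch with made-up regression coefficients from m = 5 imputed datasets:

```python
import statistics

# Made-up coefficient estimates and squared standard errors from five
# regressions, one per imputed dataset (illustrative numbers only).
estimates = [1.92, 2.10, 1.85, 2.05, 1.98]
variances = [0.040, 0.038, 0.045, 0.041, 0.039]

m = len(estimates)
pooled = statistics.mean(estimates)            # combined point estimate
within = statistics.mean(variances)            # average within-imputation variance
between = statistics.variance(estimates)       # between-imputation variance
total_var = within + (1 + 1 / m) * between     # Rubin's total variance
pooled_se = total_var ** 0.5

print(pooled)       # combined estimate, about 1.98
print(pooled_se)    # larger than any single-dataset standard error
```

The between-imputation term is what restores the uncertainty that single-imputation methods such as EM leave out.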
Multiple imputation has several advantages: it is fairly well understood and robust to nonnormality of the variables used in the analysis. Like hot deck imputation, it outputs complete raw data matrices. It is clearly superior to the listwise, pairwise, and mean substitution methods of handling missing data in most cases. Disadvantages include the time required to impute five to ten databases, test models for each database separately, and recombine the model results into one summary. Furthermore, while summary methods have been worked out for linear and logistic regression models, work is still in progress on statistically appropriate summarization methods for other models, such as factor analysis, structural equation models, and multinomial logit regression models.
Schafer (1997) thoroughly documents multiple imputation theory in a textbook. Schafer has also written the freeware PC program NORM to perform multiple imputation analysis. Another freeware program similar to NORM called Amelia may also be downloaded.
SAS users can use the procedures MI and MIANALYZE to perform multiple imputation and combine analyses from the imputed data sets. PROC MI computes the imputed data sets; the data analyst then uses a standard SAS procedure, such as REG, GLM, or MIXED to analyze each imputed data set; finally, MIANALYZE combines the output from the analyses on the imputed data sets and provides the overall results. These procedures are available in SAS version 9.
Patternmixture models for nonignorable missing data
All the methods of missing data handling considered above require that the data meet the Little & Rubin (1987) missing at random (MAR) assumption. There are circumstances, however, when this assumption cannot be met to a satisfactory degree; cases are considered missing due to nonignorable causes (Heitjan, 1997). In such instances the investigator may want to consider the use of a patternmixture model, a term used by Hedeker & Gibbons (1997). Earlier works dealing with patternmixture models include Little & Schenker (1995), Little (1993), and Glynn, Laird, & Rubin (1986).
Patternmixture models categorize the different patterns of missing values in a dataset into a predictor variable, and this predictor variable is incorporated into the statistical model of interest. The investigator can then determine if the missing data pattern has any predictive power in the model, either by itself (a main effect) or in conjunction with another predictor (an interaction effect).
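A minimal sketch of the first step, building a missingness-pattern indicator (the variable names here are hypothetical):

```python
def missingness_pattern(record, keys):
    # Encode which variables are present (1) or missing (0), e.g. "110".
    return "".join("0" if record.get(k) is None else "1" for k in keys)

# Hypothetical longitudinal cases: anxiety measured at three times.
cases = [
    {"t1": 5.0, "t2": 4.0, "t3": 3.0},    # completer
    {"t1": 6.0, "t2": 5.0, "t3": None},   # dropped out after time 2
    {"t1": 4.0, "t2": None, "t3": None},  # dropped out after time 1
]
patterns = [missingness_pattern(c, ["t1", "t2", "t3"]) for c in cases]
print(patterns)  # ['111', '110', '100']
# Each pattern (or a coarser grouping such as completer vs. dropout) then
# enters the substantive model as a categorical predictor, alone and in
# interaction with other predictors.
```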
The chief advantage of the patternmixture model is that it does not assume the incomplete data are missing at random (MAR) or missing completely at random (MCAR). The primary disadvantage of the patternmixture model approach is that it requires some custom programming on the part of the data analyst to obtain one part of the patternmixture analysis, the patternmixture averaged results. It is worth noting, however, that Hedeker & Gibbons (1997, Appendix) demonstrate that some results may be obtained by using the SAS MIXED procedure and they provide sample SAS/IML code to obtain patternmixture averaged results on their Web site. If the number of missing data patterns and the number of variables with missing data are large relative to the number of cases in the analysis, the model may not converge due to insufficient data to support the use of many main effect and interaction terms.
Conclusions
Although applied researchers cannot turn to a single "one size fits all" solution for handling incomplete data problems, several trends in the missing data analysis literature are worth noting. First, ad hoc and commonly used methods of handling incomplete data, such as listwise and pairwise deletion and mean substitution, are inferior to hot deck imputation, raw maximum likelihood, and multiple imputation in most situations. Second, software to perform hot deck imputation, raw maximum likelihood, and multiple imputation is becoming more widely available and easier to use.
Although all of the methods described so far assume the incomplete data are missing at random, new statistical models are being developed to handle data missing due to nonignorable factors. Some of these models can be partially fit using familiar statistical packages and procedures, such as the MIXED procedure in SAS (e.g., Hedeker & Gibbons, 1997) or SPSS (see the paper titled "Linear mixed-effects modeling in SPSS").
References
Glynn, R., Laird, N.M., & Rubin, D.B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (Ed.), Drawing Inferences from Self-Selected Samples (pp. 119-146). New York: Springer-Verlag.
Graham, J.W., Hofer, S.M., & MacKinnon, D.P. (1996). Maximizing the usefulness of data obtained with planned missing value patterns: An application of maximum likelihood procedures. Multivariate Behavioral Research, 31(2), 197-218.
Hedeker, D., & Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2(1), 64-78.
Heitjan, D.F. (1997). Annotation: What can be done about missing data? Approaches to imputation. American Journal of Public Health, 87(4), 548-550.
Iannacchione, V.G. (1982). Weighted sequential hot deck imputation macros. Proceedings of the SAS Users Group International Conference, 7, 759-763.
Little, R.J.A. (1993). Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association, 88, 125-134.
Little, R.J.A., & Schenker, N. (1995). Missing data. In Arminger, Clogg, & Sobel (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences. New York: Plenum.
Little, R.J.A., & Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley and Sons.
Roth, P. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537-560.
SAS Institute Inc. (2004). SAS/STAT 9.1 User's Guide. Cary, NC: SAS Institute Inc.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. Monographs on Statistics and Applied Probability No. 72. London: Chapman & Hall.
Wothke, W. (1998). Longitudinal and multigroup modeling with missing data. In T.D. Little, K.U. Schnabel, & J. Baumert (Eds.), Modeling Longitudinal and Multiple Group Data: Practical Issues, Applied Approaches and Specific Examples. Mahwah, NJ: Lawrence Erlbaum Associates.
Software Table
The table below specifies several commonly used software options for handling missing or incomplete data. The table is not intended to be an exhaustive list of every possible missing-data-handling software package. However, if you discover or know of another software option you have used successfully, please let us know by sending an email to stat.admin@austin.utexas.edu
The table lists the name of the software, the method of handling incomplete data, the assumptions it makes about the causes of missing data, whether the package is supported at UT Austin, pricing and availability to UT faculty, students, and staff, and miscellaneous comments generally dealing with the perceived ease of use of the package from the perspective of computing novices. Note that in addition to the assumptions about the origins of incomplete data, many of the methods shown below also make other tacit assumptions (e.g., joint multivariate normality of the variables included in the analysis).
Missing Data Handling Software Options
Amelia
  Method: Multiple imputation
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (limited)
  Pricing and availability: Free to download from the Amelia Web site.
  Comments: Easy to intermediate difficulty of use.

SAS Base (e.g., PROC STANDARD)
  Method: Mean substitution
  Assumptions: Data are missing completely at random (MCAR)
  UT supported? Yes
  Pricing and availability: Available on timesharing systems and for leasing from Software Distribution & Sales.
  Comments: Easy to use, but only advisable when the number of missing data points is very small (i.e., < 5%).

SAS/STAT Multiple Imputation Procedures
  Method: Multiple imputation
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes
  Pricing and availability: Available on timesharing systems and for leasing from Software Distribution & Sales.
  Comments: Not easy for novices to use.

SAS/IML Multiple Imputation Programs
  Method: Multiple imputation
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (limited)
  Pricing and availability: Programs require SAS to function and can be downloaded for free.
  Comments: Not easy for novices to use.

Paul Allison's SAS Macro Multiple Imputation Programs
  Method: Multiple imputation
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (limited)
  Pricing and availability: Programs require SAS to function and can be downloaded for free.
  Comments: Not easy for novices to use.

SAS EM_COVAR.SAS EM program
  Method: EM with bootstrapping option for covariance matrices
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (limited)
  Pricing and availability: Program requires SAS to function and can be downloaded for free.
  Comments: Not easy for novices to use.

SPSS Base
  Method: Mean substitution
  Assumptions: Data are missing completely at random (MCAR)
  UT supported? Yes
  Pricing and availability: Available on timesharing systems and for leasing from ITS Software Distribution & Sales.
  Comments: Easy to use, but only advisable when the number of missing data points is very small (i.e., < 5%).

SPSS Missing Values Analysis (MVA) add-in module
  Method: EM
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (limited)
  Pricing and availability: Not currently available from ITS; see the SPSS, Inc. Web site for direct ordering information.
  Comments: Easy to use. Parameter estimates are unbiased, but standard errors and test statistics are not. (See the important review by Paul von Hippel.)

AMOS
  Method: Raw maximum likelihood
  Assumptions: Data are missing at random (MAR); the model you specify is correct
  UT supported? Yes
  Pricing and availability: Available on the ITS Windows terminal server.
  Comments: Easy to use. Parameter estimates, standard errors, and global fit statistics are correct.

MX
  Method: Raw maximum likelihood
  Assumptions: Data are missing at random (MAR); the model you specify is correct
  UT supported? No
  Pricing and availability: Free to download from the MX Web site.
  Comments: Not easy to use, though a new graphical front end may make access easier for novice users.

NORM
  Method: Multiple imputation
  Assumptions: Data are missing at random (MAR)
  UT supported? Yes (very limited)
  Pricing and availability: Free to download from the NORM Web site.
  Comments: Intermediate level of difficulty of use. An improved help system in the latest version walks you through analyses.

SOLAS
  Method: Multiple imputation; hot deck; regression
  Assumptions: Data are missing at random (MAR) or missing completely at random (MCAR), depending upon the technique chosen by the analyst
  UT supported? No
  Pricing and availability: Not currently available from ITS; see the Solas Web site for direct ordering information.
  Comments: Menu-driven interface appears fairly easy to use.

SAS/IML pattern-mixture model programs
  Method: Pattern-mixture model approach
  Assumptions: No assumptions are made about the data missingness mechanism
  UT supported? Yes (limited)
  Pricing and availability: Free to download from Hedeker & Gibbons' Web site.
  Comments: Not easy to use in its entirety.
How to perform pairwise comparisons of sample correlation coefficients
Question:
I have two correlation coefficients computed from two different samples. Is there a test that I can perform that will allow me to determine whether the two coefficients are significantly different?
Answer:
Yes, there is. To perform such a test, one must first transform both sample correlation coefficients using the Fisher r-to-Z transformation, given by the rule
Z = ½ ln((1 + rxy) / (1 - rxy))
where rxy is the sample correlation coefficient for variables X and Y and ln is the natural logarithm (log base e). Hence, the Fisher r-to-Z transformation involves a logarithmic transformation of the sample correlation coefficients.
The test statistic is then
z = (Z1 - Z2) / SE(Z1 - Z2)
where Z1 equals the transformed value of the first sample correlation, Z2 the transformed value of the second, and the standard error of the difference is
SE(Z1 - Z2) = sqrt(1/(N1 - 3) + 1/(N2 - 3))
According to Hays' Statistics (1988, p. 591), "For reasonably large samples (say, 10 in each), this ratio can be referred to the standard normal distribution." If the absolute value of the statistic exceeds 1.96, one can reject (at the .05 level, two-tailed) the null hypothesis that the two correlation coefficients come from populations with the same "true" level of correlation between X and Y. However, for this test to be valid, the samples must be independent and the population represented by each must be approximately normal.
The condition of independence would certainly not hold if the two samples involved the same subjects (e.g., repeated measures) or matched subjects. For data of this kind, consult the General FAQ "How to compare correlation coefficients from the same sample".
SAS 9.1 can compute correlation coefficients using Fisher's r-to-Z transformation and provide confidence intervals and p-values for these coefficients. Details are provided in the SAS online manual at http://support.sas.com/onlinedoc/913/; choose the Index tab and jump to CORR.
For older versions of SAS, the SAS macro compcorr computes Fisher's r-to-Z transformation and reports the test statistic, p-value, and confidence intervals; the macro is available at http://support.sas.com/kb/24/995.html.
SPSS users can use the following syntax.
* Fisher r to Z testing program.
* Compares correlations from two independent samples.
* See Hays (1988), p. 591.
** Begin sample program.
* Enter correlations into an SPSS database.
DATA LIST free
/corr1 corr2.
BEGIN DATA.
.50 .35
END DATA.
* Define the sample sizes of each group.
COMPUTE n1 = 25.
COMPUTE n2 = 25.
* Convert r values to Z values.
COMPUTE z1 = .5*LN((1+corr1)/(1-corr1)).
COMPUTE z2 = .5*LN((1+corr2)/(1-corr2)).
* Compute the estimated standard error.
COMPUTE stderr = sqrt((1/(n1-3))+(1/(n2-3))).
* Compute the final Z value.
* Evaluate it against a standard normal distribution for statistical significance.
COMPUTE ztest = (z1-z2)/stderr.
COMPUTE p_1_tail = 1 - CDF.NORMAL(abs(ztest),0,1).
COMPUTE p_2_tail = (1 - CDF.NORMAL(abs(ztest),0,1))*2.
* Print the results.
LIST.
** End sample program.
In the example shown above, the SPSS user inputs two sample correlation coefficients, .50 and .35, and the size of each sample (25 cases apiece). The program then computes the appropriate Z test for the equality of the two correlations. It outputs a one-tailed test of the correlations' equality (represented by the p_1_tail variable) as well as a two-tailed test of the same equality (represented by the p_2_tail variable).
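For readers who prefer a general-purpose language, the same independent-samples test can be sketched in Python. This is a minimal illustration rather than part of the original SPSS program; the function name fisher_z_test is my own, and the normal-tail p-values are obtained via math.erfc.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Test equality of two correlations from independent samples."""
    # Fisher r-to-Z transformation of each sample correlation.
    z1 = 0.5 * math.log((1 + r1) / (1 - r1))
    z2 = 0.5 * math.log((1 + r2) / (1 - r2))
    # Standard error of the difference between the transformed values.
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # One-tailed p-value: upper-tail area of the standard normal.
    p_one = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_one, 2 * p_one

# Same inputs as the SPSS example: r = .50 and .35, n = 25 in each sample.
z, p1, p2 = fisher_z_test(0.50, 25, 0.35, 25)
```

For these inputs z is about 0.61 with a two-tailed p near .54, so the two correlations are not significantly different.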
What is a good kappa coefficient?
Question:
I have computed Cohen's kappa to assess agreement among raters, corrected for chance agreement. What is a reasonable kappa level? What are good and poor values of kappa?
Answer:
The information that follows was derived from posts by Judith Saebel and Scott McNary on the Structural Equation Modeling LISTSERV email discussion group on August 19, 1999.
Although there are no absolute cutoffs for kappa coefficients, two sources provide some rough guidelines for the interpretation of kappa coefficients. According to Fleiss (1981, p. 218), values exceeding .75 suggest strong agreement above chance, values in the range of .40 to .75 indicate fair levels of agreement above chance, and values below .40 are indicative of poor agreement above chance levels.
A journal article by Landis & Koch (1977, p. 159) suggests that the following kappa interpretation scale may be useful:
Kappa Value      Interpretation
Below 0.00       Poor
0.00 - 0.20      Slight
0.21 - 0.40      Fair
0.41 - 0.60      Moderate
0.61 - 0.80      Substantial
0.81 - 1.00      Almost perfect
In addition, Gardner (1995) recommends that kappa exceed .70 before you proceed with additional data analyses.
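To make the chance-correction concrete, here is a short Python sketch that computes Cohen's kappa for two hypothetical raters; the ratings are invented for illustration and the function name is my own.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater1)
    # Observed agreement: proportion of items given identical labels.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: sum over categories of the product of the
    # two raters' marginal proportions for that category.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    p_chance = sum(counts1[cat] * counts2[cat] for cat in counts1) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

# Invented ratings from two hypothetical raters on ten items.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "yes", "no", "no", "no", "yes", "yes", "no", "yes", "yes"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here observed agreement is .80 and chance agreement is .52, giving kappa of about .58, which is "moderate" on the Landis & Koch scale and "fair" by Fleiss's criteria.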
References
Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions (2nd ed.). New York: John Wiley & Sons.
Gardner, W. (1995). On the reliability of sequential data: Measurement, meaning, and correction. In J.M. Gottman (Ed.), The Analysis of Change. Mahwah, NJ: Erlbaum.
Landis, J.R., & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
If you have further questions, send email to stat.admin@austin.utexas.edu.
How to compare sample correlation coefficients drawn from the same sample
Question:
I would like to compare two sample correlation coefficients, but they are drawn from the same sample. Is there a method of testing for a significant difference between them that takes their dependence into account?
Note: If you have two correlation coefficients computed from two different samples, please consult General FAQ "How to perform pairwise comparisons of sample correlation coefficients".
Answer:
Yes, there is, and it involves a choice between two different methods. The first method relies on a formula found in Cohen & Cohen (1983, p. 57). The formula yields a t statistic with n - 3 degrees of freedom. As written below, the formula tests for a significant difference between the correlation of X with Y and the correlation of V with Y:
t = (rxy - rvy) * sqrt((n-1)(1 + rxv)) / sqrt(2((n-1)/(n-3))R + ((rxy + rvy)/2)^2 (1-rxv)^3)
where
rxy = correlation coefficient between variables X and Y
rxv = correlation coefficient between variables X and V
rvy = correlation coefficient between variables V and Y
and R = 1 - rxy^2 - rvy^2 - rxv^2 + 2*rxy*rxv*rvy, the determinant of the correlation matrix for X, Y, and V.
Unfortunately, the above method is not available as an option in any of the statistical procedures in either SPSS or SAS. However, SPSS users can adapt the following syntax to perform the test:
* Dependent Correlation Comparison Program.
* Compares correlation coefficients from the same sample.
* See Cohen & Cohen (1983), p. 57.
DATA LIST free
/rxy rvy rxv.
BEGIN DATA.
.50 .32 .65
END DATA.
* Define the sample size.
COMPUTE n = 50.
COMPUTE diffr = rxy - rvy.
* Compute the determinant of the correlation matrix.
COMPUTE detR = (1 - rxy**2 - rvy**2 - rxv**2) + (2*rxy*rxv*rvy).
* Calculate the average of the two correlations, (rxy + rvy)/2.
COMPUTE rbar = (rxy + rvy)/2.
* Calculate the numerator and denominator of the t statistic.
COMPUTE tnum = diffr * (sqrt((n-1)*(1 + rxv))).
COMPUTE tden = sqrt(2*((n-1)/(n-3))*detR + ((rbar**2) * ((1-rxv)**3))).
COMPUTE t = tnum/tden.
COMPUTE df = n - 3.
* Evaluate the t statistic against a t distribution with n - 3 degrees of freedom.
COMPUTE p_1_tail = 1 - CDF.T(abs(t),df).
COMPUTE p_2_tail = (1 - CDF.T(abs(t),df))*2.
EXECUTE.
The above syntax will generate an active dataset that will appear in the data editor window. In the Variable View, change the number of decimals to the desired setting to display appropriate results.
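The Cohen & Cohen formula can also be sketched in Python. The helper below is hypothetical and simply mirrors the arithmetic of the SPSS program, using the same example values (rxy = .50, rvy = .32, rxv = .65, n = 50).

```python
import math

def dependent_corr_t(rxy, rvy, rxv, n):
    """Cohen & Cohen (1983, p. 57) t test for rxy vs. rvy in one sample."""
    # Determinant of the 3 x 3 correlation matrix of X, Y, and V.
    det_r = 1 - rxy**2 - rvy**2 - rxv**2 + 2 * rxy * rxv * rvy
    # Average of the two correlations being compared.
    rbar = (rxy + rvy) / 2
    numerator = (rxy - rvy) * math.sqrt((n - 1) * (1 + rxv))
    denominator = math.sqrt(2 * ((n - 1) / (n - 3)) * det_r
                            + rbar**2 * (1 - rxv)**3)
    # Returns the t statistic and its degrees of freedom (n - 3).
    return numerator / denominator, n - 3

t, df = dependent_corr_t(0.50, 0.32, 0.65, 50)
```

For these values t is about 1.70 on 47 degrees of freedom, which falls short of the two-tailed .05 critical value of roughly 2.01.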
Notice that the method above is limited to the three-variable case (e.g., X & Y and V & Y). The second method represents a more flexible approach to the problem. With this method, one uses a statistical software package capable of estimating covariance structure models (e.g., SAS, AMOS, LISREL) to compare an observed correlation matrix to an estimated correlation matrix that includes restrictions representing a null hypothesis. Steiger (1980) discusses this approach in more detail.
For example, suppose you have a set of four variables, X, Y, Z, and Q, and you want to test whether the correlation between X and Y is the same as that between Z and Q. The observed correlation matrix would be symmetric and look like the one presented below:

      X   Y   Z   Q
  X   a   b   c   d
  Y   b   e   f   g
  Z   c   f   h   i
  Q   d   g   i   j
where capital letters represent variables and lower case letters represent correlation coefficients.
In the estimated correlation matrix, one would impose the constraint that b = i in order to test the hypothesis that the correlation between X and Y is the same as that between Z and Q. One could then test how well this estimated matrix fits the data using the standard output of any of the statistical packages mentioned above. If it turns out that the restricted correlation matrix provides a reasonable fit of the data given a previously specified level of statistical significance, then this finding would be equivalent to retaining the null hypothesis that the correlation between X and Y is the same as that between Z and Q. On the other hand, if the restricted correlation matrix does not fit the data well, then this would be equivalent to rejecting the null hypothesis. Provided that one is cognizant of the problem of making multiple inferences and the sample data conform to the assumptions necessary to perform a covariance structure analysis (e.g., sufficient sample size, joint multivariate normality of the population distribution of the input variables, etc.), one could test a series of hypotheses in this fashion given virtually any combination of variables.
The SAS program below demonstrates how PROC CALIS can be used to test the hypothesis that COV(X,Y) = COV(Z,Q). In the example below, the covariance rather than the correlation matrix is used, since theoretically the maximum likelihood procedure for confirmatory factor analysis is derived for covariance matrices. However, one may interpret the results from this program as applying to the correlation matrix for a set of variables as well.
** Example file to demonstrate test of equality of
correlation using PROC CALIS ** ;
** Create example data set**;
data a;
input X Y Z Q;
cards;
9.10 6.17 10.73 13.83
13.60 15.20 3.45 9.30
13.29 6.74 7.93 4.93
4.46 7.07 7.68 3.65
8.30 8.79 10.40 8.90
9.02 8.89 14.03 9.74
9.44 8.46 15.92 12.60
17.13 8.04 14.98 5.16
11.62 12.71 15.76 9.75
8.43 12.03 8.86 12.27
5.67 7.46 5.44 8.91
9.90 10.08 9.03 16.58
4.77 7.79 2.27 14.34
6.16 8.04 12.27 11.40
12.62 11.79 5.88 6.65
12.08 10.97 7.18 9.28
10.39 11.67 10.42 7.27
15.65 11.46 10.22 12.49
12.59 8.09 7.23 13.33
6.40 5.32 12.13 11.92
;
** Run correlation equality test ** ;
proc calis data = a covariance summary ;
var X Y Z Q ;
STD
X = v1,
Y = v2,
Z = v3,
Q = v4;
/* specify hypothesized covariance matrix where the covariance between X and Y
equals the covariance between Z & Q */
COV
X Y = cov_v1v2,
Z Q = cov_v1v2,
X Z = cov_v1v3,
Y Q = cov_v2v4,
X Q = cov_v1v4,
Y Z = cov_v2v3;
run ;
** End sample program ** ;
The key element of the program can be found in the COV statement which specifies the estimated correlation matrix for the null hypothesis. In order to impose the constraint that the COV(X,Y) = COV(Z,Q), the name "cov_v1v2" is given to both the covariance between X and Y and between Z and Q in the COV statement. The other covariances, on the other hand, will be freely estimated because each has been given a unique name. The relevant output from this program can be found below.
The CALIS Procedure
Covariance Structure Analysis: Maximum Likelihood Estimation
Fit Function 0.0638
Goodness of Fit Index (GFI) 0.9710
GFI Adjusted for Degrees of Freedom (AGFI) 0.7104
Root Mean Square Residual (RMR) 1.2634
Parsimonious GFI (Mulaik, 1989) 0.1618
ChiSquare 1.2124
ChiSquare DF 1
Pr > ChiSquare 0.2709
Independence Model ChiSquare 7.0050
Independence Model ChiSquare DF 6
RMSEA Estimate 0.1057
RMSEA 90% Lower Confidence Limit .
RMSEA 90% Upper Confidence Limit 0.6298
ECVI Estimate 1.3495
ECVI 90% Lower Confidence Limit .
ECVI 90% Upper Confidence Limit 1.8356
Probability of Close Fit 0.2822
Bentler's Comparative Fit Index 0.7887
Normal Theory Reweighted LS ChiSquare 1.1334
Akaike's Information Criterion 0.7876
Bozdogan's (1987) CAIC 2.7833
Schwarz's Bayesian Criterion 1.7833
McDonald's (1989) Centrality 0.9947
Bentler & Bonett's (1980) Nonnormed Index 0.2680
Bentler & Bonett's (1980) NFI 0.8269
James, Mulaik, & Brett (1982) Parsimonious NFI 0.1378
ZTest of Wilson & Hilferty (1931) 0.6121
Bollen (1986) Normed Index Rho1 0.0385
Bollen (1988) Nonnormed Index Delta2 0.9646
Hoelter's (1983) Critical N 62
In the table shown above, the chi-square results for the test of the null hypothesis that COV(X,Y) = COV(Z,Q) are indented for emphasis. Most of the other results are general goodness-of-fit measures that do not apply to this limited use of structural covariance models, so they may be ignored. The chi-square test on this particular set of data indicates that we should retain the null hypothesis (p = .27). If we had observed a much lower p-value (e.g., < .05), we would have rejected the null hypothesis and concluded that COV(X,Y) does not equal COV(Z,Q).
Note that this method allows you to test the equality of multiple sets of correlation coefficients within the same matrix simultaneously. For instance, if you had a correlation matrix consisting of 10 variables, you could easily test the equality of the v1-v2 and v3-v4 correlations at the same time you tested the equality of the v5-v6 and v7-v8 correlations. The resulting chi-square test statistic would have two degrees of freedom; it would test the joint hypothesis that the v1-v2 correlation equals the v3-v4 correlation and that the v5-v6 correlation equals the v7-v8 correlation.
References
Cohen, J., & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245-251.
Adjusted Bonferroni Comparisons
Question:
I am performing a number of statistical tests on my dataset. I would like to control the type 1 error rate: the probability of rejecting the null hypothesis when it is, in fact, true. I understand that when I perform many hypothesis tests on the same set of data, the probability of making at least one type 1 error can rise well above the conventional .05. I have heard about something called the Bonferroni adjustment that can fix this problem. How does it work?
Answer:
The Bonferroni adjustment works by making it more difficult for any one test to be statistically significant: it divides your alpha level (usually set to .05 by convention) by the number of tests you are performing. For instance, suppose you performed five tests on the same database. The Bonferroni-adjusted level of significance any one test would need to attain would be:
.05 / 5 = .01
Any test that results in a probability value of less than .01 would be statistically significant. Any test statistic with a probability value greater than .01 (including values that fall between .01 and .05) would be deemed nonsignificant.
Some authors (e.g., Jaccard & Wan, 1996) have pointed out that this method of controlling type 1 error becomes very conservative, perhaps too conservative, when the number of comparisons grows large. Jaccard and Wan (1996, p. 30) suggest the use of a modified Bonferroni procedure, also known as Holm's (1979) sequentially rejective procedure, that still retains an overall type 1 error rate of 5% (alpha = .05). The modified Bonferroni procedure works as follows: Rank order the significance values obtained from your multiple tests from smallest to largest. (Tied significance values may be ordered by theoretical criteria or arbitrarily.) Evaluate the significance of the test with the smallest p-value at alpha / number of tests, just as you would in the Bonferroni procedure discussed above. If the test statistic is significant after this adjustment, move on to the test with the next smallest significance value and evaluate it at alpha / (number of tests - 1). If that test statistic is significant after the adjustment, proceed to the third smallest significance value and evaluate it at alpha / (number of tests - 2). Proceed in this fashion until a nonsignificant test statistic result is obtained; that test and all remaining tests are declared nonsignificant.
An example may help clarify the procedure. The table below shows for five hypothetical tests the test number, obtained significance, the original alpha, the divisor which you would divide into the original alpha to obtain the new alpha, and the evaluation of the test's statistical significance.
Test   Obtained Significance   Original Alpha   Divisor   New Alpha   Significant?
1      .001                    .05              5         .010        Yes
2      .012                    .05              4         .013        Yes
3      .019                    .05              3         .017        No
4      .022                    .05              2         .025        No
5      .048                    .05              1         .050        No
Notice that test 1 would be significant under either Bonferroni adjustment method, but test 2 is significant only under the modified Bonferroni method. Test 3 is not significant under either method. Even though test 4's obtained significance value is less than its modified Bonferroni alpha, test 4 is also nonsignificant because the procedure requires that every test following the first nonsignificant test also be declared nonsignificant.
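The stepwise procedure illustrated in the table can be sketched in a few lines of Python; the function name is my own, and the p-values are those from the table.

```python
def holm_adjusted_tests(p_values, alpha=0.05):
    """Modified (sequentially rejective) Bonferroni procedure."""
    m = len(p_values)
    # Rank the tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for step, i in enumerate(order):
        # Evaluate the (step+1)-th smallest p-value at alpha / (m - step).
        if p_values[i] <= alpha / (m - step):
            significant[i] = True
        else:
            break  # first failure: all remaining tests are nonsignificant
    return significant

# The five obtained significance values from the table above.
results = holm_adjusted_tests([0.001, 0.012, 0.019, 0.022, 0.048])
```

The result reproduces the table: tests 1 and 2 are significant and tests 3 through 5 are not.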
References
Jaccard, J., & Wan, C.K. (1996). LISREL Approaches to Interaction Effects in Multiple Regression. Thousand Oaks, CA: Sage Publications.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
Holland, B.S., & Copenhaver, M. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin, 104, 145-149.
Handling nonnormal data in structural equation modeling (SEM)
Question:
I am having trouble getting my hypothesized structural equation model to fit my data. Someone told me that nonnormal data are a problem for SEM models; this person suggested fitting my model with the generalized least-squares (GLS) estimator instead of the default maximum likelihood (ML) estimator. What is the best way to handle nonnormal data when fitting a structural equation model?
Answer:
The hypothesis tests conducted in the structural equation modeling (SEM) context fall into two broad classes: tests of overall model fit and tests of significance of individual parameter estimates. Both types of tests assume that the fitted structural equation model is true and that the data used to test the model arise from a joint multivariate normal (JMVN) distribution in the population from which you drew your sample. If your sample data are not JMVN distributed, the chi-square test statistic of overall model fit will be inflated and the standard errors used to test the significance of individual parameter estimates will be deflated. Practically, this means that if you have nonnormal data, you are more likely to reject models that are not in fact false and to decide that particular parameter estimates are significantly different from zero when they are not (type 1 error). Note that this type of assumption violation is also a problem for confirmatory factor analysis models, latent growth models (LGMs), path analyses, or any other type of model that is fit using structural equation modeling programs such as LISREL, EQS, AMOS, and PROC CALIS in SAS.
How can you correct for nonnormal data in SEM programs? There are three general approaches used to handle nonnormal data:
1. Use a different estimator (e.g., GLS) to compute goodness-of-fit tests, parameter estimates, and standard errors
2. Adjust or scale the obtained chi-square test statistic and standard errors to take into account the nonnormality of the sample data
3. Make use of the bootstrap to compute a new critical chi-square value, parameter estimates, and standard errors
Estimators
Most SEM software packages offer the data analyst the opportunity to use generalized least squares (GLS) instead of the default maximum likelihood (ML) to compute the overall model fit chi-square test, parameter estimates, and standard errors. Under joint multivariate normality, when the fitted model is not false, GLS and ML return identical chi-square model fit values, parameter estimates, and standard errors (Bollen, 1989). Recent research by Ulf H. Olsson and his colleagues (e.g., Olsson, Troye, & Howell, 1999), however, suggests that GLS underperforms relative to ML in the following key areas:
1. GLS accepts incorrect models more often than ML
2. GLS returns inaccurate parameter estimates more often than ML
A consequence of (2) is that modification indices are less reliable when the GLS estimator is used. Thus, we do not recommend the use of the GLS estimator.
A second option is to use Browne's (1984) Asymptotic Distribution Free (ADF) estimator, available in LISREL. Unfortunately, ADF requires sample sizes of at least 1,000 cases and small models because of the computational demands of the estimation procedure. As Muthén (1993) concludes, "Apparently the asymptotic properties of ADF are not realized for the type of models and finite sample sizes often used in practice. The method is also computationally heavy with many variables. This means that while ADF analysis may be theoretically optimal, it is not a practical method" (p. 227).
For these reasons, the standard recommendation is to use the ML estimator (or one of the variants described below) when fitting a model to variables that are assumed to be continuous and normally distributed in the population from which you drew your sample. By contrast, if your variables are inherently categorical in nature, consider using a software package designed specifically for this type of data. Mplus is one such product. It uses a variant of the ADF method mentioned previously, weighted least squares (WLS). WLS as implemented by Mplus for categorical outcomes does not require the same sample sizes as does ADF for continuous, nonnormal data. Further discussion of the WLS estimator is beyond the scope of this FAQ; interested readers are encouraged to peruse Muthén, du Toit, and Spisic (1997) and Muthén (1993) for further details.
Robust scaled and adjusted chi-square tests and parameter estimate standard errors
A variant of the ML estimation approach is to correct the model fit chi-square test statistic and the standard errors of individual parameter estimates. This approach was introduced by Satorra and Bentler (1988) and incorporated into the EQS program as the ml,robust option. The ml,robust option in EQS 5.x provides the Satorra-Bentler scaled chi-square statistic, also known as the scaled T statistic, which tests overall model fit. Curran, West, and Finch (1996) found that the scaled chi-square statistic outperformed the standard ML estimator under nonnormal data conditions. Mplus also offers the scaled chi-square test and accompanying robust standard errors via the estimator option mlm, as well as a similar test statistic, the mean- and variance-adjusted chi-square statistic, via the estimator option mlmv.
An adjusted version of the scaled chi-square statistic is presented in Bentler and Dudgeon (1996). Fouladi (1998) conducted an extensive simulation study which found that this adjusted chi-square test statistic outperformed both the standard ML chi-square and the original scaled chi-square test statistic, particularly in smaller samples. Unfortunately, the adjusted test statistic is not available in EQS 5.x.
The robust approaches work by adjusting, usually downward, the obtained model fit chi-square statistic based on the amount of nonnormality in the sample data. The larger the multivariate kurtosis of the input data, the stronger the adjustment applied to the chi-square test statistic. Standard errors for parameter estimates are adjusted in much the same manner, typically upward, to appropriately reduce the type 1 error rate for tests of individual parameter estimates. Although the parameter estimate values themselves are the same as those from a standard ML solution, the adjusted standard errors yield a more appropriate hypothesis test that each parameter estimate is zero in the population from which the sample was drawn.
Bootstrapping
The robust scaling approach described above adjusts the obtained chi-square model fit statistic based on the amount of multivariate kurtosis in the sample data. An alternative method for dealing with nonnormal input data is to leave the obtained chi-square test statistic alone and instead adjust the critical value of the chi-square test. Under the assumption of JMVN, and if the fitted model is not false, the expected value of the chi-square test of model fit is equal to the model's degrees of freedom (DF). For example, if you fit a model that was known to be true to JMVN input data and the model had 20 DF, you would expect the chi-square test of model fit to equal 20, on average. On the other hand, nonnormality in the sample data can inflate the obtained chi-square to a value that exceeds the DF, say 30. The robust scaled and adjusted chi-square tests mentioned in the previous section work by lowering the value of the obtained chi-square to correct for nonnormality. For instance, in this example a reasonable value for the robust scaled or adjusted chi-square might be 25 instead of 30. Ideally, the adjusted chi-square would be closer to 20, but the adjustments are not perfect.
Bootstrapping instead computes a new critical value for the chi-square test of overall model fit. In our example, instead of the JMVN expected chi-square value of 20, a critical value generated via the bootstrap might be 27. The original obtained chi-square statistic for the fitted model (e.g., 30) is then compared to the bootstrap critical value (e.g., 27) rather than to the original model DF value (e.g., 20). A p-value based upon the comparison of the obtained chi-square value to the bootstrap-generated critical chi-square value is then computed.
How is the bootstrap critical chi-square value generated? First, the input data are treated as the total population of responses, and the bootstrap program repeatedly draws samples of size N, with replacement, from this pseudo-population. Before sampling, the input data are transformed so that your fitted model is exactly true. This step is necessary because the critical chi-square value is computed from a central chi-square distribution, which assumes the null hypothesis is true. The same assumption is made when you use the standard ML chi-square to test model fit: the obtained chi-square is expected to equal the model DF when the null hypothesis is true.
Next, the model is fit to each bootstrap sample and the obtained chi-square is saved. At the conclusion of the bootstrap sampling, the program collects the chi-square model fit statistics from all of the samples and computes their mean. This mean value becomes the critical value for the chi-square test from the original analysis.
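The two steps just described (transform the data so the model is true in the pseudopopulation, then resample and refit) can be sketched as follows. This is a simplified illustration, not the AMOS implementation: fit_chisquare is a hypothetical stand-in for whatever routine fits your SEM to a data matrix and returns its chi-square, and the transformation follows Bollen and Stine (1993), post-multiplying the centered data by S^(-1/2) Sigma^(1/2) so that the sample covariance of the transformed data equals the model-implied covariance.

```python
import numpy as np

def sqrtm_sym(a):
    """Square root of a symmetric positive (semi)definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def bollen_stine_transform(y, sigma_model):
    """Rescale the data so the fitted model is exactly true in the pseudopopulation:
    the sample covariance of the returned data equals sigma_model."""
    yc = y - y.mean(axis=0)              # center each variable
    s = np.cov(yc, rowvar=False)         # sample covariance matrix S
    a = np.linalg.inv(sqrtm_sym(s)) @ sqrtm_sym(sigma_model)
    return yc @ a

def bootstrap_chisquares(z, fit_chisquare, n_boot=250, seed=0):
    """Draw n_boot samples of size N with replacement from the transformed data,
    refit the model to each, and collect the chi-square fit statistics."""
    rng = np.random.default_rng(seed)
    n = z.shape[0]
    return np.array([fit_chisquare(z[rng.integers(0, n, n)]) for _ in range(n_boot)])
```

The mean of the returned chi-square values is the bootstrap critical value described above, and a Bollen-Stine-style p-value can be computed as the proportion of bootstrap chi-squares that equal or exceed the obtained chi-square from the original analysis.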
The procedure detailed above is credited to Bollen and Stine (1993) and is implemented in AMOS. AMOS allows the data analyst to specify the number of bootstrap samples drawn (typically 250 to 2000 bootstrap samples) and it outputs the distribution of the chi-square values from the bootstrap samples as well as the mean chi-square value and a Bollen-Stine p-value based upon a comparison of the original model's obtained chi-square with the mean chi-square from the bootstrap samples.
AMOS also computes individual parameter estimates, standard errors, confidence intervals, and p-values for tests of significance of individual parameter estimates based upon various types of bootstrap methods, such as bias correction and percentile correction. Mooney and Duval (1993) and Davison and Hinkley (1997) describe these methods and their properties, whereas Efron and Tibshirani (1993) provide an introduction to the bootstrap. Fouladi (1998) found in a simulation study that the Bollen-Stine test of overall model fit performed well relative to other methods of testing model fit, particularly in small samples.
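For intuition, the percentile method underlying such bootstrap confidence intervals can be sketched generically. This is a minimal illustration of the percentile idea, not the AMOS implementation; the statistic argument is a hypothetical placeholder for any estimator (a factor loading, an R-square, etc.) computed from a data sample.

```python
import numpy as np

def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample the data with replacement, recompute
    the statistic each time, and take quantiles of the bootstrap distribution."""
    rng = np.random.default_rng(seed)
    n = len(data)
    boot = np.array([statistic(data[rng.integers(0, n, n)]) for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2.0, 1.0 - alpha / 2.0])
```

Because the interval comes directly from the empirical bootstrap distribution, no normal-theory standard error formula is needed, which is why the method extends to quantities such as R-square statistics.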
Cautions and notes
One of the corollary benefits of the bootstrap is the ability to obtain standard errors, and therefore p-values, for quantities for which normal theory standard errors are not defined, such as R-square statistics. A primary disadvantage of the bootstrap and the robust methods mentioned previously is that they require complete data (i.e., no missing data are allowed). Use of the bootstrap method also requires the data analyst to set the scale of each latent variable by fixing one of its factor loadings to 1.00 rather than by fixing the factor's variance to 1.00, because under the latter scenario, bootstrapped standard error estimates may be artificially inflated by positive and negative factor loadings switching signs across bootstrap samples (Hancock & Nevitt, 1999).
References
For more information about handling nonnormal data in SEM, see the following references:
Bentler, P. M., & Dudgeon, P. (1996). Covariance structure analysis: Statistical practice, theory, and directions. Annual Review of Psychology, 47, 563-592.
Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley and Sons.
Bollen, K. A., & Stine, R. A. (1993). Bootstrapping goodness-of-fit measures in structural equation models. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models. Newbury Park, CA: Sage Publications.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62-83.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16-29.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge, UK: Cambridge University Press.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall Publishers.
Fouladi, R. T. (1998). Covariance structure analysis techniques under conditions of multivariate normality and nonnormality: Modified and bootstrap test statistics. Paper presented at the American Educational Research Association Annual Meeting, April 11-17, 1998, San Diego, CA.
Hancock, G. R., & Nevitt, J. (1999). Bootstrapping and the identification of exogenous latent variables within structural equation models. Structural Equation Modeling, 6(4), 394-399.
Mooney, C. Z., & Duval, R. D. (1993). Bootstrapping: A nonparametric approach to statistical inference. Newbury Park, CA: Sage Publications.
Muthén, B. O. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models. Newbury Park, CA: Sage Publications.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (In press). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Psychometrika.
Olsson, U. H., Troye, S. V., & Howell, R. D. (1999). Theoretic fit and empirical fit: The performance of maximum likelihood versus generalized least squares estimation in structural equation models. Multivariate Behavioral Research, 34(1), 31-59.
Satorra, A., & Bentler, P. M. (1988). Scaling corrections for chi-square statistics in covariance structure analysis. 1988 Proceedings of the Business and Economics Statistics Section of the American Statistical Association, 308-313.
Seaman, M. A., Levin, K. R., & Serlin, R. C. (1991). New developments in pairwise multiple comparisons: Some powerful and practicable procedures. Psychological Bulletin, 110, 577-586.
Connecting to published statistical and mathematical applications on the ITS Windows Terminal Server
Question:
Someone told me that I could gain access to a number of different statistical and mathematical packages on one of the ITS server computers. Is this true? If so, how much does it cost to use the server and how do I set up my computer to connect to the server?
Answer:
Information on how to access statistical and mathematical applications on the ITS servers is available on the ITS website.
Connecting to the Unix Timesharing Server
Question:
How do I connect to the Unix Timesharing Server?
Answer:
You will first need to validate your account for UTS (to run software on uts.cc.utexas.edu). You can do this by going to the EID-protected account maintenance page and clicking the Add Service button. Once your account has been set up, log in, and at your shell prompt type the command
eval `/usr/local/etc/appuser`
making certain that the string is in backquotes. This sets up all needed UNIX environment variables for statistical and mathematical applications. Now you can launch the software by typing a command at the shell prompt. See the FAQs for information on using the UNIX server for specific types of software.