1. HLM Availability

2. HLM Output File Error Message

3. Level-1 Regression Equations

4. R-squared in a Hierarchical Model

5. Graphing Multilevel Models

HLM Availability

#### Question:

I would like to perform a hierarchical linear models (HLM) analysis, also known as a multilevel model analysis. I've heard there is special software to perform this analysis called HLM. How can I gain access to this software?

#### Answer:

You may access HLM in one of three ways:

- License a copy from Scientific Software International, the HLM vendor, for your own personal computer.
- Obtain an ITS computer account to access our licensed copy over the UT Austin campus network. See our General FAQ, *Connecting to published statistical and mathematical applications on the ITS Windows Terminal Server*, for more information about this option.
- Download the free student version of SuperMix 1 from the Scientific Software International Web site for your own personal computer. SuperMix 1 can analyze two- and three-level models.

If your models of interest and databases are small, the free student version may be sufficient to meet your needs. For larger models, you will need to purchase your own copy of HLM or access the ITS shared copy of the software through the campus network.

HLM Output File Error Message

#### Question:

I am trying to use the HLM software installed on the Natural Sciences Terminal Server system. The software allows me to set up my model and run it, but when I run the model I see no output and I get an error message that says, "tmpfile: permission denied. Unable to open temp file". What should I do to fix this problem?

#### Answer:

HLM is stored on the server in a directory that is read-only. By default HLM attempts to write the output file of results to that same directory, but it is unsuccessful in the attempt because the directory is read-only. To rectify the problem, you must instruct HLM to use a different directory for its output. To change the location of the output file directory, select Basic Specifications from the list of menu options at the top of the HLM program window. In the Output File Name slot, enter a valid directory and file name.

Next, from the File menu, choose Save As and save the HLM command file to the same (or another) valid, writeable directory. As you browse for a suitable directory, you will notice that your local drives will appear differently. Specifically, they will include a $ symbol and the word 'Client' (e.g. C$ on 'Client' (C:)) to indicate that they are local drives rather than associated with the server (remote). Once you select a writeable directory, click OK and re-run the analysis.

Valid writeable directory names might be local directories (e.g., C:\TEMP) or a directory on your Windows NT server disk storage area (e.g., U:\). The advantage of using the U:\ directory or a valid subdirectory on the Windows NT server disk (e.g., U:\HLM_OUT) is that the server drive is always mounted whenever you connect to the Windows Terminal Server and it is always connected as U:, so no matter where you are, you can always access your data and output files that are stored on the server. Data transmission between the terminal server that houses HLM and disk server will also be faster than data transmission between your local computer's disk drives and the terminal server because the terminal server and the disk server are located near each other in the ITS computer machine room. On the other hand, storage of files on the disk server incurs a small charge to the end user.
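If you are unsure whether a candidate directory is writeable, you can test it outside HLM before entering it in the Output File Name slot. The following Python sketch is only an illustration and is not part of HLM; the helper name `is_writable` is hypothetical, and the paths are the example directories mentioned above, which may not exist on your system.

```python
import tempfile

def is_writable(directory):
    """Return True if a temporary file can be created in `directory`."""
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories as well as permission errors.
        return False

if __name__ == "__main__":
    # Example directories from the text; adjust to your own environment.
    for path in [r"C:\TEMP", "U:\\", r"U:\HLM_OUT"]:
        print(path, "writeable" if is_writable(path) else "not writeable")
```

A directory that reports as not writeable here would be expected to produce the same kind of temp file error in HLM.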

Level-1 Regression Equations

#### Question:

I have a two-level model in which students are nested within schools. I would like to see the empirical Bayes level-1 regression equations for each school. Does HLM provide this output?

#### Answer:

HLM output does not include the empirical Bayes parameters for the individual level-2 units. However, it is possible to create a residual file that contains the difference between the average level-1 parameters and each school's parameters. The residual file is not actually a data file; rather, it is a syntax file for one of three software packages (SPSS, SAS, or SYSTAT). Running the syntax file in one of these programs reads the values into a data file, which can then be analyzed or used to compute new variables. The file contains both the ordinary least squares (OLS) residuals and the empirical Bayes residuals.

To obtain this file, go to Basic Specifications on the menu to open the *Basic Model Specifications* dialog box. Click **Create Residual File**, which will open the *Create Residual File* dialog box. Click the radio button next to the type of file that you would like to create. In the current example, an SPSS file is created. You can also assign a file name other than the default name, *resfil*. The file will be saved to the same directory as your output file, which is indicated in the *Basic Model Specifications* dialog box.

The present example uses the *hsb.ssm* dataset that can be found in the HLM *Examples* directory. The level-1 model contains a random slope for the independent variable, ses, and a random intercept. The model is shown below:

Running the above model with the option to create a residual file produces an SPSS syntax file that can be opened in the SPSS Syntax Editor. The first several lines are shown below:

DATA LIST FIXED RECORDS = 6

/1 ID NJ CHIPCT MDIST LNTOTVAR OLSRSVAR MDRSVAR (A12,F5,5F11.5)

/2 EBINTRCP EBSES (2F11.5)

/3 OLINTRCP OLSES (2F11.5)

/4 FVINTRCP,FVSES ,(2F11.5)

/5 PV00 PV10 PV11 (3F11.5)

/6 SIZE SECTOR PRACAD DISCLIM HIMINTY MEANSES ( 6F11.5).

BEGIN DATA

1224 47 0.59435 0.63022 2.02739 2.01643 2.00550

-1.60474 0.11084

-1.85980 0.11470

12.66493 2.39388

0.735079 0.104732 0.334263

842.00000 0.00000 0.35000 1.59700 0.00000 -0.42800

1288 25 0.20625 0.24837 1.94945 1.92016 1.90261

0.40859 0.08532

0.45000 0.86157

12.66493 2.39388

1.158870 -0.067171 0.371764

1855.00000 0.00000 0.27000 0.17400 0.00000 0.12800

The syntax that is created can be used to create an SPSS dataset. To do this, first open the file in the SPSS Syntax Editor, then submit the syntax using the following menu items:

**Run**
**All**

This will read the data contained in the syntax file into the SPSS Data Editor. Once the data is in the Data Editor, the level-1 regression equation can be computed. The data produced by the syntax shown will appear as follows in the SPSS Data Editor:

Each row in the residual file represents a level-2 unit, so in this file each row represents a school. The variables used for computing the level-1 intercepts and slopes for individual schools are *ebintrcp, ebses, olintrcp, olses, fvintrcp, and fvses*. The ordinary least squares residuals are prefixed with *ol* and the empirical Bayes residuals are prefixed with *eb*. For example, the OLS intercept residual is named *olintrcp*, and the OLS residual for the slope of *ses* is named *olses*. Both sets of residuals are expressed as differences from the average parameters. For example, the empirical Bayes intercept residual, *ebintrcp*, for school ID 1224 is -1.60, so that school's intercept is about 1.6 units smaller than the average intercept.
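For readers who do not have SPSS at hand, the six-records-per-school layout can also be read with a short script. The Python sketch below is an illustration only: it assumes the values are separated by whitespace, as they appear in the listing above, whereas the actual file is fixed-width. The names `RECORD_NAMES` and `parse_schools` are hypothetical; the variable names come from the DATA LIST statement.

```python
# Variable names for each of the six records, taken from the DATA LIST statement.
RECORD_NAMES = [
    ["id", "nj", "chipct", "mdist", "lntotvar", "olsrsvar", "mdrsvar"],
    ["ebintrcp", "ebses"],
    ["olintrcp", "olses"],
    ["fvintrcp", "fvses"],
    ["pv00", "pv10", "pv11"],
    ["size", "sector", "pracad", "disclim", "himinty", "meanses"],
]

# The two schools shown in the listing above.
DATA = """\
1224 47 0.59435 0.63022 2.02739 2.01643 2.00550
-1.60474 0.11084
-1.85980 0.11470
12.66493 2.39388
0.735079 0.104732 0.334263
842.00000 0.00000 0.35000 1.59700 0.00000 -0.42800
1288 25 0.20625 0.24837 1.94945 1.92016 1.90261
0.40859 0.08532
0.45000 0.86157
12.66493 2.39388
1.158870 -0.067171 0.371764
1855.00000 0.00000 0.27000 0.17400 0.00000 0.12800
"""

def parse_schools(text):
    """Group every six non-blank lines into one dictionary per school."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    per_school = len(RECORD_NAMES)
    schools = []
    for start in range(0, len(lines), per_school):
        school = {}
        for names, values in zip(RECORD_NAMES, lines[start:start + per_school]):
            for name, value in zip(names, values):
                # Keep the school ID as text; everything else is numeric.
                school[name] = value if name == "id" else float(value)
        schools.append(school)
    return schools

schools = parse_schools(DATA)
for school in schools:
    print(school["id"], school["ebintrcp"], school["ebses"])
```

Splitting on whitespace would fail for IDs that contain spaces, so the column widths in the DATA LIST statement remain the authoritative description of the file.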

The average intercept and slope are prefixed with *fv*: the intercept is *fvintrcp* and the ses slope is *fvses*. These are the values in the *Final estimation of fixed effects* table in the HLM output. The OLS and empirical Bayes intercepts and slopes for each school are computed in the same manner: the average intercept and slope (*fvintrcp* and *fvses*) are added to the corresponding residuals to obtain the school's level-1 parameter values. Each parameter is computed separately using a **COMPUTE** statement in SPSS. The **COMPUTE** statement contains the name of the new variable on the left side of the equation and the numerical formula on the right side. For example, to create a new variable named eb_int, which is the level-1 empirical Bayes intercept for each school, you would use the following syntax:

COMPUTE eb_int = fvintrcp + ebintrcp .

EXECUTE .

In this example, eb_int is the new variable. It is the sum of the average intercept, fvintrcp, and the empirical Bayes residual, ebintrcp, which is the school's deviation from the average intercept. The code above can be executed in the Syntax Editor using the same process that was described earlier for reading in the syntax. After executing the syntax, the new variable will appear in the rightmost column of the Data Editor.
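The same arithmetic can be checked by hand or with a short script. The Python sketch below is an illustration only; it applies the rule described above (average parameter plus residual) to the two schools shown in the residual file listing. The function name `school_equation` is hypothetical.

```python
# Values copied from the residual file listing for the two schools shown.
schools = {
    "1224": {"fvintrcp": 12.66493, "fvses": 2.39388,
             "ebintrcp": -1.60474, "ebses": 0.11084,
             "olintrcp": -1.85980, "olses": 0.11470},
    "1288": {"fvintrcp": 12.66493, "fvses": 2.39388,
             "ebintrcp": 0.40859, "ebses": 0.08532,
             "olintrcp": 0.45000, "olses": 0.86157},
}

def school_equation(row, prefix):
    """Return (intercept, ses slope) for one school.

    prefix is "eb" for empirical Bayes or "ol" for OLS residuals.
    """
    intercept = row["fvintrcp"] + row[prefix + "intrcp"]
    slope = row["fvses"] + row[prefix + "ses"]
    return intercept, slope

for school_id, row in schools.items():
    for prefix in ("eb", "ol"):
        intercept, slope = school_equation(row, prefix)
        print(f"{school_id} {prefix}: mathach = {intercept:.3f} + {slope:.3f} * ses")
```

For school 1224, the empirical Bayes intercept is 12.66493 - 1.60474, or about 11.06, and the empirical Bayes ses slope is 2.39388 + 0.11084, or about 2.50.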

R-squared in a Hierarchical Model

#### Question:

I have a two-level model in which students are level-1 units nested within schools, which are my level-2 units. My model has a random level-1 intercept. Is it possible to obtain an R-squared value for my hierarchical model?

#### Answer:

It isn't possible to obtain a true R-squared value in HLM; however, there are statistics that estimate the proportion of the explainable variance accounted for by the model, and they are often referred to as R-squared or pseudo R-squared values. HLM does not display these values in its standard output. However, you can compare the error terms (variance components) of an unrestricted model and a restricted model to obtain the proportion of variance explained by your model. An unrestricted model, or null model, contains only the dependent variable and a random level-1 intercept; it does not contain any independent variables. The restricted model adds the independent variables of interest. One formula, suggested by Kreft and de Leeuw (1998) and Singer (1998), that can be used to obtain both the within- and between-unit variance explained is the following:

(unrestricted error – restricted error) / unrestricted error

The within-unit variance explained is a measure of how well the independent variables in the model explain the outcome variable. The between-unit measure is the amount of variance between level-2 units that is accounted for by the predictors in the model.

Some alternatives to the above formula are described by Snijders and Bosker (1999). They suggest the following formula for computing within-unit variance explained:

1 - ((level-1 restricted error + level-2 restricted error) / (level-1 unrestricted error + level-2 unrestricted error))

They suggest the following formula for computing between-unit variance explained. In this formula, n is the number of individuals in each level-2 unit. Because level-2 units rarely contain equal numbers of individuals, Snijders and Bosker (1999) suggest using either a representative value or the harmonic mean of the unit sizes for n:

1 - (((level-1 restricted error / n) + level-2 restricted error) / ((level-1 unrestricted error / n) + level-2 unrestricted error))
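For reference, the formulas in this section can be written compactly as follows. The notation is an assumption introduced here, not taken from the HLM output: σ² is the level-1 error variance, τ₀₀ is the level-2 intercept variance, the subscripts u and r denote the unrestricted and restricted models, n_j is the number of individuals in level-2 unit j, and J is the number of level-2 units. All four quantities are written as proportions of variance explained, so the Snijders and Bosker measures appear as one minus the ratio of restricted to unrestricted variance.

```latex
% Kreft and de Leeuw (1998); Singer (1998): proportional reduction per level
\[
R^2_{\text{within}} = \frac{\sigma^2_u - \sigma^2_r}{\sigma^2_u},
\qquad
R^2_{\text{between}} = \frac{\tau_{00,u} - \tau_{00,r}}{\tau_{00,u}}
\]

% Snijders and Bosker (1999): level-1 and level-2 measures
\[
R^2_{1} = 1 - \frac{\sigma^2_r + \tau_{00,r}}{\sigma^2_u + \tau_{00,u}},
\qquad
R^2_{2} = 1 - \frac{\sigma^2_r / n + \tau_{00,r}}{\sigma^2_u / n + \tau_{00,u}}
\]

% Harmonic mean of the level-2 unit sizes, one choice for n
\[
n = \frac{J}{\sum_{j=1}^{J} 1 / n_j}
\]
```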

These formulas can be illustrated using the hsb.ssm file that is available in the HLM *Examples* directory. If you are using HLM on the Stat Apps terminal server, you can find the *Examples* directory in the following path: *N:\Program Files\Hlm404\Examples*. The first step to obtain the R-squared value is to run the unrestricted model. As previously stated, the model contains only a random intercept and no independent variables. In the dialog box below, the model has an outcome variable, *mathach*, but has no independent variables:

The error terms for both the level-1 and level-2 models that you will use to obtain an R-squared are in the Final estimation of variance components section at the bottom of the output:

Final estimation of variance components:

| Random Effect | Standard Deviation | Variance Component | df | Chi-square | P-value |
|---|---|---|---|---|---|
| INTRCPT1, U0 | 2.93501 | 8.61431 | 159 | 1660.23264 | 0.000 |
| level-1, R | 6.25686 | 39.14831 | | | |

You can see that the level-1 error term is 39.15 in this model and the level-2 error term is 8.61.

The next step to obtaining the values of interest is to replicate these statistics in a restricted model. This can be illustrated by adding an independent variable to the above model. In the dialog box below, the level-1 independent variable, ses, which is each student's socioeconomic status, is added to the level-1 model:

Note that the *ses* slope is fixed in the above model (it has no level-2 error term); only the intercept is random. Again, look at the *Final estimation of variance components* section at the bottom of the output to obtain the error terms:

Final estimation of variance components:

| Random Effect | Standard Deviation | Variance Component | df | Chi-square | P-value |
|---|---|---|---|---|---|
| INTRCPT1, U0 | 2.18361 | 4.76815 | 159 | 1037.09077 | 0.000 |
| level-1, R | 6.08559 | 37.03440 | | | |

In this model, the level-1 error term is 37.03 and the level-2 error term is 4.77. These values can now be used to calculate the within- and between-unit variance explained. First, consider the within-unit formula, which is a measure of how well socioeconomic status explains math achievement scores:

(39.15 - 37.03) / 39.15 = .05

Thus, the model containing socioeconomic status explains 5% of the explainable variance using this formula. Using the formula recommended by Snijders and Bosker (1999), the following values are used to calculate the explained variance:

1 - ((37.03 + 4.77) / (39.15 + 8.61)) = .12

Next, it is often useful to examine the amount of between-unit variance explained. Here, using the formula provided by Kreft and de Leeuw (1998) and Singer (1998) with the level-2 variances, the following values are obtained:

(8.61 - 4.77) / 8.61 = .45

Or, using the Snijders and Bosker (1999) method with the harmonic mean of 41.03, the following values are obtained:

1 - (((37.03/41.03) + 4.77) / ((39.15/41.03) + 8.61)) = 1 - .59 = .41

Socioeconomic status explains 45% of the explainable between-unit variance in this model using the first formula and 41% using the second formula. Thus, it appears that socioeconomic status explains a substantial share of the variation between schools, but little of the variation in math achievement scores within schools.
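The calculations in this example can be reproduced with a few lines of code. The Python sketch below is an illustration only: the function names are hypothetical, and the inputs are the rounded variance components and the harmonic mean quoted above. The Snijders and Bosker measures are computed as one minus the ratio of restricted to unrestricted variance; for the between-unit measure that ratio is about .59, so the proportion explained is about .41.

```python
def proportion_reduction(unrestricted, restricted):
    """Kreft and de Leeuw / Singer: (unrestricted - restricted) / unrestricted."""
    return (unrestricted - restricted) / unrestricted

def sb_within(l1_u, l2_u, l1_r, l2_r):
    """Snijders and Bosker level-1 measure based on total variance."""
    return 1 - (l1_r + l2_r) / (l1_u + l2_u)

def sb_between(l1_u, l2_u, l1_r, l2_r, n):
    """Snijders and Bosker level-2 measure; n is a representative unit size."""
    return 1 - (l1_r / n + l2_r) / (l1_u / n + l2_u)

# Rounded variance components from the two HLM runs described above.
L1_UNRESTRICTED, L2_UNRESTRICTED = 39.15, 8.61
L1_RESTRICTED, L2_RESTRICTED = 37.03, 4.77
HARMONIC_MEAN_N = 41.03  # harmonic mean of school sizes quoted in the text

results = {
    "within (proportional reduction)":
        proportion_reduction(L1_UNRESTRICTED, L1_RESTRICTED),
    "within (Snijders and Bosker)":
        sb_within(L1_UNRESTRICTED, L2_UNRESTRICTED, L1_RESTRICTED, L2_RESTRICTED),
    "between (proportional reduction)":
        proportion_reduction(L2_UNRESTRICTED, L2_RESTRICTED),
    "between (Snijders and Bosker)":
        sb_between(L1_UNRESTRICTED, L2_UNRESTRICTED,
                   L1_RESTRICTED, L2_RESTRICTED, HARMONIC_MEAN_N),
}

for label, value in results.items():
    print(f"{label}: {value:.2f}")
```

If you have the individual school sizes, `statistics.harmonic_mean` in the Python standard library is one way to obtain a value for n.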

The method described above has some potential problems. One is that the level-1 variance may be larger in the restricted model than in the unrestricted model, which would produce a negative R-squared value. In addition, Kreft and De Leeuw (1998) point out that the formula may not apply to models with random slopes. This is especially true for computing the between-unit variance explained, as there is not a single level-2 error term in models containing random slopes.

**References**

Snijders, T., & Bosker, R. (1999). *Multilevel analysis: An introduction to basic and advanced multilevel modeling*. London: Sage Publications.

Kreft, I., & De Leeuw, J. (1998). *Introducing multilevel modeling*. London: Sage Publications.

Singer, J. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. *Journal of Educational and Behavioral Statistics, 23*(4), 323-355.

Graphing Multilevel Models

#### Question:

Is it possible to graph my multilevel model using HLM?

#### Answer:

##### This FAQ assumes that you know how to run and interpret a two-level model using HLM. If you do not, see our online HLM tutorial. This document uses the sample dataset HSB.SSM that is available with the HLM software.

HLM added a graphing facility in version 5.04 of the software. You can graph models with random slopes and intercepts using this graphing facility.

Before graphing an equation, you should first set up and run an analysis on your model, including all of the variables that you would like to graph.

The first step to graphing a multilevel model is to specify a location on disk for your graph file. To do this, select the Basic Specifications menu item. In the ensuing dialog box, click **Graph Equations.** If you are using HLM on the Windows terminal server, *Earthquake.cc.utexas.edu*, the graph file must be saved to your user drive (the U drive).

Next, go to the *File* menu and select the *Graph Equation* menu item.

Your Y axis will always be the dependent variable specified in the level-1 equation. You can choose any variable that you like for the X axis. For example, if you were studying the relationship between SES and math achievement scores, you would likely choose SES as your X axis variable. Choosing a continuous variable for the X axis will produce a line graph showing the regression line, whereas choosing a categorical variable for the X axis will produce a bar graph. If you choose only an X variable, your graph will show the regression line for the entire population. For example, using the HLM sample data set *HSB.SSM*, graphing math achievement regressed on SES produces the following graph:

The graphing facility can also be used for more sophisticated graphs that contain level-2 variables and random effects. Using the options for the Z axis, you can add additional variables to a model containing random slopes and intercepts. For example, the model shown below has a random intercept and a random slope for the SES variable.

After running this model, you can request separate lines for the variable sector. This variable has two values, 0 and 1. Selecting this variable in the Z focus(1) section of the dialog box, under the *Level-2* dropdown menu produces the following graph:

The two lines represent the two values of *sector*. As the legend indicates, the line with the lower intercept is for cases where sector has a value of 0 and the higher intercept is for cases that have values of 1 for the sector variable.

In addition to graphing separate lines for dummy coded, or categorical data, you can also graph separate lines for continuous independent variables. You would likely not want to graph separate lines for every value of a continuous variable, so you will have to make some decisions about how to represent these variables. Continuing with the above example, the level-2 variable, school’s mean SES, can be added using the *Z focus(2)* section of the dialog box. Selecting meanses from the Level-2 dropdown menu will add this variable to the graph. There are several options available for representing continuous variables, such as *meanses*. Once a continuous variable has been chosen, the Range of *z axis* dropdown menu will be populated with the following options: *25th and 75th percentiles, 25th/50th/75th percentiles*, *Averaged upper/lower quartiles*, and *Choose up to 6 values*. If you select the *Choose up to 6 values* option, you will need to fill in the boxes under the Choose up to 6 heading. Otherwise, separate lines will be plotted for each of the percentiles or quartiles that were selected. For example, choosing the *25th and 75th percentiles* option, in conjunction with the options selected in the previous examples, produces the following graph:

In this graph, the 25th percentile of meanses is -.296 and the 75th percentile is .332. Thus, for cases with a value of 0 for sector, there is one line for a mean SES of -.296 and another for a mean SES of .332. The same is true for cases with a value of 1 for sector: there is a line for a mean SES of -.296 as well as a line for a mean SES of .332.
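To see why this graph contains four lines, it can help to write out the fitted equation for each combination of the two Z focus variables. The Python sketch below is an illustration only. It assumes a model in which sector and meanses predict both the intercept and the ses slope, and the coefficient values in `coef` are hypothetical placeholders, not estimates from the HSB data; substitute the values from your own *Final estimation of fixed effects* table.

```python
from itertools import product

# Hypothetical fixed-effect estimates (placeholders, not HLM output).
# g00, g01, g02: intercept and its sector and meanses effects.
# g10, g11, g12: ses slope and its sector and meanses effects.
coef = {"g00": 12.0, "g01": 1.2, "g02": 5.3,
        "g10": 2.9, "g11": -1.6, "g12": 1.0}

def line_for(sector, meanses, c=coef):
    """Return (intercept, slope) of the predicted mathach-on-ses line."""
    intercept = c["g00"] + c["g01"] * sector + c["g02"] * meanses
    slope = c["g10"] + c["g11"] * sector + c["g12"] * meanses
    return intercept, slope

SECTOR_VALUES = [0, 1]              # Z focus(1): the two sector codes
MEANSES_VALUES = [-0.296, 0.332]    # Z focus(2): 25th and 75th percentiles

# One line per combination of sector and meanses: 2 x 2 = 4 lines.
lines = {(s, m): line_for(s, m)
         for s, m in product(SECTOR_VALUES, MEANSES_VALUES)}

for (sector, meanses), (intercept, slope) in lines.items():
    print(f"sector={sector}, meanses={meanses}: "
          f"mathach = {intercept:.2f} + {slope:.2f} * ses")
```

Choosing the *25th/50th/75th percentiles* option instead would add a third meanses value and therefore produce six lines.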