Installation, Importing & Exporting, and Common SAS Errors


Reading Adjacent Year, Month, and Day Values Using SAS

Question:

How do I read dates into a SAS data set from a data file having two columns for the year, two columns for the month, and two columns for the day--all adjacent to one another?

Answer:

Your data arrangement is already consistent with that needed to use one of the SAS date informats. Thus in the DATA step's INPUT statement, you need only associate the six date columns with the SAS date informat of the form:

INPUT birthday yymmdd6. ;
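As a minimal sketch of how this informat might be used in a complete DATA step (the data set name "birthdays", the in-stream data values, and the DATE9. display format are illustrative assumptions, not part of the original answer):

```sas
/* Hypothetical example: read six adjacent date columns (yymmdd) as one SAS date */
DATA birthdays;
  INPUT birthday yymmdd6.;    /* columns 1-6 hold the year, month, and day */
  FORMAT birthday date9.;     /* display as a readable date instead of a day count */
  DATALINES;
850714
991231
;
RUN;

PROC PRINT DATA=birthdays;
RUN;
```

Because only two digits are supplied for the year, the century that SAS assigns depends on the YEARCUTOFF= system option in effect at your site.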

For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.


File Relationships in SAS

Question:

What are the different files used and produced by SAS and what are they all for?

Answer:

When running SAS in noninteractive mode (that is, when you issue the SAS command followed by an input file name), there are five different files that you are likely to use.

These are:
1. text data files
2. command files
3. SAS data set files
4. SAS listing files
5. SAS log files

A text data file is a document that contains your data in text form, typically entered with a text editor. Note that it is possible to enter data in a command file, so separate data files are not strictly necessary. However, since data can be read in many different ways, it is usually more efficient to create data files separate from command files.

A command file is a document that contains the SAS commands that will read the data in the text data file, plus commands that will produce some sort of output from the data. You must create the command file using a text editor program, just as with your text data file. As mentioned, the command file can also contain the data.

A SAS data set file is created when a command file contains a DATA step. SAS data set files can be created so that they exist only as long as SAS is running your current noninteractive submission, or they can be created to be saved and used again during a different submission. Note that SAS data set files cannot be edited by text editor programs.

The only way to see the data in a SAS data set file is to use the file in SAS (or some other program that can read SAS data set files).

SAS log files are produced as SAS runs the command file. When the command file is read, SAS checks each statement and tries to execute it. If it can, the statement is sent to the SAS log file with either no comment attached or a brief note describing the execution process. If the statement can be executed, but problems occurred during execution, the statement is sent to the SAS log file with a warning message describing the problem. If the statement cannot be executed, SAS sends the statement to the log file along with an error message describing why the statement could not be used. Examining the log file after submitting a command file is the best way to find mistakes in the command file.

Listing files are produced when SAS executes a command from the command file that creates output. Any command that creates output sends that output to the listing file. The listing file is the file with the information you were trying to produce with the command file.
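The relationship between the five files can be sketched with a small command file. The file names (myprog.sas, scores.dat), the library name "mylib", and the column layout below are hypothetical and chosen only for illustration:

```sas
/* Hypothetical command file, saved as myprog.sas and submitted with: sas myprog.sas */
LIBNAME mylib '.';               /* permanent library in the current directory */

DATA mylib.scores;               /* the DATA step creates a saved SAS data set file */
  INFILE 'scores.dat';           /* text data file, assumed to exist */
  INPUT id 1-3 score 5-7;        /* assumed column layout */
RUN;

PROC MEANS DATA=mylib.scores;    /* procedure output is written to the listing file */
RUN;
```

On a UNIX system, a noninteractive run of this kind would normally write its log to myprog.log and its output to myprog.lst in the current directory. Using a one-level name such as "scores" instead of "mylib.scores" would create a temporary data set that exists only for the duration of the run.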

There are other files used and produced by SAS. For more information, see SAS Language: Reference, Version 6, First Edition. In particular, see Chapter 2: The DATA step, and Chapter 5: SAS Output.


Uncompressing .Z Files Within SAS

Question:

I have a large compressed data set (.Z file) that I want to input to a SAS program without permanently uncompressing the data set to disk. I am using a UNIX system.

Answer:

Using SAS on a UNIX system permits you to use a "pipe" to route the output of a UNIX command to a SAS DATA step or procedure. For instance:
FILENAME indata PIPE 'zcat data.Z' LRECL=192;
DATA temp1;
INFILE indata;
INPUT specification of data input;
Here zcat is the UNIX decompression command used to uncompress the raw data file named data.Z (this file is assumed here to reside in your current working directory). Note that "zcat" is specified in lowercase because it is a UNIX system command and UNIX is a case-sensitive operating system; for the same reason, the file name must be typed exactly as it appears on disk, including the uppercase .Z extension. The LRECL=192 option sets the logical record length, that is, the maximum length of each record read from the uncompressed data; adjust it to match the record length of your own data.
If you are using a gzipped file (.gz format), you can use the same SAS syntax to uncompress the data file, except that you supply a different UNIX decompression command to SAS:
FILENAME indata PIPE 'gunzip -c data.gz' LRECL=192; 
DATA temp1;
INFILE indata;
INPUT specification of data input;
Consult the Helpdesk at (512) 475-9400 for more information on the compression programs available. See the online SAS 9.2 Companion for the UNIX Environment for more details.
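A fuller sketch of the gzip case, with the INPUT specification filled in, might look like the following. The variable names, the column positions, and the TRUNCOVER and OBS=10 options are assumptions added for illustration; substitute the layout of your own data:

```sas
/* Hypothetical layout: id in columns 1-5, score in columns 7-10 */
FILENAME indata PIPE 'gunzip -c data.gz' LRECL=192;

DATA temp1;
  INFILE indata TRUNCOVER OBS=10;   /* read only the first 10 records while testing */
  INPUT id 1-5 score 7-10;
RUN;

PROC PRINT DATA=temp1;
RUN;
```

Once the program works on the first few records, remove OBS=10 to read the whole file. The compressed file on disk is never modified; the uncompressed records exist only in the pipe.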


Reading Hierarchical Raw Data Files


Question:

How do I read data into a SAS dataset when the data in one variable indicates which variables should be read in next (from the same line)?

Answer:

To read further variables from the same line depending on the value of the deciding variable, use a trailing at-sign (@) in the SAS INPUT statement. For example:
INPUT a 7 @;
IF a = 1 THEN INPUT b 9-13 c 15;
IF a = 2 THEN INPUT d 16 e 17-18;
Here the variable A is read from column 7. If its value is 1, then variables B and C are read. If A's value is 2, then variables D and E are read. If A's value is anything other than 1 or 2, no other variables are read.
The critical component is the single trailing @ in the initial INPUT statement. This holds the pointer (the device that is reading the data file) at its present location (the column immediately after the variable A), so that data can continue to be read from the current line when another INPUT statement is processed. Without the @, each subsequent INPUT statement would read from the next line of data instead.
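The same technique can be tried in a self-contained DATA step. In this sketch the data set name "hier", the extra variable "id" in columns 1-5, and the in-stream data values are hypothetical; the column positions for A through E follow the example above:

```sas
DATA hier;
  INPUT id 1-5 a 7 @;                    /* trailing @ holds the current line */
  IF a = 1 THEN INPUT b 9-13 c 15;       /* record type 1 */
  ELSE IF a = 2 THEN INPUT d 16 e 17-18; /* record type 2 */
  DATALINES;
00001 1 12345 6
00002 2        712
;
RUN;

PROC PRINT DATA=hier;
RUN;
```

Variables that do not apply to a given record type are left missing for that observation. The held line is released automatically at the end of each iteration of the DATA step, so the next iteration starts on a new line.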


Reading in A Subset of Cases in SAS

Question 1:

I have a large external data file. How can I test my SAS program by first reading only a few cases (before running it on the entire data set)?

Answer 1:

You can use the OBS = option in the INFILE statement to read only a subset of a raw data set. For example, if your data set had 500 records in it and you wanted to read only the first 50 records, the INFILE statement would take the form:
INFILE 'filespecs' OBS = 50 ;
Here FILESPECS are the external data file specifications (usually only the file name is needed).

Question 2:

How can I do something similar with data I have in either a SAS transport file or a permanent SAS dataset?

Answer 2:

There are several ways you can accomplish this goal:
1. Method One


First, read the data into SAS using whatever SAS code you need to make the data available as a SAS dataset. Then start a new SAS dataset and use the sum statement N+1 together with a subsetting IF statement so that the new dataset retains only the number of cases specified by the IF statement. Here's an example:

DATA new ;
SET old ;
N+1 ;
IF N LE 100 ;
RUN ;
This code reads the information from the SAS dataset "old" into a new SAS dataset called "new". The N+1 statement creates a counter variable called N. Then, IF N is less than or equal to (LE) 100, the case is retained in the new SAS dataset.


2. Method Two

You can use an option in the SAS SET statement to limit the number of observations read by the new dataset.
DATA new ;
SET old (OBS = 100);
RUN ;
Either method should produce the same result; which method you use is entirely up to you and the requirements of your analysis. For example, Method One allows you access to the whole dataset prior to the execution of the IF statement. This can be an asset if you need access to the entire dataset, or a liability if your dataset is large and you only want to analyze the subset portion of the dataset.
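A possible variation on Method One, shown here as a sketch rather than as part of the original answer, uses the automatic variable _N_ and a STOP statement so that SAS stops reading as soon as the subset is complete. The dataset names "old" and "new" are the same placeholders used above:

```sas
DATA new;
  SET old;
  IF _N_ > 100 THEN STOP;   /* end the DATA step before observation 101 is written */
RUN;
```

Like Method Two, this avoids passing through the rest of a large dataset, and it does not add a counter variable to the new dataset.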


Reading variable-length character strings into SAS

Question:

I'm working with a file that has a variable-length record format. Specifically, the last field contains character strings of differing lengths. Worse yet, this field will have embedded blanks. It's a character variable that can contain expressions like, "Hi there" and "A bientot". How can I get SAS to read this file?

Answer:

One way to get SAS to read the last field as a variable-length field is to assign the variable the maximum necessary length by using the LENGTH statement (before the INPUT statement). Use an ampersand (&) to tell SAS that the character value may contain single embedded blanks; with this modifier, SAS treats two or more consecutive blanks (or the end of the line) as the end of the value. For example:
DATA temp;
INFILE 'raw dat a';
LENGTH name $ 50 ;
INPUT v1 $ 7-21 @23 name & $ ;
RUN ;
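To experiment with the ampersand modifier without an external file, a sketch with in-stream data can be used. The variable names and data values are hypothetical, and the layout is simpler than the one above:

```sas
DATA temp;
  LENGTH greeting $ 50;            /* maximum length of the last field */
  INPUT id 1-3 @5 greeting $ &;    /* & allows single embedded blanks */
  DATALINES;
001 Hi there
002 A bientot
;
RUN;

PROC PRINT DATA=temp;
RUN;
```

Each greeting is read in full, including its embedded blank, because the value ends only where two or more consecutive blanks (or the end of the record) are found.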


SAS System Options in the UNIX Environment

Question:

How do I specify SAS system options in the UNIX environment?

Answer:

How you specify SAS system options depends on how you use SAS in the UNIX environment.
If you use SAS via an X-terminal or X-terminal emulation software such as Exodus or MacX, the command to launch SAS on ITS UNIX systems is
/usr/local/sas/sas
SAS system options are preceded by a hyphen and immediately follow the SAS command. For example, if you want to have SAS write its work files (including temporary datasets) to a directory called "mysasdir" located one level below your own current working directory, the syntax for invoking this option would be:
/usr/local/sas/sas -work ./mysasdir
If you use the SAS display manager system via a vt100 terminal interface such as telnet, the usual command to launch SAS is:
/usr/local/sas/sas -fsd ascii.vt100
The -fsd portion of this command is a SAS option which means "full screen device". You could add another SAS system option to this command, such as the -work option mentioned above:
/usr/local/sas/sas -fsd ascii.vt100 -work ./mysasdir
If you run SAS noninteractively by supplying the SAS program file "mysasprog" in the current directory, you would change your command from:
/usr/local/sas/sas mysasprog
to:
/usr/local/sas/sas -work ./mysasdir mysasprog
This would direct SAS to write any work files to the "mysasdir" directory as it processes the contents of your SAS program "mysasprog".
Each of these examples assumes you want to use a particular SAS system option once or a few times. If you intend to use a SAS system option repeatedly, it can be a nuisance to specify the same option each time you invoke SAS. For this type of situation, you can copy the config.sas612 file located in the /usr/local/sas/sas612 directory to your own directory and edit the copy using a UNIX text editor. The config.sas612 file is a SAS configuration file that contains default settings for a number of SAS system options; you may change the default settings of these options as well as delete or add options of your choosing. If you then launch SAS from the same directory as your copy of the config.sas612 file, SAS will use those options.
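Whichever way you supply an option, you may want to confirm that it took effect. One way to check, sketched here for the -work option used in the examples above, is to run the following statements inside the SAS session and read the result in the SAS log:

```sas
/* Report the current setting of the WORK system option in the log */
PROC OPTIONS OPTION=WORK;
RUN;

/* Alternatively, write the physical path of the WORK library to the log */
%PUT WORK library is located at: %SYSFUNC(PATHNAME(work));
```

The path reported should lie under the directory you named (for instance, ./mysasdir), since SAS creates its own temporary subdirectory inside the location given by -work.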


Reading compressed SAS transport files

Question:

I have a compressed SAS transport file (created using the XPORT engine). It has a UNIX file form of ".Z". How can I read it using SAS?

Answer:

This can be done using a combination of a SAS FILENAME statement with the PIPE option, and a SAS LIBNAME statement that names the XPORT engine. For example, suppose your compressed SAS transport file is called "transport.data.Z". A SAS program to read it would look like this:
FILENAME trans PIPE 'zcat transport.data.Z'; 
LIBNAME trans XPORT;
PROC COPY IN=trans OUT=SASUSER;
RUN;
This will copy all the datasets in the transport file to the default SAS permanent dataset library SASUSER. Other options are possible: once the LIBREF is defined, you can use it as you would any other. However, keep in mind that the data from the compressed file must be read sequentially, so if the transport file is large, this will take some time.
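If you need only one dataset from the transport file, a SELECT statement in PROC COPY can name it. In this sketch the member name "survey" is hypothetical, and the temporary WORK library is used as the destination instead of SASUSER:

```sas
FILENAME trans PIPE 'zcat transport.data.Z';
LIBNAME trans XPORT;

PROC COPY IN=trans OUT=WORK;
  SELECT survey;             /* hypothetical member name inside the transport file */
RUN;
```

SAS still has to read through the transport file sequentially to find the selected member, so this mainly saves space in the output library rather than reading time.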


Calling SAS macros on UNIX systems

Question:

On UNIX how do I call an external SAS MACRO that is in another file and not physically included in my SAS program?

Answer:

Put the SAS MACRO in a file having a name "macro_name.sas", where "macro_name" is the macro name from the SAS %MACRO statement.
When you want to use this macro in a SAS program, perform the following steps:
1) Use a FILENAME statement of the form:
FILENAME wheremac 'dir';
where "wheremac" is a fileref pointing to the directory where the macro file is stored, and "dir" is the directory path to where the macro file is stored.
2) Use an OPTIONS statement of the form:
OPTIONS SASAUTOS=wheremac;
to point to the directory containing the macro file.
3) Call your macro as you normally would with the "%macroname" specification, where "macroname" is the name of the macro file without the ".sas" extension (and the name of the macro in the %MACRO statement).
For example if you had a macro to do a PROC PRINT stored in a file called "prt.sas", it could look like this:
%MACRO prt;
PROC PRINT;
RUN;
%MEND;
If you stored this file in your $HOME directory, you could run the SAS job:
FILENAME wheremac '$HOME/';
DATA one;
INPUT a b ;
CARDS;
1 2
3 2
;
OPTIONS SASAUTOS=wheremac;
%prt
The above code will create the data set and print the data to the SAS listing file.
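One caveat with the example above is that SASAUTOS=wheremac replaces the default autocall library, so the macros supplied with SAS are no longer searched. A sketch of a variant that keeps both is shown below; it assumes the same prt.sas file in $HOME, and it sets MAUTOSOURCE explicitly even though that option is normally on by default:

```sas
FILENAME wheremac '$HOME/';

/* Search your own directory first, then the autocall library supplied with SAS */
OPTIONS MAUTOSOURCE SASAUTOS=(wheremac SASAUTOS);

DATA one;
  INPUT a b;
  CARDS;
1 2
3 2
;
RUN;

%prt
```

On UNIX systems the autocall file name must be entirely lowercase (prt.sas, not PRT.sas), regardless of how the macro name is typed in the program.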


FATAL: Unable to initialize work library on SAS for UNIX

Question:

I'm getting an error message when I use SAS on a UNIX system. It says, "FATAL: Unable to initialize work library". What's going on?

Answer:

On the ITS UNIX system (UTS), SAS writes any work files such as temporary datasets to the shared directory /var/tmp. This directory is used for similar purposes by other users; when it is full, no further work can be done until some of the processes are completed.
To verify that the /var/tmp directory is full, run the following command from the UNIX prompt:
df -k /var/tmp
The UNIX system will return a display similar to the following:
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/lv07 966656 882920 9% 52 1% /var/tmp
The crucial column to examine is labeled %Used. If it reads 100%, the /var/tmp directory is full. You should call ITS as soon as possible to notify ITS operations staff that /var/tmp is full. During normal working hours, call 512-475-9400; during weekends and evenings, call 512-475-9300. In addition, you may want to run the UNIX command ls -l /var/tmp to see if your own use of SAS created any directories or files in /var/tmp. You can then remove these files using the UNIX rm and rmdir commands.
ITS operations staff can generally fix problems associated with /var/tmp rapidly. If you need to use SAS before the /var/tmp problems are resolved, you can direct SAS to write its temporary datasets into a directory you own, rather than /var/tmp. One way to do this is to use the SAS system option -work when you start SAS.
You may need to increase your disk space quota in order to write your work files to your own directory. To do this on an ITS Unix system, use the chquota command. You may increase your quota to a maximum limit of 500 megabytes using the chquota command. If you need more than 500 megabytes of storage space, send email to help@its.utexas.edu for assistance.
You will be charged a nominal fee for each megabyte of storage that you add to your quota. If such charges are a concern, you can lower your quota again immediately after your job is completed.
To obtain more information about the UNIX commands mentioned above, type man commandname at the UNIX system prompt, where commandname is the name of the relevant UNIX command.


Updating the SAS System license

Question:

How do I update a SAS installation with license information provided to me by Software Distribution and Sales?

Answer:

The method you use to update your SAS license depends upon the version of SAS installed on your computer as follows:
Method One: Updating a SAS v9 License
Method Two: Updating a SAS v8.2 License
Method Three: Using the SAS System License Update Utility to Update a SAS v8.1 or lower License
Method Four: Command Line Method for Updating a SAS v8.1 or Lower License
Follow the instructions below for the method that matches your version of SAS (these instructions pertain only to MS Windows operating systems; if you are using a different operating system such as Mac OS or UNIX, contact stats@ssc.utexas.edu for assistance).
Method One: Updating a SAS 9 Installation
UT affiliates with a valid license for SAS v9 need to obtain a text file containing the license information. This file is available from ITS Software Distribution (software@its.utexas.edu) and is mailed to all license holders during the Fall semester. Once you have the file, save it to a convenient location on your hard disk. For illustration, assume that you have saved it in the SAS 9 installation folder with the name v9LicenseInformation.txt, and that you have installed the application in the folder
C:\Program Files\SAS\ .
From the Start button, click on ‘All Programs’, click on ‘SAS’ (or whatever alternative folder name you have chosen for the SAS 9 installation), and then click on ‘Renew SAS Software’.
This should produce a pop up window in which you can supply the name of the install data file along with its path, e.g., ‘C:\Program Files\SAS\V9LicenseInformation.txt’ or select it by navigation using the ‘Browse’ button.

Click on the ‘Next’ button and you should get a new pop up window which lists the available setinits. There should only be one entitled ‘UNIVERSITY OF TEXAS SYSTEM- SYSTEMWIDE-T/R’ which should be highlighted by default.

Click on the ‘Next’ button and you should get another pop up window prompting for the locations of the SAS 9 installation and the SAS config file. The SAS license renewal program typically fills in these fields with the correct information. If the actual locations differ from those that appear, they can be typed in or selected by navigation using the ‘Browse’ button. For almost all users, accepting the default choices is the best option.

Click on the ‘Renew’ button, and you should get a final pop up window confirming that the setinit was successfully applied.

To determine if your authorization update was successful, start SAS V9. Then enter and execute the following SAS syntax:
Proc setinit; 
Run;
Method Two: Updating a SAS 8.2 License
Please follow the steps outlined below to update your SAS v8.2 license:
Step 1:
To update your SAS v8.2 license, you will need the current license information that was sent to you as an attachment via E-mail. Save the attached setinit.sss file to your local hard drive. For example, save the file to "C:\Program Files\SAS Institute" which was created when you installed SAS (assuming you used the default partition).

Step 2:
Open Windows Explorer which can be done in several ways. One approach is to click on the Start button, click on the Run icon, and then enter 'explorer' into the text box and click 'OK' to start the program.

Once the Windows Explorer is open, then navigate to the "C:\Program Files\SAS Institute" folder so that you can see the setinit.sss file in the Windows Explorer right-hand window.
Step 3:
Right-mouse click on the file in the Windows Explorer right-hand window. Select the second option from the top entitled "Apply Authorization Code to SAS V8" and then press on the left-mouse button which will start the authorization process. Depending on the speed of your computer, you may see a brief window message appear indicating that SAS is running.

*Optional Step 4:
To determine if your authorization update was successful, start SAS v8.2. Then enter and execute the following SAS syntax:
Proc setinit;
Run;
If you were successful, then you should receive the following information in your Log window (in this example, a SAS license is being updated for the 2003 to 2004 academic year):
proc setinit;
run;
NOTE: PROCEDURE SETINIT used:
real time 0.04 seconds
cpu time 0.01 seconds
Original site validation data
Site name: 'UNIVERSITY OF TEXAS SYSTEM-SYSTEMWIDE-T/R'.
Site number: 40204001.
Expiration: 31AUG2004.
Grace Period: 45 days (ending 15OCT2004).
Warning Period: 45 days (ending 29NOV2004).
System birthday: 23NOV1992.
Operating System: WIN .
Product expiration dates:
---Base Product 31AUG2004
---SAS/STAT 31AUG2004
---SAS/GRAPH 31AUG2004
---SAS/ETS 31AUG2004
---SAS/FSP 31AUG2004
---SAS/OR 31AUG2004
---SAS/AF 31AUG2004
---SAS/IML 31AUG2004
---SAS/QC 31AUG2004
---SAS/SHARE 31AUG2004
---SAS/LAB 31AUG2004
---SAS/ASSIST 31AUG2004
---SAS/CONNECT 31AUG2004
---SAS/INSIGHT 31AUG2004
---SAS/CPE 31AUG2004
---SAS/EIS 31AUG2004
---SAS/GIS 31AUG2004
---SAS/SPECTRAVIEW 31AUG2004
---SAS/SHARE*NET 31AUG2004
---SAS/WAREHOUSE 31AUG2004
---SAS/MDDB Server 31AUG2004
---SAS/Enterprise Miner 31AUG2004
---SAS/IT Service Vision Client 31AUG2004
---SAS/Enterprise Reporter 31AUG2004
---SAS/IntrNet Compute Services 31AUG2004
---SAS/MDDB Server common products 31AUG2004
---SAS/OnlineTutor: SAS Programming 31AUG2004
---SAS/Integration Technologies 31AUG2004
---SAS/SECURE-WIN 31AUG2004
---SAS/AppDev Studio 31AUG2004
---PRODNUM117 30JUN2004
---PRODNUM123 14APR2004
---SAS/ACC-DB2 31AUG2004
---SAS/ACC-ORACLE 31AUG2004
---SAS/ACC-SYBASE-SQL Server 31AUG2004
---SAS/ACC-PC File Formats 31AUG2004
---SAS/ACC-ODBC 31AUG2004
---SAS/ACC-OLE DB 31AUG2004
---SAS/ACC-R/3 31AUG2004
If you are unable to start SAS, you can still view the results of the license update process by opening the setinit.log file, which should be in the same directory as the setinit.sss file.

If you have any questions, please contact the ITS Helpdesk at 512-475-9400 or at help@its.utexas.edu. If they are unable to guide you through this process, then they will contact the statistical consulting group at stats@ssc.utexas.edu for further assistance.
In case you did not receive the setinit.sss file or it was corrupted for some reason, the information in that file is given below except for the actual password information (in this example, a SAS license is being updated for the 2003 to 2004 academic year). To create this file, the user needs to open a new text file using a text editor such as notepad and copy and paste the information below into the file. Then save the file as setinit.sss file to your local hard drive. For example, save the file to "C:\Program Files\SAS Institute" which was created when you installed SAS (assuming you used the default partition). Then proceed to Step 2 outlined above.
***************SAS Setinit Information for SAS v8.2******************;
***************Save only Information Below***************************;
PROC SETINIT RELEASE='8.2';
SITEINFO NAME='UNIVERSITY OF TEXAS SYSTEM-SYSTEMWIDE-T/R'
SITE=40204001 OSNAME='WIN' RECREATE WARN=45 GRACE=45
BIRTHDAY='23NOV1992'D EXPIRE='31AUG2004'D PASSWORD=xxxxxxxxx;
CPU MODEL=' ' MODNUM=' ' SERIAL=' ';
EXPIRE 'BASE' 'STAT' 'GRAPH' 'ETS' 'FSP' 'OR' 'AF' 'IML' 'QC' 'SHARE'
'LAB' 'ASSIST' 'CONNECT' 'INSIGHT' 'CPE' 'EIS' 'GIS'
'SPECTRAVIEW' 'SHARE*NET' 'WAREHOUSE' 'MDDB Server'
'Enterprise Miner' 'IT Service Vision Client'
'Enterprise Reporter' 'IntrNet Compute Services'
'MDDB Server common products' 'OnlineTutor: SAS Programming'
'Integration Technologies' 'SECURE-WIN' 'AppDev Studio' 'DB2'
'ORACLE' 'SYBASE-SQL Server' 'PC File Formats' 'ODBC' 'OLE DB'
'R/3' '31AUG2004'D / CPU=CPU000;
EXPIRE 'PRODNUM117' '30JUN2004'D / CPU=CPU000;
EXPIRE 'PRODNUM123' '14APR2004'D / CPU=CPU000;
SAVE; RUN;
*DROPPED SAS/CALC;
*XYZ 57221;
*PRODNUM117 = SAS/Genetics;
*PRODNUM123 = SAS Bridge for ESRI;
*IT Service Vision client technology authorized for 9999 Users;
*Warehouse technology authorized for 9999 admin users;
*AppDev Studio technology authorized for 9999 users;
*Enterprise Miner client technology authorized for 9999 Users; 
*40204001 8.2;
Method Three: Using the SAS System License Update Utility to Update a SAS v8.1 or Lower License
You can update your license by using the SAS System license update utility. To launch the utility, select START from the Windows taskbar, then choose Programs, and then choose The SAS System. Now choose the option Update SAS License Information. Windows will launch the InstallShield Wizard, which will guide you through the SAS license update process until you are asked to update the setinit.sas file.
Click Next when the InstallShield Wizard asks you questions about the config.sas file and the SASROOT folder. At the end of the list of questions, the installer will ask you one final question: "Has your updated setinit been provided to you on paper?"
License information on diskette
If you are using a diskette to perform the license update (the standard method of updating the license), answer No to this question. You will then be prompted for the location of the new setinit.sas file. Choose the diskette's directory (typically this is a:\). Click Next. SAS will now launch and you should see a message appear that reads, "License information for the SAS System has been updated. Click OK to exit." Click OK. The license update process is now complete and SAS should function normally.
You can verify that SAS updated the license correctly by selecting Find Files or Folders from the Find option under the Windows Start menu. Select My Computer under Look in: and type setinit.log in the Named: window. Then click OK. The find file utility will list all files called "setinit.log" on your computer. Edit the most recent version of setinit.log. It should contain no errors or warnings, and all setinit information should be updated to function through the end of the academic year. If the setinit.log file contains errors, you must re-apply the new license file following the steps described above.
License information provided on hard copy
If you are not using a diskette to perform the license update and are using a hard (paper) copy of the setinit.sas file to perform the license update, answer Yes to the question, "Has your updated setinit been provided to you on paper?". The installer will now ask you, "Would you like to invoke the editor to correct your SETINIT information now?" Answer Yes to this question.
You will see the setinit.sas file displayed in an open window. It will look like this:
PROC SETINIT RELEASE='6.12';
SITEINFO NAME='UNIVERSITY OF TEXAS AT AUSTIN'
SITE=11206001 OSNAME='WIN_NTSV' RECREATE
BIRTHDAY='23NOV1992'D EXPIRE='31AUG1997'D PASSWORD=xxxxxxxxx;
ALIAS 'ACADEMIC' 'BASE' 'GRAPH' 'ETS' 'FSP' 'AF' 'OR' 'IML' 'SHARE'
'QC' 'STAT' 'INSIGHT' 'ORACLE' 'ASSIST' 'CALC' 'CONNECT'
'CBT101' 'SYBASE-SQL Server' 'LAB' 'ENGLISH' 'EIS'
'PC File Formats' 'GIS' 'ODBC' 'SPECTRAVIEW' /
PASSWORD=xxxxxxxxx PRODNUM=253;
CPU MODEL=' ' MODNUM=' ' SERIAL=' ';
EXPIRE 'ACADEMIC' '31AUG1997'D;
SEC PASSWORD=xxxxxxxxxx;
SAVE; 
RUN;
This example setinit.sas program applies to both Windows95 and WindowsNT, even though the OSNAME value mentions only WindowsNT. Compare this outdated setinit.sas file to the hard copy of the current year's setinit.sas file. Suppose that the new setinit.sas file you received from Software Distribution Services looked like this:
PROC SETINIT RELEASE='6.12';
SITEINFO NAME='UNIVERSITY OF TEXAS AT AUSTIN'
SITE=11206001 OSNAME='WIN_NTSV' RECREATE
BIRTHDAY='23NOV1992'D EXPIRE='31AUG1998'D PASSWORD=yyyyyyyyy;
ALIAS 'ACADEMIC' 'BASE' 'GRAPH' 'ETS' 'FSP' 'AF' 'OR' 'IML' 'SHARE'
'QC' 'STAT' 'INSIGHT' 'ORACLE' 'ASSIST' 'CALC' 'CONNECT'
'CBT101' 'SYBASE-SQL Server' 'LAB' 'ENGLISH' 'EIS'
'PC File Formats' 'GIS' 'ODBC' 'SPECTRAVIEW' /
PASSWORD=123454321 PRODNUM=253;
CPU MODEL=' ' MODNUM=' ' SERIAL=' ';
EXPIRE 'ACADEMIC' '31AUG1998'D;
SEC PASSWORD=yyyyyyyyy;
SAVE; 
RUN;
You must update the expired setinit.sas file by replacing out of date information in the expired setinit.sas file with new information from the hard copy setinit.sas file you received from Software Distribution Services.
For example, notice that the keyword EXPIRE appears twice in this program, once on line 4 and once on line 11. You must change the EXPIRE='31AUG1997'D to EXPIRE='31AUG1998'D in both locations. Notice also that two of the three PASSWORD fields have different values; you must update these as well. Scrutinize the new setinit.sas file carefully; sometimes SAS modules or features are dropped from or added to the ALIAS line from year to year. Sometimes the name shown in the OSNAME field will change slightly from year to year.
In short, the copy of setinit.sas on your computer must be EXACTLY the same as the copy which you received from Software Distribution Services. When you have completed your modifications to the setinit.sas file, choose SAVE from the FILE menu and then close the text editor window.
If the update is successful, the installer utility will display a message to that effect. Otherwise, the installer will notify you that the license update was unsuccessful and ask you if you would like to examine the setinit.log file for error messages. You can examine this file for errors, correct them, re-edit the setinit.sas file, and repeat these steps as needed until you have successfully updated your SAS System license.
Method Four: Command Line Method for Updating a SAS v8.1 or Lower License
If the license update utility method is ineffective, you can try to update your license using a more complicated approach called the command line method. Follow the steps shown below to update your SAS System license using the command line method.
Step One
Locate the out-of-date setinit.sas file on your computer's hard disk. It can usually be found in the following directory: C:\SAS\CORE\SASINST\. If you installed SAS to a different directory or hard disk, such as D:\, the setinit.sas file should still be locatable in the \CORE\SASINST subdirectory of that directory.
Step Two
Open the setinit.sas file using your favorite Windows text editor. Notepad is usually a good choice. You will see a SAS program that looks like this:
PROC SETINIT RELEASE='6.12';
SITEINFO NAME='UNIVERSITY OF TEXAS AT AUSTIN'
SITE=11206001 OSNAME='WIN_NTSV' RECREATE
BIRTHDAY='23NOV1992'D EXPIRE='31AUG1997'D PASSWORD=xxxxxxxxx;
ALIAS 'ACADEMIC' 'BASE' 'GRAPH' 'ETS' 'FSP' 'AF' 'OR' 'IML' 'SHARE'
'QC' 'STAT' 'INSIGHT' 'ORACLE' 'ASSIST' 'CALC' 'CONNECT'
'CBT101' 'SYBASE-SQL Server' 'LAB' 'ENGLISH' 'EIS'
'PC File Formats' 'GIS' 'ODBC' 'SPECTRAVIEW' /
PASSWORD=xxxxxxxxx PRODNUM=253;
CPU MODEL=' ' MODNUM=' ' SERIAL=' ';
EXPIRE 'ACADEMIC' '31AUG1997'D;
SEC PASSWORD=xxxxxxxxxx;
SAVE; 
RUN;
This example setinit.sas program applies to both Windows95 and WindowsNT, even though the OSNAME value mentions only WindowsNT.
Compare this outdated setinit.sas file to the hard copy of the current year's setinit.sas file. Suppose that the new setinit.sas file you received from Software Distribution Services looked like this:
PROC SETINIT RELEASE='6.12';
SITEINFO NAME='UNIVERSITY OF TEXAS AT AUSTIN'
SITE=11206001 OSNAME='WIN_NTSV' RECREATE
BIRTHDAY='23NOV1992'D EXPIRE='31AUG1998'D PASSWORD=yyyyyyyyy;
ALIAS 'ACADEMIC' 'BASE' 'GRAPH' 'ETS' 'FSP' 'AF' 'OR' 'IML' 'SHARE'
'QC' 'STAT' 'INSIGHT' 'ORACLE' 'ASSIST' 'CALC' 'CONNECT'
'CBT101' 'SYBASE-SQL Server' 'LAB' 'ENGLISH' 'EIS'
'PC File Formats' 'GIS' 'ODBC' 'SPECTRAVIEW' /
PASSWORD=123454321 PRODNUM=253;
CPU MODEL=' ' MODNUM=' ' SERIAL=' ';
EXPIRE 'ACADEMIC' '31AUG1998'D;
SEC PASSWORD=yyyyyyyyy;
SAVE; 
RUN;
You must update the expired setinit.sas file by replacing out of date information in the expired setinit.sas file with new information from the hard copy setinit.sas file you received from Software Distribution Services.
For example, notice that the keyword EXPIRE appears twice in this program, once on line 4 and once on line 11. You must change the EXPIRE='31AUG1997'D to EXPIRE='31AUG1998'D in both locations. Notice also that two of the three PASSWORD fields have different values; you must update these as well. Scrutinize the new setinit.sas file carefully; sometimes SAS modules or features are dropped from or added to the ALIAS line from year to year. Sometimes the name shown in the OSNAME field will change slightly from year to year.
In short, the copy of setinit.sas on your computer must be EXACTLY the same as the copy which you received from Software Distribution Services.
Step Three
Once you have modified the setinit.sas file on your computer so that it conforms to the specifications of the setinit.sas file you received from Software Distribution Services, save the setinit.sas file and exit the text editor.
From the Windows95/NT START menu, select the RUN... option. If you are using Windows 3.x, choose the RUN option from the File Manager or the Program Manager windows.
In the window that appears, enter the following DOS command:
C:\SAS\SAS.EXE -SETINIT -SYSIN C:\SAS\CORE\SASINST\SETINIT.SAS
Note that if you have installed SAS in a drive or location other than C:\SAS, you will need to substitute the appropriate drive name or subdirectory name. If you have installed SAS in the default location of C:\SAS, you may copy-paste the command shown above into the RUN... window to minimize the possibility of typographical errors.
The C:\SAS\SAS.EXE portion of the command identifies where the actual SAS program, SAS.EXE is located (it's in C:\SAS) and what program to run (SAS.EXE).
The -SETINIT portion of the command is a SAS option that tells SAS that what you are doing is a license update. In other words, you are submitting a setinit.sas license update program to SAS for processing.
The -SYSIN portion of the command is another SAS option that tells SAS where to find the setinit.sas file you modified to contain the new, correct license update information. If SAS gives you an error message that it cannot find the setinit.sas file, use the Windows file manager or explorer to verify that there is a file called SETINIT.SAS located in the C:\SAS\CORE\SASINST\ directory on your computer. Then check to make sure that the contents of the setinit.sas file reflect the updated license information you received from Software Distribution Services.
Step Four
Attempt to start SAS by double-clicking on the SAS program icon, which can be found in the SAS program group. If you updated the license successfully, SAS will start without incident. If you have not updated the license successfully, SAS will display an error message that the license is out-of-date. In that case, locate the file called setinit.log on your computer's hard disk drive. It is usually found in the C:\SAS directory.
If setinit.log contains errors or warnings about incorrect passwords, the setinit information you supplied to SAS was in error. (If no errors are shown in the setinit.log file and you see SITE INFO UPDATED messages, the license was updated successfully.) Even if the error is not related to a password per se, but is instead a typographical error or another type of mistake (such as the omission of a feature from the ALIAS list), SAS will display an incorrect password error message in the setinit.log file.
Review your setinit.sas file and make any necessary corrections. If the syntax of the setinit.sas file appears to be correct, try highlighting the blank space immediately following the last semicolon at the end of the last RUN; statement and highlight several rows of blank text beneath the final RUN; statement. Then press your computer's delete or backspace key to remove any hidden characters that may have inadvertently been stored in the setinit.sas file. Then re-run the steps shown above to update the SAS System license.
If you are not able to update your SAS System license after several iterations of the steps shown above, contact the Statistical Services consultants by email at stats@ssc.utexas.edu for further assistance. Please copy-paste your setinit.sas file and the setinit.log file into the email message to help the consultant diagnose the source of the problem.


 


 

SAS out of memory (UNIX)

Question:

I'm trying to run some matrix-intensive jobs using SAS on a UNIX system. For one job I'm using PROC IML to perform matrix manipulations on a 1400 by 1400 matrix. For another job, I'm using PROC GLM to fit a complex nested design with five factors and all possible interactions. I've found out that SAS doesn't have enough memory to run these jobs. What should I do?

Answer:

SAS has a default RAM (random access memory) allocation. You can find out what the default allocation is for your system by running the following SAS syntax.
PROC OPTIONS;
RUN;
The options procedure will write your current SAS system settings to the LOG file (if you are using SAS noninteractively) or LOG window (if you are using the SAS Display Manager System). Look for the MEMSIZE value shown by the PROC OPTIONS output. PROC OPTIONS will show the memory size value in megabytes (e.g., a memsize value of 32M indicates that SAS can use up to 32 megabytes of random access memory).
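If you only want the memory setting rather than the full list of options, the request can be narrowed. The following is a minimal sketch, assuming your SAS release supports the OPTION= argument of PROC OPTIONS and the GETOPTION function:

```sas
* Write only the MEMSIZE setting to the log ;
PROC OPTIONS OPTION=MEMSIZE ;
RUN ;

* Alternatively, write the current value using the GETOPTION function ;
%PUT MEMSIZE is %SYSFUNC(GETOPTION(MEMSIZE)) ;
```

Either form writes its result to the same LOG file or LOG window described above.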
You can override this default value by specifying the -memsize option when you launch SAS or submit a SAS file to SAS for processing.
For instance, suppose you submit your pre-written programs to SAS using the command
/usr/local/sas/sas filename
where filename refers to the name of your file of SAS syntax (note that you can drop the "/usr/local/sas" portion of the command if /usr/local/sas is part of your PATH environment variable). If you wanted to increase the amount of available RAM to SAS to be 64 megabytes, you would modify this command to read
/usr/local/sas/sas filename -memsize 64M
You can examine the amount of available memory (in kilobytes) by using the ulimit UNIX command. Using the ulimit command with the -m option will show you the amount of physical RAM available. By contrast, the -v flag for ulimit will display the amount of available virtual memory.
You can set -memsize above the amount SAS needs to run your program to ensure that SAS can access the necessary amount of RAM. Another option is to specify 0 following -memsize instead of a specific RAM amount. The 0 instructs SAS to acquire as much RAM as it needs to run your program.
Be aware that ITS UNIX systems are time-sharing systems. This means that multiple users are using the same computers at any given time. To ensure stable system performance for all users, strongly consider using the UNIX nice command when you run large SAS jobs on ITS UNIX systems. The nice command is included as a prefix before your SAS command; it assigns large jobs lower CPU priority than other jobs on the system. Using the previous example with the nice command added to it, the resulting UNIX command would be
/usr/bin/nice /usr/local/sas/sas filename -memsize 64M
More information about the ulimit and nice commands is available via the UNIX "man" pages. To view information about a specific UNIX command, type man followed by the command's name at the UNIX system prompt.


 


 

SAS license updates for Macintosh

Question:

I am trying to update my SAS license for the MacOS. How can I update my license?

Answer:

There are two methods you can use to update your SAS for Macintosh license (under SAS release 6.12 or higher).
SAS System Install Option
1. Invoke the SAS System Install Application from the tools folder in the !SASPath folder by double-clicking on the SAS System Install icon (NOTE: The !SASPath folder is named SAS612 by default; it is the folder where you installed the SAS System onto your computer).
2. Click on the License button in the Install Application window.
3. Locate your SASPath folder by selecting from the dialogs displayed.
4. Locate your updated SETINIT.SAS file by selecting it from the SETINIT dialogs displayed or by confirming the location stated by the Install Application.
5. Exit the SAS System Install Application and check the SETINIT.LOG file created in the System Folder:Preferences:SAS:SASUSER folder for errors.
NOTE: This method for applying the licensing information can fail if your system does not have enough contiguous memory to run both the SAS System and the SAS System Install Application at the same time. In this case, apply the licensing information as a batch program (see the Batch Program Option below). Use the batch program option if you have less than 20 megabytes of RAM available.
If you have further questions, send email to:
help@its.utexas.edu
Batch Program Option
1. Locate the CONFIG.SAS612 file. It should be in your SASPath folder. Edit the CONFIG.SAS612 file by adding a -setinit entry and a -sysin entry that specifies the setinit.sas file you want to use to update your licensing information, as shown in the following example:

-setinit
-sysin "MacHD:SAS612:tools:setinit.sas"

where MacHD is the name of your Macintosh hard drive. 

2. Save your changes to the CONFIG.SAS612 file and invoke the SAS System. Even though the SAS license may be expired, SAS will still run the SETINIT.SAS program.
3. Check the SETINIT.LOG file that is created in the same folder as your SETINIT.SAS file for any errors that may have occurred. If you find errors, verify the information in the SETINIT.SAS file and re-execute the batch program as previously described. If the license update is successful, you should see a message that reads, "Siteinfo data have been updated" and the expiration date should read 31 August [next year].
4. Remove the -setinit and -sysin entries from the CONFIG.SAS612 file after the SETINIT.SAS information has been applied successfully. Do not forget to remove these entries; if you fail to remove them from the CONFIG.SAS612, the SAS System will launch and then shut down immediately.


 


 

Reading SAS times and dates

Question:

I have an unusual date-time variable I need to read from a text file into SAS. A typical value looks like this: 12/22/7012:34:00. I've tried the DATETIME20. informat, but it expects the input data to look like this: 22DEC70 12:34:00. What should I do?

Answer:

In order to read this type of date and time value, you must (1) read the entire string of data as a single character variable, (2) split the string into separate date and time components, (3) change the 12/22/70 to be of the form 22DEC70 (the SAS DATE7. format), (4) recombine the date and time components into one string, and (5) use the SAS DATETIME20. informat to read the modified string. The sample SAS syntax shown below shows how to accomplish these steps.
** Create sample dataset ;
DATA test ;
LENGTH testvar $20 ;
INPUT testvar ;
CARDS ;
12/22/7012:34:00
;
** Create second dataset where transformations take place ;
DATA two ;
SET test ;
datevar = SUBSTR(testvar,1,8); 
/* Extract the first 8 characters of the datetime variable (the date) */
timevar = SUBSTR(testvar,9,8); 
/* Extract the last 8 characters of the datetime variable (the time) */
datenumb = INPUT(datevar,mmddyy8.); 
/* Convert the date portion into a formatted numeric variable */
chardate = PUT(datenumb,date7.); 
/* Convert the new numeric date variable into a string variable */
blank = ' '; 
/* Create a blank to separate the date and time parts in the new string variable */
newdate = chardate||blank||timevar;
/* Create the new date-time string variable via concatenation */
finalvar = INPUT(newdate,datetime20.); 
/* Convert the date-time string into a numeric SAS datetime value */
RUN ;
** Print the results ;
PROC PRINT ;
FORMAT datenumb date7.
finalvar datetime20. ;
RUN ;
The output appears below.
OBS  TESTVAR           DATEVAR   TIMEVAR   DATENUMB  CHARDATE  BLANK  NEWDATE           FINALVAR
1    12/22/7012:34:00  12/22/70  12:34:00  22DEC70   22DEC70          22DEC70 12:34:00  22DEC1970:12:34:00
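An alternative that avoids the string manipulation is to read the date and time portions with separate informats and combine them with the DHMS function. This is a sketch only: the dataset name THREE is hypothetical, and it assumes that the date always occupies columns 1-8 and the time columns 9-16, as in the sample value. How a two-digit year such as 70 is interpreted depends on your YEARCUTOFF= setting.

```sas
* Read the date and time with separate informats, then combine them ;
DATA three ;
INPUT @1 datevar MMDDYY8. @9 timevar TIME8. ;
/* DHMS(date, hour, minute, second): the whole time value is passed as seconds */
finalvar = DHMS(datevar, 0, 0, timevar) ;
FORMAT datevar DATE7. timevar TIME8. finalvar DATETIME20. ;
CARDS ;
12/22/7012:34:00
;
RUN ;
PROC PRINT DATA=three ;
RUN ;
```

Because a SAS time value is stored as a number of seconds, passing it as the last argument of DHMS with zero hours and minutes adds it to midnight of the given date.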


 


 

Accessing SAS on the ITS UNIX time sharing servers

Question:

I have an IF account and want to run SAS under UNIX on an ITS server. How can I do this?

Answer:

You will first need to validate your account for UTS (to run SAS on uts.cc.utexas.edu). This can be done by going to the EID-protected account maintenance page and clicking on an Add Service button. For the contemporary versions of SAS installed on the ITS UNIX servers, you will need an interface that supports X Windows, either an NCD terminal or an emulator client such as Exodus or Hummingbird Exceed. When your account has been set up, log in and at your shell prompt type the command
eval `/usr/local/etc/appuser`
making certain that the string is in backquotes. This sets up all needed UNIX environment variables for statistical and mathematical applications. Now you can launch the default installation of SAS with the simple command "sas" at the shell prompt. If you want to use another installation, an older legacy version or a newer evaluation version, you will need to use the explicit path to the binary from root.


 


 

Running GLIMMIX on the Windows Terminal Server version of SAS

Question:

When attempting to run GLIMMIX on the Windows Terminal Server version of SAS, I get the following error in my log:
ERROR: Unable to restore 'Stat.Glimmix_Prod.ModelInfo' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.ClassLevels' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.NObs' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.Dimensions' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.OptInfo' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.IterHistory' from template store!
NOTE: Convergence criterion (ABSGCONV=0.00001) satisfied.
ERROR: Unable to restore 'Stat.Glimmix_Prod.ConvergenceStatus' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.FitStatistics' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.ParameterEstimates' from template store!
ERROR: Unable to restore 'Stat.Glimmix_Prod.Tests3' from template store!
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 0.70 seconds
cpu time 0.17 seconds
I don’t think I’m doing anything wrong – my syntax is correct. How can I fix this error?

Answer:

By default, GLIMMIX is not installed in your user directory, which is where it needs to be in order to run on the Windows Terminal Server. You will need to move some files manually. To run GLIMMIX successfully, follow these steps:
(1) If you are running SAS, close and exit the program.
(2) Open the explorer window by right clicking on the Start button, then selecting Explore.
(3) In the Explorer window, navigate to ‘\\disk\wtspublicconfig$\Files\glimmix’; you can do this by typing the given address into the address bar.


(4) Copy the two GLIMMIX files found there (glimmix_tpl.sas and templat.sas7bitm).


(5) Paste both files into the directory ‘U:\My Documents\My SAS Files\9.1’.
(6) Close the Explorer window.
(7) Restart SAS.
NOTE: To access the ‘U:\’ drive on the Windows Terminal Server, click on the icon for ‘My Computer’ in the Explorer window. There will be an icon for ‘username’ on ‘disk\homedirs$’. The username will be your Windows Terminal Server username. This is the ‘U:\’ drive. See the following screen shot:



 


Removing duplicate observations from a dataset using SAS

Question:

How can I remove duplicate observations from my SAS dataset?

Answer:

You can use PROC SORT with the NODUPLICATES option to remove unwanted duplicate observations from your SAS dataset. The following sample code illustrates how to use PROC SORT to do this.

DATA test ;
INPUT id varone vartwo ;
CARDS ;
1 23 45
2 35 98
3 83 45
1 23 45
;
PROC PRINT ;
PROC SORT DATA=test OUT=test2 NODUPLICATES ;
BY _ALL_;
PROC PRINT DATA=test2 ;
RUN ;

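The BY _ALL_ statement sorts on every variable in the dataset, so that identical observations end up next to each other. This is required for SAS to correctly identify and remove the duplicate observations.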

If you want to remove duplicate records for specific BY variables only (as opposed to all variables, shown above), substitute the keyword NODUPKEY for the NODUPLICATES keyword in the example above. In addition, you should substitute a specific BY variable (e.g., vartwo) for the _ALL_ keyword used in the above example.
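As a brief illustration of the NODUPKEY variant, the following sketch reuses the TEST dataset from the example above and writes to a hypothetical output dataset named TEST3:

```sas
* Keep only the first observation found for each value of vartwo ;
PROC SORT DATA=test OUT=test3 NODUPKEY ;
BY vartwo ;
RUN ;
PROC PRINT DATA=test3 ;
RUN ;
```

With the sample data, VARTWO takes only two distinct values (45 and 98), so TEST3 contains two observations even though the dropped records differ on ID and VARONE.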

For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.


 


 

Catching data entry errors with SAS

Question:

How can I use SAS to check for data entry errors?

Answer:

PROC COMPARE compares two SAS datasets with each other. It warns you if it detects observations (rows) or variables (columns) that do not agree across the two datasets. When there are no disagreements, you can be confident that data entry is reliable. To use PROC COMPARE, enter your data twice, once each into two separate raw data files. Then use the two raw data files to create two SAS data sets. Then use PROC COMPARE. The following example compares the two SAS data sets named FRED and SAM.
PROC COMPARE BASE = fred COMPARE = sam ERROR ; 
ID subjctid ;
The BASE keyword defines the data set that SAS will use as a basis for comparison. The keyword COMPARE defines the dataset which SAS will compare with the base dataset. The ERROR keyword requests that SAS print an error message to the SASLOG file if it discovers any differences when it compares the two data sets.
The ID statement tells SAS to compare rows (observations) in the data set by the identifying variable, which here is named SUBJCTID. This variable must have a unique value for each case.
PROC COMPARE features a number of options, many of which are designed to control the amount and type of information displayed in the listing file.
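For a concrete picture, here is a small self-contained sketch. The data values and the variable SCORE are invented for illustration; the second dataset contains one deliberate keying error for subject 2.

```sas
* Two hypothetical data-entry passes over the same three subjects ;
DATA fred ;
INPUT subjctid score ;
CARDS ;
1 10
2 20
3 30
;
DATA sam ;
INPUT subjctid score ;
CARDS ;
1 10
2 25
3 30
;
* The ID statement expects both datasets to be sorted by the ID variable ;
PROC SORT DATA=fred ; BY subjctid ; RUN ;
PROC SORT DATA=sam ; BY subjctid ; RUN ;
PROC COMPARE BASE=fred COMPARE=sam ERROR ;
ID subjctid ;
RUN ;
```

The listing should flag the unequal SCORE value for subject 2, and the ERROR option should write a corresponding message to the log.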


 


 

Creating a counter variable in SAS

Question:

How do I create a count variable in SAS that reflects the order of the subjects in my raw data file?

Answer:

One way to do this is as follows. Include the following statement in your DATA step. It creates a variable ID which reflects the order of the observations in the raw data file.
id + 1 ;
That is, the variable ID is assigned an integer value, starting at 1 for the first case, and increasing by one for each subsequent case in the data set.
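A minimal sketch of the statement in context follows. The variable SCORE and the data values are made up; the automatic variable _N_ is included for comparison, since it counts DATA step iterations and gives the same numbering in a simple step like this one.

```sas
DATA one ;
INPUT score ;
id + 1 ;        /* sum statement: id is retained across observations and starts at 0 */
obsnum = _N_ ;  /* _N_ counts the iterations of the DATA step */
CARDS ;
87
92
75
;
PROC PRINT DATA=one ;
RUN ;
```

Both ID and OBSNUM take the values 1, 2, and 3 here. They can differ in steps that read more than one record per iteration or that delete observations before the counter statement runs.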


 


 

Using the SAS KEEP & DROP Statements

Question:

I'm using multiple SAS datasets in the same program, and I only need to keep a few of all of the variables in the last part of my program. Is there an easy way to keep only the variables I need to use? Should I use KEEP or DROP statements?

Answer:

One way is to use either the KEEP or DROP statements in your DATA steps. Which statement you should use will depend on how many of the variables in your SAS dataset you wish to keep. If you only wish to keep a few of many variables, then use the KEEP statement. If you want to drop only a few variables, use the DROP statement. The syntax of the two statements is very similar (and simple).
The syntax for the KEEP statement is:
KEEP var1 var2 varN ;
Here is an example of using the KEEP statement in a SAS program:
DATA one ;
INPUT ssn age sex $ weight ;
CARDS ;
445768976 23 m 129
487593453 35 f 112
442345213 26 m 198 
;

DATA two ;
SET one ; 
KEEP age weight ;
RUN ;
Here the SAS user has read some data into dataset ONE. Then a second DATA step creates dataset TWO. The KEEP statement is used so that only the variables AGE and WEIGHT are included in the dataset TWO.
Alternatively, the researcher could have accomplished the same goal by replacing the KEEP statement with a DROP statement and a new variable list, indicating which variables in the first SAS dataset should be dropped from the new SAS dataset.
DROP ssn sex ;
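The same selection can also be written with the KEEP= and DROP= dataset options instead of statements. The sketch below assumes the dataset ONE from the example above; the dataset name THREE is hypothetical.

```sas
* KEEP= on the input dataset: only AGE and WEIGHT are read from ONE ;
DATA two ;
SET one (KEEP = age weight) ;
RUN ;

* DROP= on the output dataset: SSN and SEX can be used in the step but are not written ;
DATA three (DROP = ssn sex) ;
SET one ;
RUN ;
```

Placing the option on the input dataset can save time with very wide datasets, because the unwanted variables are never read into the DATA step.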


 


 

Percentile ranks with SAS

Question:

How can I generate percentile ranks using SAS?

Answer:

Use the GROUPS= option for PROC RANK to get percentile (and other x-ile) ranks. The following code stores the percentile ranks for the variables named V1 and V2 in the variables named PRV1 and PRV2.
PROC RANK GROUPS=100 OUT=permsas.dataset;
VAR v1 v2;
RANKS prv1 prv2;
RUN;
Notes on the code:
1. Output: The OUT= option for the PROC RANK statement must be used if you want to create a permanent SAS dataset containing the ranking variables (here PRV1 and PRV2).
2. Output: The RANKS statement must be used if you want to include the original variables in the output dataset, and the VAR statement must be used if the RANKS statement is used. The RANKS statement assigns names to the new variables containing the ranks; the nth name contains the ranks for the nth variable listed in the VAR statement.
3. Other ranks: Use the GROUPS= option to obtain other x-ile ranks, e.g., to obtain quartile (decile) ranks, specify GROUPS=4 (10).
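As an illustration of note 3, the following sketch assigns quartile ranks. The dataset names SCORES and RANKED, the variable QRTV1, and the data values are all invented for the example.

```sas
DATA scores ;
INPUT v1 ;
CARDS ;
12
45
7
33
28
51
19
40
;
PROC RANK DATA=scores GROUPS=4 OUT=ranked ;
VAR v1 ;
RANKS qrtv1 ;
RUN ;
PROC PRINT DATA=ranked ;
RUN ;
```

Note that with GROUPS=n the group numbers run from 0 to n-1, so quartile ranks are 0 through 3 and percentile ranks are 0 through 99.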


 


 

Formatting dates for output in SAS

Question:

How can I print the dates in a SAS dataset in a more meaningful form?

Answer:

In the SAS DATA step, associate the date variable with a SAS date format. This format will be applied to the variable whenever it is displayed. In the following example, the values in the variable BDAY will be displayed in the form ddMMMyy (for example, 22DEC70).
FORMAT bday DATE8. ;
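The following sketch shows the FORMAT statement in a complete program. The data value is invented, and the DATE9. and MMDDYY10. formats are used simply as two further examples of SAS date formats.

```sas
DATA one ;
INPUT bday MMDDYY8. ;
FORMAT bday DATE9. ;   /* stored with the dataset; shows a four-digit year */
CARDS ;
12/22/70
;
PROC PRINT DATA=one ;
RUN ;
* A FORMAT statement inside a procedure applies to that step only ;
PROC PRINT DATA=one ;
FORMAT bday MMDDYY10. ;
RUN ;
```

The first PROC PRINT uses the format stored in the DATA step, while the second overrides it for that one procedure without changing the dataset.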


 


 

Variable labels in SAS

Question:

How do I generate a label for each level of a variable in SAS?

Answer:

Before the DATA step that will create your SAS dataset, use PROC FORMAT to define a label for each level. Note that PROC FORMAT only creates the formats; a format must be associated with a variable by using the FORMAT statement in a later DATA step. Here is an example of PROC FORMAT:
PROC FORMAT ;
VALUE fnum
1 = 'Young Males'
2 = 'Young Females' ;
VALUE $fa
'A' = 'Value A'
'B' = 'Value B' ;
RUN ;
The VALUE statement assigns a name to a format and a label to each level of the format. Here the first user-defined format is named FNUM. It associates the numbers 1 and 2 with the labels Young Males and Young Females, respectively. The second VALUE statement shows how to define labels for character-valued levels; note that the name of a character format begins with a dollar sign and that these levels must be enclosed in single quotes. Labels can have up to 16 characters.
The formats defined by a previous PROC FORMAT can be applied to variables in a subsequent DATA step by using the FORMAT statement. To use the FORMAT statement, follow the variable name with the appropriate format. (Note that the format is specified by the format name followed by a period. The period indicates that the name refers to a format rather than a variable.) A format can be applied to more than one variable. All subsequent output will use these labels. Here is an example applying the formats from the previous PROC FORMAT:
DATA one ;
INPUT number letter $ ;
FORMAT number FNUM. letter $FA. ;
CARDS ;
1 A
2 B
;
It is important to realize that serious complications can result if a format is applied which has not been defined in a previous PROC FORMAT or saved in a catalog. One way to prevent this is to include the appropriate PROC FORMAT steps in any program that uses user-defined formats.
You should not confuse value labels, shown above, with variable labels. Variable labels provide a way for you to assign a longer descriptive set of text to accompany a variable when it appears as part of SAS's output. For instance, the variable name NUMBER (shown above) may not be as descriptive as "Participant Sex". You can use the SAS LABEL statement in a SAS DATA step to associate the label with the variable name, like this:
LABEL number = 'Participant Sex';


 


 

Summing variables with missing data in SAS

Question:

I have a number of variables I need to add together using SAS. Some of them have missing data. What is the best way for me to add them together given that I have missing data?

Answer:

There are two general approaches you can take to sum variables in SAS.
1. The direct adding method. 
With this method, you compute a new variable as a straight sum of the current variables you wish to add.
Newvar = Oldvar1 + Oldvar2 + Oldvar3 ;
In this example, Newvar is the sum of Oldvar1 + Oldvar2 + Oldvar3. If Oldvar1 or Oldvar2 or Oldvar3 has missing data for a given case, then the value of Newvar for that case would also be missing. In other words, if any of the variables to be summed have missing data, the new variable will also have missing data.
2. The function method.
With the function method, you use the SAS SUM function with the OF keyword to add up a number of variables. The advantage of this method is that the syntax is much less laborious to type, especially for large numbers of variables.
Newvar = SUM (OF Oldvar1-Oldvar3) ;
Unfortunately, the SUM function ignores missing values, so any variable to be summed which has a missing value is effectively treated as zero. This means, for example, that if Oldvar1 has a value of 4 and Oldvar2 is missing and Oldvar3 has a value of 3, the value of Newvar would be 7 when the SUM function is used. By contrast, the value of Newvar would be missing under method (1) described previously (where you add the variables together using a plus sign).
If you have both a large number of variables to sum and missing data, what can you do? One solution (provided by Karl Wuensch over the Internet) is to use the NMISS function in conjunction with the SUM function, like so:
IF NMISS(OF Oldvar1-Oldvar3) > 0 then Newvar = . ; 
ELSE Newvar = SUM(OF Oldvar1-Oldvar3) ;
This code first calculates the number of missing values across the variables Oldvar1 through Oldvar3. If SAS finds any missing data, it sets the value of Newvar to be missing. Otherwise, the value of Newvar is set to the sum of Oldvar1 through Oldvar3, all of which are non-missing for that case.
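The three approaches can be compared side by side in one DATA step. This is an illustrative sketch; the dataset name SUMS, the result variables PLUSSUM, FUNCSUM, and STRICTSUM, and the data values are invented.

```sas
DATA sums ;
INPUT Oldvar1 Oldvar2 Oldvar3 ;
plussum = Oldvar1 + Oldvar2 + Oldvar3 ;   /* missing if any input is missing */
funcsum = SUM(OF Oldvar1-Oldvar3) ;       /* ignores missing inputs */
IF NMISS(OF Oldvar1-Oldvar3) > 0 THEN strictsum = . ;
ELSE strictsum = SUM(OF Oldvar1-Oldvar3) ;
CARDS ;
4 . 3
1 2 3
;
PROC PRINT DATA=sums ;
RUN ;
```

For the first observation PLUSSUM and STRICTSUM are missing while FUNCSUM is 7; for the second, all three equal 6. SAS also writes a note to the log that missing values were generated by the plus-sign calculation.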


 


 

Data as percentages in SAS

Question:

How do I convert the levels of a SAS variable to percentages?

Answer:

Use PROC RANK. In the following example, the variable PRCNTS is created to contain the percentage values for the ranks assigned to the values of the original variable VAR1. Both variables will be contained in the output dataset ROUT. Note that it is the P option in the PROC RANK statement that directs SAS to divide each rank by the number of nonmissing observations and multiply by 100 to get a percentage.
PROC RANK DATA=insds P OUT = rout ;
VAR var1 ;
RANKS prcnts ;


 


 

Transfer of SAS datasets

Question:

I tried to transfer a SAS dataset from one computer system to another, but I can't read it on the new computer system. What do I need to do differently?

Answer:

SAS datasets will only work on the same type of computer where the datasets were created. For example, if you created a SAS dataset on a computer running MS-Windows, you could only use that SAS dataset on other computers running MS-Windows.
To move a SAS dataset from one computer to another computer running a different operating system than the first computer, you should use a SAS transport dataset. SAS transport datasets may be read on any type of computer system.
There are three steps for transferring a SAS dataset from one computer system to another: (1) Create a SAS transport dataset copy of the SAS dataset on the originating computer system, (2) transfer the newly-created SAS transport file from the originating computer system to the destination computer system, and (3) convert the SAS transport file to a permanent SAS dataset on the destination computer system.
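One common way to carry out these three steps is with the XPORT engine and PROC COPY, sketched below. The library names, the directory placeholders, the dataset name MYDATA, and the file name mydata.xpt are all hypothetical; substitute your own.

```sas
* Step 1 (originating system): write a transport file with the XPORT engine ;
LIBNAME mylib 'source-directory' ;        /* hypothetical location of the dataset */
LIBNAME xptout XPORT 'mydata.xpt' ;       /* hypothetical transport file name */
PROC COPY IN=mylib OUT=xptout ;
SELECT mydata ;
RUN ;

* Step 2: move mydata.xpt to the destination system as a binary file ;

* Step 3 (destination system): restore the dataset from the transport file ;
LIBNAME xptin XPORT 'mydata.xpt' ;
LIBNAME newlib 'destination-directory' ;  /* hypothetical permanent library */
PROC COPY IN=xptin OUT=newlib ;
RUN ;
```

If you move the file with FTP or a similar tool, use binary mode; a text-mode transfer can corrupt the transport file.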


 


 

SAS missing values

Question:

How does SAS deal with missing data?

Answer:

Whenever SAS encounters an invalid or blank value in the file being read, the value is defined as missing. In all subsequent processes and output, the value is represented as a period (if the variable is numeric-valued) or is left blank (if the variable is character-valued).
In DATA step programming, use a period to refer to missing numeric values. For example, to recode missing values in the variable A to the value 99, use the following statement:
IF a=. THEN a=99;
Use the MISSING statement to define certain characters to represent special missing values for all numeric variables. The special missing values can be any of the 26 letters of the alphabet, or an underscore. In the example below, the values 'a' and 'b' will be interpreted as special missing values for every numeric variable.
MISSING a b ;
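Here is a short sketch of the MISSING statement in use. The dataset names SURVEY and REFUSED, the variable INCOME, and the meaning attached to the code 'a' are invented for the example.

```sas
DATA survey ;
MISSING a b ;
INPUT id income ;
CARDS ;
1 52000
2 a
3 b
4 .
;
PROC PRINT DATA=survey ;
RUN ;
* In later code, refer to the special missing values as .a and .b ;
DATA refused ;
SET survey ;
IF income = .a ;
RUN ;
```

The special missing values are excluded from statistical calculations like ordinary missing values, but they let you keep track of why a value is missing.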


 


 

Special characters in SAS variable names

Question:

I keep getting a strange SAS error in my SASLOG file. It seems to be underlining an ampersand (&) character that is part of one of my variable names in my INPUT statement. What is the problem?

Answer:

SAS restricts the use of special characters in variable names (e.g., the ampersand, which SAS uses to reference macro variables). The only special symbol which may be used in a SAS variable name is the underscore (_) character. For example, New_York is a valid SAS variable name, but New-York is not, since the dash is a special character.


 


 

Identifying nonmatches in a SAS match MERGE

Question:

I am match-merging two SAS data sets. I would like to be able to identify, remove, and print out any cases that don't get a match. How can I do this?

Answer:

Use the IN= option in the MERGE statement to identify the observations that have data from both data sets. The following code demonstrates this:
DATA one ;
INPUT x y z ;
CARDS ;
2 2 3 
4 5 6 
7 8 9 
;
RUN;
DATA two ;
INPUT x y z ;
CARDS ;
1 2 3 
4 5 6 
7 8 9 
;
RUN;
PROC SORT DATA=one;
BY x ;
RUN;
PROC SORT DATA=two;
BY x ;
RUN;
DATA mergdset missgset ;
MERGE one (IN = fromone) two (IN = fromtwo) ;
BY x ;
IF fromone = 1 AND fromtwo = 1 THEN OUTPUT mergdset ;
ELSE OUTPUT missgset ;
RUN;
PROC PRINT DATA = mergdset ;
TITLE ' Matched and Merged Observations' ;
PROC PRINT DATA = missgset ;
TITLE ' Unmatched Observations' ;
RUN ;
In this example, the two SAS data sets ONE and TWO are sorted and merged by the variable X. The IN= options create two temporary variables, each of which equals 1 if the corresponding data set contributed data to the current observation and 0 otherwise. The IF/ELSE statements then use these variables to output any unmatched observations to the SAS data set MISSGSET, and all other observations to the data set MERGDSET.
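If you also need to know which data set an unmatched observation came from, the same IN= variables can route observations to separate data sets. This sketch assumes ONE and TWO have already been sorted by X as above; the output names MATCHED, ONLYONE, and ONLYTWO are hypothetical.

```sas
DATA matched onlyone onlytwo ;
MERGE one (IN = fromone) two (IN = fromtwo) ;
BY x ;
IF fromone AND fromtwo THEN OUTPUT matched ;
ELSE IF fromone THEN OUTPUT onlyone ;   /* present in ONE only */
ELSE OUTPUT onlytwo ;                   /* present in TWO only */
RUN ;
```

With the sample data, the observation with X equal to 2 goes to ONLYONE and the observation with X equal to 1 goes to ONLYTWO.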


 


 

Removing centering and header information

Question:

How can I get rid of the header messages SAS prints on my output? I want only the actual procedure output to show up. Also, I don't want my output to be centered; I'd like it left-justified instead.

Answer:

You can use a combination of SAS system options and the Title statement to achieve this goal. Use the following syntax at the beginning of your SAS program:
OPTIONS NONUMBER NODATE NOCENTER ;
TITLE ' ';


 


 

Counting occurrences of patterns in SAS character-valued data

Question:

How can I get SAS to tell me the number of observations which have a common pattern across three character-valued variables?

Answer:

One way to do this would be to concatenate the variables into a single variable and then tally the number of observations having common patterns using the PROC FREQ command. The following code demonstrates this:
DATA one ;
INPUT a $ 1 b $ 3 c $ 5 ;
abc=a||b||c ;
CARDS;
1 1 1
1 5 5
1 6 5
1 6 5
1 5 5
2 4 2
2 3 1
2 2 1
;
PROC FREQ ;
TABLES abc ;
RUN;
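Another possibility, offered here as a sketch rather than a replacement for the approach above, is to cross-tabulate the three variables directly and ask PROC FREQ for list-style output. It assumes the dataset ONE created above.

```sas
* Tally each observed combination of a, b, and c without concatenating them ;
PROC FREQ DATA=one ;
TABLES a*b*c / LIST ;
RUN ;
```

The LIST option prints one row per observed combination with its frequency, which keeps the three variables in separate columns.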


 


 

Replacing missing data from a second file

Question:

I have one data file with some missing values in it. I want to replace those with values from a second data file, but only replace those with matching observation values in the second data file.

Answer:

The following SAS program will replace missing data in dataset one with a matching observation from dataset two.
DATA one ;
INPUT x y z $ ;
CARDS ;
1 2 a1
1 3 b1
1 . c1 
2 5 d1
4 6 e1
;
DATA two ;
N+1 ;
INPUT y z2 $ ;
CARDS ;
2 a1
3 b1
4 c1
5 d1
6 e1
;
DATA both ;
SET one ;
N = .;
IF y=. THEN DO WHILE (z2 NE z) ;
SET two ;
END;
DROP z2 ;
PROC PRINT ;
RUN ;
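The program above reads dataset TWO sequentially, so it relies on the matching records appearing in the same order in both datasets. A match-merge is an alternative that does not depend on record order. The sketch below assumes that the values of Z identify records and are unique within dataset TWO; the names ONE_S, TWO_S, BOTH2, and Y2 are hypothetical.

```sas
PROC SORT DATA=one OUT=one_s ;
BY z ;
RUN ;
* Rename the variables in TWO so that they line up with ONE for the merge ;
PROC SORT DATA=two (RENAME = (y = y2 z2 = z)) OUT=two_s ;
BY z ;
RUN ;
DATA both2 ;
MERGE one_s (IN = inone) two_s (KEEP = z y2) ;
BY z ;
IF inone ;                 /* keep only records that exist in ONE */
IF y = . THEN y = y2 ;     /* fill a missing Y from the matching record in TWO */
DROP y2 ;
RUN ;
PROC PRINT DATA=both2 ;
RUN ;
```

Note that the result is ordered by Z rather than by the original record order of dataset ONE.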


 


 

Using an array in SAS to detect missing values

Question:

How do I exclude observations from my PROC FREQ analysis when a value is missing from a list of variables?

Answer:

In the SAS DATA step, you can create a new variable ("miss" in the example below) that is set equal to 1 when a variable has a missing value, 0 otherwise. Use the ARRAY statement and a DO loop to check for missing values across a list of variables; the syntax is:
DATA one ;
INFILE xxx;
INPUT a b c d e;
miss=0;
ARRAY vv(5) a b c d e ;
DO i=1 TO 5 ;
IF vv(i)=. THEN DO;
miss=1 ;
i=5;
END;
END;
RUN;
PROC FREQ;
WHERE miss =0;
TABLES a b c d e ;
RUN ;
Here, the array "vv" has 5 elements (a, b, c, d, e), and the DO loop index i runs from 1 to 5. For each observation, the loop checks each of the 5 variables in turn for a missing value. When a missing value is encountered, the variable "miss" is set to 1 and i is set to 5, which ends the loop for that observation. "Miss" is initially set to zero and changes only if an observation has missing data on any of the five variables. The PROC FREQ then uses the WHERE statement to restrict processing to observations having "miss" set to zero.
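For numeric variables, a shorter route that may suit you is the NMISS function, which counts missing values among its arguments. The sketch below assumes the same dataset "one" and variables a through e; it filters inside PROC FREQ, so no flag variable is created.

```sas
* Sketch: keep only observations with no missing value among a-e ;
PROC FREQ DATA=one ;
WHERE NMISS(a, b, c, d, e) = 0 ;
TABLES a b c d e ;
RUN ;
```

The array approach remains useful when you want the "miss" flag saved in the dataset for other procedures.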

Back to Top

 


 

Recoding variable values into missing values in SAS

Question:

I would like to recode numeric zeros in my SAS dataset into SAS system missing values. How can I perform this operation?

Answer:

You can accomplish this task by using an IF-THEN statement in SAS. SAS uses the period symbol ('.') as its missing value identifier.


The following example shows how to convert zeros to the SAS system missing value code.
* SAS Program converts zero numeric values to SAS system missing values ;
DATA ;
INFILE CARDS ;
INPUT id a b c ;
IF a = 0 THEN a = . ;
IF b = 0 THEN b = . ;
CARDS ;
1 2 3 4
2 0 4 5
3 4 0 6
;
PROC PRINT ;
TITLE ' Check for program accuracy' ;
RUN ;
* End sample program ;
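When many variables need the same recode, an ARRAY and a DO loop can save typing. This is a sketch under assumptions: "mydata" and "recoded" are hypothetical dataset names, and a, b, and c stand in for your own numeric variables.

```sas
* Sketch: recode zeros to missing across a list of numeric variables ;
DATA recoded ;
SET mydata ;
ARRAY nums(*) a b c ;
DO i = 1 TO DIM(nums) ;
IF nums(i) = 0 THEN nums(i) = . ;
END ;
DROP i ;
RUN ;
```

DIM(nums) returns the number of array elements, so the loop bound does not need to change when you add variables to the ARRAY statement.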

Back to Top

 


 

Using the POINT and NOBS options with the SAS SET statement

Question:

I want to create a new SAS dataset by combining two I already have. The combination is fairly strange -- I need each observation in the first dataset to be merged with all the observations in the second dataset. How can I do this?

Answer:

You will use the SET statement with the POINT= option and the NOBS= option. The SET statement names an existing dataset to be read as input to a new SAS dataset. The POINT= option names a variable which indicates which observation from the existing dataset is to be read as input to the new dataset. The NOBS= option creates a temporary variable whose value is the number of observations in the dataset being read.
In the sample code below, the data step will start by naming a new dataset. Then, one DO loop will be started to run from 1 to the number of observations in the first existing dataset. The syntax will appear strange for this DO statement, because the end of the loop is the NOBS variable defined in the following SET statement. This works because SAS assigns the NOBS= value when the DATA step is compiled, before any statement executes. The SET statement in the first loop uses the NOBS option to define the end of the loop, and it names the loop indicator variable as the POINT variable. Thus, with each loop through, POINT= moves the input pointer to the next observation, where it stays while the next loop runs.
The next loop is nested within the first. For each cycle of the first loop, the nested loop runs through its entire set. This loop also runs from 1 to the number of observations in the second old dataset. The second SET statement uses the NOBS option to define the end of the loop, and the POINT option names the nested loop indicator variable.
For each cycle through this inner loop, the outer loop does not change. Thus the observation being POINTed to in "old1" remains the same. The inner loop POINTs to a new observation from "old2" with each cycle, and then OUTPUTs the variable values from both datasets' observations to the new dataset. When the inner loop runs through its entire set, the outer loop is incremented, POINTing to the next observation from "old1". Then the inner loop runs through its entire set again. Thus each observation from the first is combined with all the observations from the second. Finally, the indicator variables from each loop are dropped. The STOP statement is required because reading with POINT= never triggers an end-of-file condition; without it the DATA step would loop indefinitely.
Like so:
DATA new;
DO j=1 TO nobs1;
SET old1 NOBS=nobs1 POINT=j;
DO k=1 TO nobs2;
SET old2 NOBS=nobs2 POINT=k;
OUTPUT;
END;
END;
DROP j k;
STOP; 
RUN ;
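If you are comfortable with SQL, the same every-row-with-every-row combination can be sketched with PROC SQL, where listing two tables without a join condition produces their Cartesian product. The table name "new_sql" is made up for this example. If the two datasets share variable names, rename them first so that no values are lost.

```sas
* Sketch: Cartesian product of old1 and old2 using PROC SQL ;
PROC SQL ;
CREATE TABLE new_sql AS
SELECT *
FROM old1, old2 ;
QUIT ;
```

SAS writes a note to the log that the query involves a Cartesian product; that is expected here.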

Back to Top

 


 

One-to-many dataset merging with SAS

Question:

I need to merge two datasets in SAS so that the variables in one dataset are included in with the other dataset. In one dataset, observations represent companies' year-end reported data. In the other dataset, observations represent monthly activity for the year for each company in the first dataset. Thus, each observation in the first dataset will be merged with 12 observations in the second. How can I get SAS to do this kind of merge?

Answer:

To do this you must have some common variable in each dataset that uniquely identifies each company. Then you would use PROC SORT to sort the observations in each of your two SAS datasets by this "id" variable. You would then use the MERGE statement together with a BY statement to match-merge the two datasets into one SAS dataset. For example:
PROC SORT DATA=dsetone; 
BY idvar;
RUN;
PROC SORT DATA=dsettwo; 
BY idvar;
RUN ;
DATA new;
MERGE dsettwo dsetone ;
BY idvar ;
RUN ;
where "dsetone" is your larger SAS dataset (e.g., the SAS dataset with 12 observations for each level of the id variable which you merge by) and "dsettwo" is the smaller SAS dataset containing only one "yearly" value for each level of the id variable which you merge by. Finally, "idvar" is the id variable by which the two SAS datasets were sorted and which the MERGE statement now uses to match the observations (e.g., a company ID).

Back to Top

 


 

Inputting missing data using SAS

Question:

The option MISSOVER does not work when the missing values are not at the end of the observation, correct? To read this, for example:
546             8456
        111             5555
I did:
OPTIONS linesize=80 REPLACE NODATE;
FILENAME indata '/home/grad/veronica/LESSONS/SAS/miss2.txt';
DATA NEW1;
INFILE indata missover;
INPUT @1 year @9 foodcons @17 prretail @25 dispinc;
RUN ;
PROC PRINT;
RUN;
and got:
OBS YEAR FOODCONS PRRETAIL DISPINC
1 546 8456 8456 .
2 111 111 5555 5555
Is there a way for SAS to interpret empty positions in the middle of a record as missing values?

Answer:

One way to fix the problem is to explicitly define the column width of each variable using SAS formatted input.
The modified program looks like this:
OPTIONS LS=80 REPLACE NODATE ;
DATA new1 ;
INFILE CARDS MISSOVER ;
INPUT @1 (year) (3.) @9 (foodcons) (3.) @17 (pretail) (4.) @25 (dispinc) (4.) ;
CARDS ;
546             8456
        111             5555
;
RUN ;
PROC PRINT ;
RUN ;
The (3.) and (4.) in the INPUT statement refer to the column width of each variable in question: 3 columns and 4 columns, respectively. Note that if you had data with columns appearing behind a decimal point, such as a GPA of 3.46, then your column specification would be 4.2: 4 total columns, including the decimal point, with two of those four columns consisting of values behind (to the right of) the decimal point.
The program results in the following output:
OBS YEAR FOODCONS PRETAIL DISPINC
1 546 . 8456 .
2 . 111 . 5555
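Column input is another way to tie each variable to fixed positions. The sketch below assumes the fileref "indata" from the question and the same column layout as the formatted-input example (year in columns 1-3, foodcons in 9-11, pretail in 17-20, dispinc in 25-28); "new2" is a made-up dataset name. TRUNCOVER is used here instead of MISSOVER so that a short record still yields whatever data fall inside a column range.

```sas
* Sketch: read fixed columns so blanks in mid-record become missing ;
DATA new2 ;
INFILE indata TRUNCOVER ;
INPUT year 1-3 foodcons 9-11 pretail 17-20 dispinc 25-28 ;
RUN ;
PROC PRINT DATA=new2 ;
RUN ;
```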

Back to Top

 


 

Creating a lead or lag variable using SAS

Question:

I have a problem with creating a lag variable. My data set looks like this:
MKT STK T RTN PE
1 1 1 5 4
1 2 1 3 5
1 1 2 5 5
1 2 2 7 6
2 1 1 4 5
2 2 1 5 5
2 1 2 4 5
2 2 2 4 5
In other words, each observation is in a certain market, MKT, and is a certain stock within each market, and all have a time variable (monthly). I need to do some time series regressions. For each stock within each market, I need to regress the variable PE on the RTN variable FOR THE NEXT MONTH. The easiest way to do this is to lag the PE variable in SAS (instead of leading the RTN variable). However, I need to create a lagged PE variable for each stock in each market. The problem is the lag command in SAS just takes the previous observation's value. I can sort by MKT, STK and T, but it will still cross over these distinctions.
My data set has probably 225 different MKT/STK combinations, so I need to do some kind of DO loop instead of using IF statements. I thought about creating a MKTSTK variable that identified the market and stock together. Now I need to create a statement that says "Do while MKTSTK = certain number, LAGPE = LAG(PE)", so it will create a lagged pe variable for all observations that belong to the same MKTSTK group. Then it needs to go to the next MKTSTK group until it has gone through all the observations with each of the groups.
Can you help? Is there a "DO WHILE" command in SAS? I couldn't find one in my books.

Answer:

You can use the EXPAND procedure, which is part of SAS/ETS, to create either lead or lag variables. The EXPAND procedure is described in the SAS Help documentation located under the Help button in SAS; it can be found by typing EXPAND procedure under the Index tab. Alternatively, the same information can be found in the online SAS documentation at http://support.sas.com/onlinedoc/913/docMainpage.jsp; under the Index tab, type EXPAND procedure in the "Jump to" box.
Here's how it works: Each variable you want to carry over unchanged is named in a CONVERT statement, such as CONVERT market or CONVERT stock. These variables are copied verbatim into the new dataset named by the OUT= option on the PROC EXPAND statement. Notice that you'll also use the METHOD=NONE option, which tells EXPAND not to interpolate the series, so the values are copied as they are and only the transformations you request are applied.
For the return variable, you can not only send a verbatim copy to the new dataset, you can also use the TRANSFORM option to send a lead version called "leadretn" to the new output dataset.
EXPAND will add a counter variable called "TIME" to your output dataset. I took a brief look at the EXPAND documentation and found no way to deliberately turn off this feature. You can use an ID statement to switch TIME to an explicit variable which is already present in the dataset (usually this variable would be in SAS date format already), but I saw no way to prevent SAS from adding it to the output dataset. If you do not want it, you can remove it afterward, for example with a DROP= data set option on the output dataset.
I've included sample syntax with the small clip of data you sent us below, along with the output this job produces. If I've understood what you're asking for, this should fulfill your requirements exactly.
* Begin sample program ; 
OPTIONS ls = 72 ;
DATA one ;
INFILE cards ;
INPUT market stock t return pe ;
CARDS;
1 1 1 5 4
1 2 1 3 5
1 1 2 5 5
1 2 2 7 6
2 1 1 4 5
2 2 1 5 5
2 1 2 4 5 
2 2 2 4 5
;
RUN ;
PROC PRINT ;
RUN ;
PROC SORT ;
BY market stock ;
RUN ;

PROC EXPAND DATA = one OUT = two METHOD = NONE ;
CONVERT market ;
CONVERT stock ;
CONVERT t ;
CONVERT pe ;
CONVERT return ;
CONVERT return = leadretn / TRANSFORM = (LEAD 1) ; 
BY market stock ;
RUN ;
PROC PRINT DATA = two ;
RUN; 
* End sample program ; 

OUTPUT:
OBS MARKET STOCK T RETURN PE
1 1 1 1 5 4
2 1 2 1 3 5 
3 1 1 2 5 5
4 1 2 2 7 6
5 2 1 1 4 5
6 2 2 1 5 5
7 2 1 2 4 5
8 2 2 2 4 5
OBS MARKET STOCK TIME T PE RETURN LEADRETN
1 1 1 0 1 4 5 5
2 1 1 1 2 5 5 .
3 1 2 0 1 5 3 7
4 1 2 1 2 6 7 .
5 2 1 0 1 5 4 4
6 2 1 1 2 5 4 .
7 2 2 0 1 5 5 4
8 2 2 1 2 5 4 .
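
To answer the other part of your question: SAS does have a DO WHILE statement, but you do not need it here. If you would rather lag PE in a DATA step, BY-group processing handles the group boundaries. The sketch below assumes the dataset "one" from the sample program; "lagged" and "lagpe" are names made up for illustration.

```sas
* Sketch: lag pe within each market/stock group ;
PROC SORT DATA=one ;
BY market stock t ;
RUN ;
DATA lagged ;
SET one ;
BY market stock ;
lagpe = LAG(pe) ;                 * call LAG on every row so its queue stays in step ;
IF FIRST.stock THEN lagpe = . ;   * do not carry a value across groups ;
RUN ;
PROC PRINT DATA=lagged ;
RUN ;
```

The LAG function is called unconditionally and then blanked at the start of each group; calling LAG inside an IF condition would give misleading values, because LAG returns the value from its previous call rather than from the previous observation.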

Back to Top

 


 

Combining multiple lines into one using SAS

Question:

I want to combine multiple lines (rows) of data into one line (a single row) of data in a SAS data set. How can I do this?

Answer:

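Here is some code to combine multiple lines into one line. It uses the SAS ARRAY statement and a DO loop that reads four observations per output row, so it assumes every id has exactly four rows.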
* Begin sample program ;
DATA full; *this step just creates the full dataset; 
INPUT id $ stage $ x y @@;
CARDS;
jan z 1 1 jan x 2 2 jan d 3 3 jan e 4 4 joe a 11 1 joe b 22 2 joe r 33 3
joe z 44 1 pam b 2 1 pam r 2 2 pam i 2 1 pam s 2 2
;
RUN;
PROC PRINT DATA=full; 
RUN;
DATA redone;
ARRAY u(4) newvar1-newvar4;
DO i = 1 TO 4;
SET full;
u(i) = x;
END; 
DROP i;
RUN;
PROC PRINT DATA=redone; 
RUN;
* End sample program ;
Output: 
OBS ID STAGE X Y
1 jan z 1 1
2 jan x 2 2
3 jan d 3 3
4 jan e 4 4
5 joe a 11 1
6 joe b 22 2
7 joe r 33 3
8 joe z 44 1
9 pam b 2 1 
10 pam r 2 2
11 pam i 2 1
12 pam s 2 2

OBS NEWVAR1 NEWVAR2 NEWVAR3 NEWVAR4 ID STAGE X Y
1 1 2 3 4 jan e 4 4
2 11 22 33 44 joe z 44 1
3 2 2 2 2 pam s 2 2
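
If the number of rows per id can vary, PROC TRANSPOSE with a BY statement is one alternative worth considering. The sketch below assumes the dataset "full" from the sample program, which is already grouped by id (sort it by id first if yours is not); "wide" is a made-up output name.

```sas
* Sketch: one row per id, with x spread across newvar1, newvar2, and so on ;
PROC TRANSPOSE DATA=full OUT=wide (DROP=_NAME_) PREFIX=newvar ;
BY id ;
VAR x ;
RUN ;
PROC PRINT DATA=wide ;
RUN ;
```

Unlike the ARRAY version, this output keeps only id and the transposed x values; stage and y are not carried along.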

Back to Top

 


 

Reverse scoring survey items using SAS

Question:

I need to reverse score a large number of survey items using SAS. I could use individual logic statements to do this like item1 = 6 - item1; , but it would take forever to specify the statements one-by-one. Can you suggest a faster method?

Answer:

Yes. Use an ARRAY statement and a DO loop to perform the transformation. Suppose that you have a survey which uses five point Likert scaling. In this instance, you want to subtract every score from six for items that require reverse scoring. Further suppose you had 50 items that required reverse scoring. You could write 50 logic statements, or you could write a single ARRAY statement, like this:
ARRAY convert{50} Item1-Item50 ;
DO i = 1 TO 50 ;
IF i IN (1,6,8,10,12,13,14,15,17,19,21,23,26,27,29,30,31) THEN DO ;
convert{i} = 6 - convert{i} ;
END ;
END;
DROP i ;
Item1 through Item50 are your Likert item variables, some of which you want to reverse score. "Convert" is the name you give the conversion array, and i is a counter or looping variable. Since there are 50 item variables in the dataset, your array will have 50 elements and your DO loop will run through these 50 elements. The IN operator in the IF statement tells SAS to perform the reverse coding on only the items whose numbers are listed after the IN keyword (e.g., Item1, Item6, etc.).
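Another way to arrange this, sketched below, is to put only the items that need reversing in the array, so no IN list is required. The dataset names "survey" and "scored" are hypothetical, and the item list is shortened to four items for illustration.

```sas
* Sketch: reverse score only the items named in the array ;
DATA scored ;
SET survey ;
ARRAY rev(*) Item1 Item6 Item8 Item10 ;
DO i = 1 TO DIM(rev) ;
rev(i) = 6 - rev(i) ;
END ;
DROP i ;
RUN ;
```

Either arrangement overwrites the original item values, so keep a copy of the raw data if you may need the unreversed scores later.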
For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.

Back to Top

 


 

Contrast coding using PROC MIXED

Question:

I've run an ANOVA with one between-subjects factor (GROUP) and one within-subjects factor (TIME). Group has three levels; time has three levels. My dependent variable is anxiety, measured at three equally-spaced intervals.
I now want to run a contrast analysis. I want to compare the means of group 1 to group 2 across all three measurement occasions of anxiety and get an F-test of my hypothesis. How can I specify the contrast using SAS?

Answer:

This FAQ assumes you are familiar with the logic of contrast coding. It also assumes you know how to generate appropriate contrast codes or weights to test your hypotheses of interest. A primer on generating contrast codes can be found in General FAQ #21: Contrast coding.
You can use PROC GLM or PROC MIXED in SAS to perform repeated measures ANOVA. Each procedure has strengths and weaknesses; one nice MIXED feature is its ability to perform comparisons involving within and between-subjects factors in the same contrast. PROC GLM uses separate matrices for between-subjects effects versus within-subjects effects. This is not a problem if you are interested in between-subjects effects or within-subjects effects, but it can present complications if you want to generate contrasts that cross levels of both between and within-subjects effects simultaneously. For this reason, this FAQ illustrates the more general contrast approach offered by PROC MIXED.
PROC MIXED can use only one dependent variable for each analysis. If your data are in multivariate form (the SAS dataset contains a single row for each case and separate variables for each measurement occasion of your repeated measures variable), you must first transpose the data into a form that PROC MIXED can use. See SAS FAQ 75: Converting SAS multivariate repeated measures data to univariate format for details on this procedure.
In PROC MIXED the general form of contrast statement specification is as follows:
CONTRAST "contrast-name" variable-name weights / OPTIONS ;
where "contrast-name" is a quoted string that identifies the contrast on the software's output, "variable-name" is the name of the variable (e.g., GROUP), and "weights" are the contrast weights you've generated. You may also specify various options following the slash ( / ) symbol.
For single degree of freedom t-test contrasts you may use the ESTIMATE statement. The ESTIMATE statement has the same form as the CONTRAST statement, but it produces an estimated mean difference, standard error, t-test, and associated probability value instead of the F-test generated by the CONTRAST statement.
Consider the following PROC MIXED syntax:
PROC MIXED DATA = one INFO ; 
CLASS group time id ;
MODEL y1 = group time group*time ; 
REPEATED time / SUBJECT = id TYPE = un ;
CONTRAST 'group1 v. group2' group 3 -3 0
time 0 0 0
group*time 1 1 1 -1 -1 -1 0 0 0 / E ;
ESTIMATE 'group1 v. group2' group 3 -3 0
time 0 0 0
group*time 1 1 1 -1 -1 -1 0 0 0 / E
CL ALPHA = .10 DIVISOR = 3 ;
LSMEANS group time group*time ;
RUN ;
The PROC MIXED line specifies the SAS data set used for the analysis; the INFO option requests additional printed information about the analysis. The CLASS statement specifies the three classification variables in the analysis: the between-subjects grouping variable (GROUP), the repeated factor (TIME), and a variable that represents each individual case in the analysis (ID).
The MODEL statement contains the single dependent variable, Y1, and the following effects: GROUP, TIME, and the GROUP by TIME interaction. The REPEATED statement alerts PROC MIXED that the TIME variable is a repeated measurement factor. The SUBJECT option tells PROC MIXED that the case variable is ID. Finally, the TYPE option instructs PROC MIXED to compute all omnibus and contrast effect tests using an unstructured covariance matrix of the repeated measurements of the Y1 variable. This means that each contrast and omnibus effect test will have a separate error term associated with it. If you want to compute contrasts and omnibus effect tests using a common error term, you must specify a different choice (e.g., CS or HF) for the TYPE option.
The CONTRAST statement contains the contrast label "group1 v. group2" which will identify the contrast on the printed output. The label is followed by each variable specified in the MODEL statement, followed by its contrast codes. The E option tells SAS to print the contrast weights. These are helpful for diagnosing errors and making sure you're using the correct contrast weights to test your hypothesis.
The ESTIMATE statement also features the DIVISOR and CL options. SAS does not accept fractions (e.g., 1/3) as contrast weights, and decimal approximations such as 0.3333 introduce rounding error. For this reason, you will often multiply the contrast weights from your original hypothesis by a constant (e.g., 3) so that each contrast weight is an integer (e.g., 1). Unfortunately, when you multiply the contrast weights by a constant, SAS also multiplies the contrast estimate by the same constant, which hinders its interpretability. The DIVISOR option allows you to divide the contrast estimate by the constant to correct this problem.
The CL option produces confidence intervals. The default confidence interval is 95 percent, but you can change this value by using the ALPHA option. This example requests the 90 percent confidence interval, so it specifies ALPHA=.10.
The LSMEANS statement instructs SAS to print means and standard errors for the GROUP and TIME main effects as well as the GROUP by TIME interaction. You can graph these means to gain a greater understanding of your results.
For more information on specifying contrast codes using PROC MIXED, examples, and interpretation of PROC MIXED output, see the SAS Institute publications SAS for Mixed Models, Second Edition, and SAS/STAT Software: Changes and Enhancements through Release 6.12; the book Generalized, Linear, and Mixed Models (Wiley Series in Probability and Statistics) also received good reviews.
You can also review PROC MIXED syntax using the SAS System on-line help facility by clicking on the Help button, then scrolling to SAS Help and Documentation. Enter the keyword MIXED PROCEDURE under the Index tab. SAS will then show a list of topics and options associated with the MIXED procedure; choose the most relevant topic and then click the DISPLAY button to view the contents of that topic area.

Back to Top

 


 

Extracting cases with a given string from SAS

Question:

I am trying to extract data on a certain drug from a very large database. A frequency tabulation reveals that this drug is coded/spelled in a variety of ways. All of the codes, however, contain the word BACTRIM --e.g. BACTRIM D.S. or BACTRIM D or DS1/ BACTRIM.
Is there an efficient way to select all of these values with one subsetting if statement without having to write out each value? It is written a total of 50 ways. Can I use a substring function? If so, how? Could you provide me with a bit of sample code for doing this?

Answer:

The sample code shown below should help you to configure your SAS program to extract only the records containing "BACTRIM".
DATA one ;
LENGTH drug $20 ;
INFILE CARDS ;
INPUT DRUG & ;
CARDS ;
BACTRIM D.S.
BACTRIM D
DS1/ BACTRIM
NOT VALID
ANOTHER NOT VALID
;
RUN ;
DATA two ;
SET one ;
flag = 'BACTRIM';
new_drug = INDEX(drug,flag);
RUN ;
PROC PRINT DATA = two ;
TITLE 'All records';
RUN ;
DATA three ;
SET two ;
IF new_drug NE 0 ;
RUN ;
PROC PRINT DATA = three ;
TITLE 'Reduced database';
RUN ;
The first DATA step reads in the sample records you sent in your E-mail message, plus a couple of additional records that do not meet the BACTRIM criterion.
The second DATA step gets the information from the first DATA step using the SET statement. Then we create a new variable called "flag". Flag is the string we use to limit the records for inclusion. In this case it is "BACTRIM". Now we create another new variable called "new_drug". New_drug is actually a numeric variable. SAS has a function called INDEX that returns the position at which the search string first occurs within the source string. The first argument required by INDEX is the variable to be searched. In this case that is the variable "drug". The second argument required by INDEX is the text string to search for; in this case that is the variable "flag".
The third DATA step gets the information from the second DATA step and limits the cases chosen for inclusion in the third DATA step to only those that have a new_drug number not equal to zero. The INDEX function returns a zero value for new_drug if it cannot find the string "BACTRIM" in the subject's drug variable, so we want to include only cases that have a non-zero value for new_drug.
It is possible and even desirable to compress this program into a single DATA step. For purposes of illustration we show each step in clear and distinct parts.
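A compressed version might look like the sketch below, which assumes the dataset "one" created above; "bactrim_only" is a made-up name. The UPCASE call is an addition not present in the original program, included so that lower-case or mixed-case spellings would also match.

```sas
* Sketch: keep only records whose drug value contains BACTRIM ;
DATA bactrim_only ;
SET one ;
IF INDEX(UPCASE(drug), 'BACTRIM') > 0 ;
RUN ;
PROC PRINT DATA = bactrim_only ;
TITLE 'Reduced database';
RUN ;
```

The IF statement without a THEN clause is a subsetting IF: observations that fail the condition are not written to the output dataset.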
For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.

Back to Top

 


 

Creating an episode splitting variable in SAS

Question:

I want to create a new dataset with split episodes for the purpose of event history analysis. I have the timing of first marriage/censor as a continuous variable (16-90) and a variable "event" that indicates whether the age is first marriage (=1) or censor (=0). I do not have any time varying covariates. I want to break the age of first marriage into 8 intervals (16-17, 18-19,....30+) to capture changes in risk over time. If someone has an event (marriage), then I want them to 'fall' out of the new data set. This new data would also include several covariates (although they would be the same across the time intervals for each individual).
example: (original data)
id agemarr event sex
1 22 1 1
2 20 0 0
3 16 1 0
(new data)
id interval time_int event sex
1 1 2 0 1
1 2 2 0 1
1 3 2 0 1
1 4 1 1 1
2 1 2 0 0
2 2 2 0 0
2 3 1 0 0
3 1 1 1 0
Any suggestions on how to do this in SAS?

Answer:

Using the sample data you provided, the SAS code below should reshape the data in the manner you wish. The nevent variable captures the appropriate value of event for each case in each interval.
**Creating reshaped dataset for event history analysis;
data orig;
input id agemarr event sex;
datalines;
1 22 1 1
2 20 0 0
3 16 1 0
;
run;
data test;
set orig;
**Creating variable to use in controlling the next do loop
** in order to only output cases up until the event occurs;
maxint=0;
if (agemarr = 16 or agemarr = 17) then maxint = 1;
else if (agemarr = 18 or agemarr = 19) then maxint = 2 ;
else if (agemarr = 20 or agemarr = 21) then maxint = 3 ;
else if (agemarr = 22 or agemarr = 23) then maxint = 4 ;
else if (agemarr = 24 or agemarr = 25) then maxint = 5 ;
else if (agemarr = 26 or agemarr = 27) then maxint = 6 ;
else if (agemarr = 28 or agemarr = 29) then maxint = 7 ;
else if agemarr >= 30 then maxint = 8 ;
**Do loop to output cases up till time_int = 1;
do interval = 1 to maxint ;
time_int = 2; **Initializing time variable;
nevent=0; **Creating new event variable;
if interval = 1 and (agemarr = 16 or agemarr = 17) then do;
time_int = 1 ; nevent=event; end;
if interval = 2 and (agemarr = 18 or agemarr = 19) then do ;
time_int = 1 ; nevent=event; end;
if interval = 3 and (agemarr = 20 or agemarr = 21) then do ;
time_int = 1 ; nevent=event; end;
if interval = 4 and (agemarr = 22 or agemarr = 23) then do ;
time_int = 1 ; nevent=event; end;
if interval = 5 and (agemarr = 24 or agemarr = 25) then do ;
time_int = 1 ; nevent=event; end;
if interval = 6 and (agemarr = 26 or agemarr = 27) then do ;
time_int = 1 ; nevent=event; end;
if interval = 7 and (agemarr = 28 or agemarr = 29) then do ;
time_int = 1 ; nevent=event; end;
if interval = 8 and agemarr >= 30 then do ;
time_int = 1 ; nevent=event; end;
output;
end;
keep id interval time_int nevent sex ;
run;
proc print data = test ;
title1 'Reshaped data for event history analysis';
run ;
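Because the intervals are all two years wide until the open-ended 30+ group, the interval number can also be computed arithmetically. The sketch below is a more compact variant under that assumption, and it further assumes agemarr is never missing or below 16. The names "ages" and "split" are made up so that the example is self-contained.

```sas
* Sketch: compute the last interval instead of listing each age pair ;
DATA ages ;
INPUT id agemarr event sex ;
DATALINES ;
1 22 1 1
2 20 0 0
3 16 1 0
;
RUN ;
DATA split ;
SET ages ;
maxint = MIN(FLOOR((agemarr - 16) / 2) + 1, 8) ;
DO interval = 1 TO maxint ;
IF interval = maxint THEN DO ;
time_int = 1 ;
nevent = event ;
END ;
ELSE DO ;
time_int = 2 ;
nevent = 0 ;
END ;
OUTPUT ;
END ;
KEEP id interval time_int nevent sex ;
RUN ;
PROC PRINT DATA = split ;
RUN ;
```

Ages 16 and 17 map to interval 1, 18 and 19 to interval 2, and so on, with the MIN function capping everything from age 30 upward at interval 8.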

Back to Top

 


 

Univariate to multivariate data transposition in SAS

Question:

I have been given a SAS dataset that I must transpose. Right now the data have multiple rows for each couple and a few variables. Instead, I want there to be one row per couple and many repeated measurements per couple. How can I do this using SAS?

Answer:

According to SAS Technical Support, you can do this using the TRANSPOSE procedure. The sample program below assumes that your couple ID variable is named cpl. The program first gets the permanent SAS dataset and adds a variable called n that serves as an ID or observation number. The data are then sorted by the n and cpl variables.
The first PROC TRANSPOSE creates a single column of data containing the variables of interest, sorted by cpl. The second PROC TRANSPOSE completes the transposition process by converting the multiple rows of data into multiple column variables; each variable is named Ak where k refers to the column number.
** Get the original SAS dataset ;
DATA one ;
SET mylib.dset1 ;
n = _n_ ;
RUN ;
** Sort the dataset by observation number and couple ID ;
PROC SORT DATA = one ;
BY n cpl ;
RUN ;
** Run the first transposition to obtain a single column of data ;
PROC TRANSPOSE DATA = one OUT = first(DROP = _name_);
BY n cpl;
RUN ;
** Print the first ten rows of data in the transposed dataset ;
PROC PRINT DATA = first ;
WHERE n LE 10 ;
RUN ;
** Transpose a second time to obtain the final data structure ;
PROC TRANSPOSE DATA = first OUT = second(DROP = _name_) PREFIX = a;
BY cpl;
VAR col1;
RUN ;
** Print the first ten cases in the new data format ;
PROC PRINT DATA = second ;
WHERE cpl LE 10 ;
RUN ;
** End sample program ;

Back to Top

 


 

Identifying the last completed measure using SAS

Question:

I have three scores on a repeatedly measured variable. These scores are represented as SCORE1, SCORE2, and SCORE3 in my SAS dataset. Some participants have data for all three measurement occasions, but other participants do not have complete data. I want to identify the last completed measure for each participant so that I can subtract the first score from the last score. How can I do this using SAS?

Answer:

There are several ways you can accomplish this task, but SAS Technical Support has supplied a very compact and efficient example to address your question. It appears below.
The ARRAY statement defines a SAS array called SCORES that has as many elements as there are input variables (three in this example, score1 through score3). The DO loop steps through the three variables in reverse order. Inside the loop, the IF-THEN DO group assigns the score to the new variable NEWEND when the SCORES array element is not equal (NE) to '.' (period), the SAS representation of a numeric missing value, and the LEAVE statement then exits the loop so that earlier scores do not overwrite it. The remainder of the program reads the sample data used in this example.
** Begin sample program ;
DATA one ;
INFILE CARDS TRUNCOVER ;
INPUT id score1 score2 score3;
ARRAY scores(*) score1-score3;
DO i = 3 TO 1 BY -1;
IF scores(i) NE . THEN DO;
newend=scores(i);
LEAVE;
END;
END;
CARDS;
1 1 . .
2 1 2 .
3 1 2 3
4 . 2 3
5 . . 3
;
PROC PRINT DATA = one ;
RUN ;
** End sample program ;
Running this self-contained program and examining the output should prove helpful to understanding how the program works.
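For a small, fixed number of scores, the COALESCE function offers a shorter route, since it returns the first non-missing value among its arguments. The sketch below assumes the dataset "one" from the sample program; "two", "lastscore", and "change" are names made up for illustration, and it takes "the first score" to mean score1.

```sas
* Sketch: last completed score and change from the first occasion ;
DATA two ;
SET one ;
lastscore = COALESCE(score3, score2, score1) ;
change = lastscore - score1 ;
RUN ;
PROC PRINT DATA = two ;
RUN ;
```

Participants with a missing score1 get a missing value for change, and SAS notes this in the log.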

Back to Top

 


SAS chi-square test of independence and the phi coefficient

Question:

I need to generate a chi-square test of independence using SAS, and I also need to get the phi coefficient. My frequencies are stored in a separate variable.

Answer:

Both the chi-square test of independence and the phi coefficient are output by SAS PROC FREQ.
The example SAS program below illustrates how to use PROC FREQ to obtain both the chi-square test for independence and the phi coefficient when the cell frequencies are stored in a variable.
DATA one ;
INPUT v1 v2 count ;
CARDS ;
0 0 1
1 0 3
0 1 11
1 1 25
;
* "Count" is the weighting variable containing cell sizes.;

PROC FREQ ;
WEIGHT count ;
TABLES v1*v2
/ CHISQ MEASURES ;
TITLE ' V1 by V2 Chi-Square Test and Phi Coefficient';
RUN;
The instream data contain three variables: V1, V2, and COUNT. Both V1 and V2 have only two possible values, 0 and 1 . Thus, the V1-by-V2 contingency table will be a 2-by-2 table.
The COUNT variable indicates the actual frequencies which appear in each cell of the 2-by-2 table. This is specified in the PROC FREQ by the WEIGHT statement. The TABLES statement specifies how the contingency table is to be built. The two variables which will produce the table are listed with an asterisk (*) between them. The first variable defines the rows of the table, the second defines the columns. The CHISQ option requests several chi-square based tests of independence along with the phi coefficient; the MEASURES option requests additional measures of association.

Back to Top    

 


 

Computing a Kappa statistic using SAS

Question:

How can I compute a Kappa reliability coefficient using SAS?

Answer:

SAS added this statistic at release level 6.10 as an optional part of the FREQ procedure.
The command is:
PROC FREQ ;
TABLES var1 * var2 / AGREE ;
Assuming a standard usage as a test of inter-rater reliability, then each observation in the SAS dataset is an event, and "var1" and "var2" are the variables indicating each event's categorization by rater1 and rater2 respectively. The table produced by the crossing of var1 with var2 must be square.
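If your ratings are already tabulated, the WEIGHT statement lets you supply cell counts instead of one row per event. The sketch below uses invented counts purely for illustration; "ratings", "rater1", "rater2", and "count" are hypothetical names.

```sas
* Sketch: kappa from a 2-by-2 table of invented cell counts ;
DATA ratings ;
INPUT rater1 rater2 count ;
DATALINES ;
1 1 20
1 2 5
2 1 10
2 2 15
;
RUN ;
PROC FREQ DATA=ratings ;
WEIGHT count ;
TABLES rater1*rater2 / AGREE ;
RUN ;
```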

Back to Top

 


 

SAS Linesize Option

Question:

How can I change the width of my SAS output?

Answer:

There are two ways to do this:

1. Use the SAS OPTIONS statement.

To set your output to a column width of n, use the following syntax.

OPTIONS LINESIZE = n ;

This statement can be placed anywhere in a SAS command file.

2. Use the SAS menu bar.

Click on the Output window.

Click on Tools in the menu bar, then Options, then Output.

Click on the Display tab.

Select Linesize and click OK.

Back to Top

 


 

SAS/GRAPH scatter plots of predicted values

Question:

How do I output predicted values from PROC REG and display them in SAS/GRAPH's PROC GPLOT?

Answer:

Following the MODEL statement in PROC REG, use an OUTPUT statement with the keyword P= to write the predicted values to a new dataset. This dataset can be subsequently used in PROC GPLOT. For example:
PROC REG ;
MODEL y=x ;
OUTPUT OUT=regout P=yhat ;

PROC GPLOT DATA=regout ;
PLOT yhat*x ; RUN ;
PROC GPLOT will produce a scatter plot utilizing the default SAS/GRAPH symbol settings. The predicted values YHAT will be plotted against the X values.
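You may also want to see the observed values and the fitted line on the same axes. The sketch below is one way to do that, assuming the "regout" dataset from the example above (the OUTPUT dataset also contains the original y and x). The SYMBOL settings are illustrative choices, not requirements.

```sas
* Sketch: observed points plus the fitted line in one plot ;
SYMBOL1 VALUE=DOT INTERPOL=NONE ;
SYMBOL2 VALUE=NONE INTERPOL=JOIN ;
PROC GPLOT DATA=regout ;
PLOT y*x=1 yhat*x=2 / OVERLAY ;
RUN ;
QUIT ;
```

The "=1" and "=2" suffixes tell GPLOT which SYMBOL definition to use for each plot request.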

Back to Top

 


 

Selection of a random subset of data in SAS

Question:

How do I randomly sample a certain proportion of observations from a SAS dataset?

Answer:

In the DATA step, include the line:
IF RANUNI(0)<=.1 ;
This statement will randomly select approximately 10% of the observations from the original data. To change this proportion, change .1 to any value between 0 and 1. The value you specify will determine (approximately) the proportion of the original dataset that will be selected for inclusion in the current dataset.
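A minimal sketch of a complete DATA step follows. The data set names orig and sample10 and the seed value 12345 are hypothetical. A positive seed makes the sample reproducible from run to run, whereas a seed of 0 uses the computer's clock and gives a different sample each time.
DATA sample10 ;
SET orig ;
IF RANUNI(12345) <= .1 ;
RUN ;
If you need a sample of fixed size rather than approximately 10% of the observations, PROC SURVEYSELECT in SAS/STAT is one alternative, assuming it is available at your site:
PROC SURVEYSELECT DATA=orig OUT=sample10 METHOD=SRS SAMPRATE=.1 SEED=12345 ;
RUN ;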


 


Plot size settings in PROC PLOT

Question:

The SAS plot procedure always gives me a plot that is too big to fit into one screen. How can I reduce the size of the plot?

Answer:

One way to reduce the size of the plot output is to use the VPERCENT and HPERCENT options in PROC PLOT, which specify the percentage of the page's vertical and horizontal space that each plot may occupy, as shown below:
PROC PLOT HPERCENT=50 VPERCENT=33;
PLOT x*y;
RUN;




Appending mean values to each observation in SAS

Question:

How do I take the mean of a variable and put it in a SAS data set containing the original data?

Answer:

Use PROC MEANS to output the mean of a variable into a SAS data set. Then conditionally combine the data sets using the special SAS system variable _N_ .
In the following example, the means of the original variables A and B are stored in the variables named MA and MB, respectively. In the data set COMBINE1, each observation will have the same value for MA (the mean of the variable A), and similarly for MB.
PROC MEANS DATA=orig;
VAR a b ;
OUTPUT OUT=mout MEAN=ma mb ; 
RUN ;
DATA combine1;
IF _N_ = 1 THEN SET mout ;
SET orig ; 
RUN ;
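An alternative sketch uses PROC SQL, which remerges summary statistics back onto the individual rows when a summary function appears alongside the original columns. The output data set name combine2 is hypothetical; orig, a, and b are as in the example above. SAS writes a note to the log saying that the query requires remerging.
PROC SQL ;
CREATE TABLE combine2 AS
SELECT *, MEAN(a) AS ma, MEAN(b) AS mb
FROM orig ;
QUIT ;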


 


Confirmatory factor analysis using SAS

Question:

How can I perform a confirmatory factor analysis using SAS? How does this differ from an ordinary (exploratory) factor analysis?

Answer:

A confirmatory factor analysis differs from exploratory (ordinary) factor analysis in that you specify the structure of three matrices a priori (in advance) of data analysis. The three matrices to be specified are 1) the factor loading matrix, 2) the factor intercorrelation matrix, and 3) the unique variance matrix.
The chief advantage of confirmatory factor analysis is that it allows you to test hypotheses about specific factor structures. Thus, the null hypothesis is the solution you specify. If the dataset you analyze departs significantly from the null hypothesis, you reject the null hypothesis and conclude that the factor structure you propose does not fit the obtained data.
To carry out a confirmatory factor analysis in SAS, use PROC CALIS. An example of a confirmatory factor analysis program is detailed below, in which a six-item questionnaire is analyzed. An oblique two-factor solution is hypothesized. It is hypothesized that items 1 through 3 load primarily on Factor 1 while items 4 through 6 load primarily on Factor 2. The unique (error) variances are assumed to be equal and small.
If you wanted items 4 through 6 to be zero for Factor 1 and items 1 through 3 to be zero for Factor 2 (the usual type of hypothesis specified in a confirmatory factor analysis), you could modify the program shown below to impose those constraints by setting the values of appropriate matrix elements to be equal to zero rather than a parameter name with a starting value (e.g., {1,2} = 0).
* Begin Sample Program ;
TITLE ' Confirmatory FA for six-item questionnaire';
PROC CALIS METHOD = LSML ALL NOMOD ;
Var Item1-Item6;
/*
The METHOD = LSML option uses final parameter estimates from
unweighted least-squares as initial estimates for maximum-
likelihood. The ALL option requests all optional output. The
NOMOD option tells SAS not to compute the modification indices--
this option saves computation time when the ALL option is used
*/
FACTOR HEYWOOD N = 2;
/*
N = 2 specifies a two factor solution ;
Option HEYWOOD constrains the diagonal elements of the unique
variance matrix _U_ to be nonnegative
*/
MATRIX _F_
{1,1} = Item1F1 ( .80), {1,2} = Item1F2 ( .20),
{2,1} = Item2F1 ( .80), {2,2} = Item2F2 ( .20),
{3,1} = Item3F1 ( .80), {3,2} = Item3F2 ( .20),
{4,1} = Item4F1 ( .20), {4,2} = Item4F2 ( .80),
{5,1} = Item5F1 ( .20), {5,2} = Item5F2 ( .80),
{6,1} = Item6F1 ( .20), {6,2} = Item6F2 ( .80) ;
/*
The matrix being defined here is _F_, the factor loading matrix.
A MATRIX statement defines the initial values for the
parameter estimates -- any unspecified entry is set to .5.
Numbers in the braces give the location of the entry.
Parameter estimate names such as "Item1F1" are user supplied.
Numbers in parentheses are the hypothesized factor loadings
*/
Matrix _P_ {1, 1} = 1.0, {2, 2} = 1.0, {2, 1} = .60 ;
/*
The _P_ matrix defaults to an identity matrix (uncorrelated factors).
Setting the off-diagonal element {2, 1} to .60 makes the factor
structure oblique
*/
Matrix _U_ {1, 1} = Theta1-Theta6 6*.10 ;
/*
Matrix _U_ is the error or uniqueness matrix. Since we are
assuming equal values for each of the diagonal elements in our
matrix, we can use a shortcut: the notation n*r generates n values
of r. Here 6*.10 tells SAS to set the initial estimates of the
parameters Theta1 through Theta6 to .10.
*/
RUN ;
* End of sample program ;
The assumptions underlying confirmatory factor analysis as well as the interpretation of the output can be exceedingly complex.


 


Pagesize option in SAS

Question:

Is there any way to control the vertical sizing of SAS output? In other words, can I set the length of the output pages?

Answer:

Yes, there is a PS (pagesize) option that can be set for SAS output. This option controls how many rows of characters are printed on each page. The default value for PS is 55. For example, the following command sets the number of output rows per page at 60.
OPTIONS PS=60;


 


 

Plotting a regression line using SAS/GRAPH

Question:

Is there a way I can generate a scatterplot with a regression line and 95% confidence intervals superimposed using SAS/GRAPH?

Answer:

Yes. The following SAS code demonstrates how to do this. It consists of three steps. Step 1 is a SAS DATA step which creates a set of demonstration data we use to illustrate the SAS/GRAPH technique for fitting a regression line. The data consist of a single X predictor and a single Y outcome (dependent) variable.
Step 2 is a PROC REG which we use to check the accuracy of the sample data we generated in Step 1. The PROC REG is not actually necessary for the plot itself, but it is a useful error-catching mechanism.
Step 3 begins with the GOPTIONS line. Here we define a SAS/GRAPH symbol that draws the regression line and its confidence limits and also sets the marker and colors used for the points of the scatterplot.
* Begin sample SAS/GRAPH program ;
* Step 1 ;
DATA demo ;
DO i=1 TO 1000 ;
x=RANNOR(0) ;
y=(.36 * x) + ((1 - .36**2)**.5)*RANNOR(0) ;
* This generates two variables with y as a function of x as described above;
OUTPUT ;
END ;
* Step 2 ;
PROC REG DATA = demo ;
MODEL y = x / STB ;
* PROC REG is actually not necessary for the plot, but it is nice to see
that the data creation process worked correctly;
* Step 3 ;
GOPTIONS RESET=GLOBAL ;
SYMBOL INTERPOL=RLCLM95 VALUE=star CI=blue CO=green CV=red;
*This symbol definition will allow us to produce a scatter-plot of observed
values as red stars, with a blue regression line of best fit with green 95%
confidence intervals around the line;
PROC GPLOT DATA=demo;
PLOT y*x ;
RUN ;
*End sample SAS/GRAPH program;


 



Internal consistency statistics using SAS

Question:

I want to obtain an internal consistency index (say, Cronbach's alpha) for my questionnaire. How can I do this using SAS?

Answer:

Use PROC CORR with the ALPHA and NOMISS options. For instance:
PROC CORR ALPHA NOMISS ;
VAR var1 var2 varK ;
The ALPHA option calculates the correlation between each variable and the total of the remaining variables and calculates Cronbach's alpha using only the remaining variables. The NOMISS option will remove an observation with a missing value on any variable from the analysis. It is important to include the VAR statement so that no unwanted variables are included in the alpha analysis.
If you want just the item analysis statistics printed without the correlation matrix or the usual summary statistics generated by PROC CORR, add the NOSIMPLE and NOCORR options to the PROC CORR line.
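Putting these options together, a call that prints only the alpha and item analysis output might look like the sketch below. The data set name survey and the item names item1 through item5 are hypothetical.
PROC CORR DATA=survey ALPHA NOMISS NOSIMPLE NOCORR ;
VAR item1-item5 ;
RUN ;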


 


 

Missing F-ratios and P-values in SAS GLM ANOVA

Question:

I've run an ANOVA with SAS PROC GLM, but when I look at my output, all I get are missing values (period symbols) instead of F-ratios and p-values. Why is SAS doing this?

Answer:

One reason that SAS may not give you F-tests and the corresponding p-values is that you have specified a saturated model. For example, you may have asked SAS to estimate as many parameters as there are cells in your ANOVA. Thus you do not have a sufficient number of degrees of freedom to obtain the desired F-tests. The remedy is to specify an unsaturated model, i.e., by including fewer effects in the model.
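As an illustration, consider a hypothetical two-way design with only one observation per cell (data set onepercell, factors a and b, outcome y; all of these names are made up). The first model below is saturated because the a*b interaction uses up all remaining degrees of freedom, leaving none for error. The second drops the interaction so that F-tests can be computed.
* Saturated: zero error degrees of freedom, so F and p are missing ;
PROC GLM DATA=onepercell ;
CLASS a b ;
MODEL y = a b a*b ;
RUN ;
* Unsaturated: the interaction is pooled into the error term ;
PROC GLM DATA=onepercell ;
CLASS a b ;
MODEL y = a b ;
RUN ;
QUIT ;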




 

Using SAS MACRO language

Question:

I'm writing a SAS program that I need to run many different times, with the values of some of the variables changing each time. Is there some SAS programming tool that will make this easier?

Answer:

Create SAS macro variables with the %LET statement. The syntax for the %LET statement is:
%LET macvname = value ;
where "macvname" is a variable name you specify and "value" is a numeric value you specify. This value will remain constant throughout the program. The %LET statement is usually placed at the beginning of the program for ease of access.
For example, suppose that you had a variable named N in your SAS program and the first time you ran the program you assigned it the value 24. However, you wish to rerun the SAS program with N=30, N=45, etc. Instead of changing the assignment statement (perhaps deep within your program) each time, you need only change the value given in the %LET statement at the beginning of the program. However, the macro variable name defined in the %LET statement must be preceded by an ampersand (&) wherever it is used in subsequent statements. The following example uses a macro variable named Expected:
%LET Expected = 12 ;
Numer = Obtained - &Expected ;
Numersq = Numer**2 ;
CellChi = NumerSq / &Expected ;
RUN ;
Whenever SAS encounters the "&Expected" expression in this program, it will substitute the value of "Expected" that was defined in the %LET statement. In this example, SAS will substitute the user-supplied value of 12 every time it encounters the "&Expected" expression.
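The assignment statements in the fragment above belong inside a DATA step. A self-contained sketch follows, in which the data set name cells and the observed counts are hypothetical:
%LET Expected = 12 ;
DATA cells ;
INPUT Obtained ;
Numer = Obtained - &Expected ;
NumerSq = Numer**2 ;
CellChi = NumerSq / &Expected ;
CARDS ;
10
15
12
;
RUN ;
PROC PRINT DATA=cells ;
RUN ;
To rerun the program with a different expected count, change only the %LET line.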


 


 

Jackknife regression using SAS

Question:

I would like to do a multiple regression using the jackknife procedure. That is, I would like to run N regressions, dropping one case each time, and I want the N sets of parameter estimates output to one SAS data set so I can analyze them.

Answer:

The following SAS program performs a jackknife regression. The parameter estimates are stored in a temporary SAS data set named RegEsts. The estimates are generated in a SAS Macro named JackReg. Each loop through JackReg drops one case and runs PROC REG on the remaining cases. Thus the number of loops depends on the number of cases in the data set.
The purpose of the first DATA step is to create a macro variable that contains the number of cases and thus is used to end the DO loop. The SAS CALL routine SYMPUT creates this variable (here named NCASES). Include the END= option on the INFILE statement so that SYMPUT is called only once, at the last observation, rather than once for each observation. (Note that the first DATA step can use the SET statement, instead of the INFILE and INPUT statements, if your data already exist in a SAS data set. In that case, use the END= option in the SET statement).
The heart of the program, the macro JackReg, is then defined and run.
/* SAS jackknife regression program.
Input your data in the preliminary DATA step and specify
your model in the MODEL statement within the macro.
*/
DATA one ;
INFILE ' yourraw dataset here' END=lastcase;
INPUT yvar xvar1 xvar2 xvarm ;
IF lastcase THEN CALL SYMPUT ('ncases', _N_) ;
RUN;
*Macro portion of program begins here;
%MACRO JackReg ;
%DO I = 1 %TO &ncases ;
DATA temp&I ;
SET one ;
IF _N_ NE &I ;
RUN;
PROC REG OUTEST = loopIest ;
Omits&I: MODEL yvar = xvar1 xvarm;
*Specify your model in the line above;
RUN ;
PROC APPEND BASE = RegEsts NEW = loopIest ;
RUN ;
%END ;
%MEND JackReg;
*Macro portion of program ends here;
%JackReg;*this statement actually runs the macro JackReg;
* End of jackknife regression program ;
If you have more questions about performing a jackknife regression using SAS, contact a consultant. The references for this program include: SAS Guide to Macro Processing, Version 6, Second Edition, pp. 65-70; p. 165; SAS/STAT User's Guide, Volume 2, Version 6, Fourth Edition, pp. 1351-1456; and SAS Procedures Guide, Version 6, Third Edition, pp. 43-52 (APPEND procedure).
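Once the macro has finished, RegEsts holds one row of estimates per omitted case. As a sketch of a first look at them, the step below summarizes the jackknife estimates. It assumes the OUTEST= variable naming used by recent SAS releases (Intercept for the constant, and each predictor's own name for its coefficient); check PROC CONTENTS output for RegEsts if your release names them differently.
PROC MEANS DATA=RegEsts N MEAN STD MIN MAX ;
VAR Intercept xvar1 xvarm ;
RUN ;
Because PROC APPEND adds to RegEsts if it already exists, delete RegEsts before rerunning the macro in the same SAS session.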




 

Testing homogeneity of cell covariance matrices with SAS

Question:

How can I use SAS to test the homogeneity of the within-subject covariance matrices for the cells defined by the between-subject factors?

Answer:

Create a classification variable representing cell membership, and then use PROC DISCRIM to test for homogeneous cell covariance matrices.
For example, suppose that the following PROC GLM syntax had been written to perform a repeated measures ANOVA for a grouping factor with two levels, a treatment factor with three levels, and a single repeated factor with three levels (measurements).
PROC GLM DATA = repeated ;
CLASS group exertype ;
MODEL pulse1 pulse2 pulse3 = group exertype group*exertype ;
REPEATED repdfact 3 / PRINTE ;
RUN ;
The PRINTE option produces a sphericity test of the homogeneity of the covariance matrices of the orthogonal components of the transformed variables defined by the cells of the within-subject factors. To obtain a test of the homogeneity of the covariance matrices of the cells defined by all between-subject factors, study the following example.
DATA discrim ;
SET repeated ;
IF group = 1 AND exertype = 1 THEN intterm = 1 ;
IF group = 1 AND exertype = 2 THEN intterm = 2 ;
IF group = 1 AND exertype = 3 THEN intterm = 3 ;
IF group = 2 AND exertype = 1 THEN intterm = 4 ;
IF group = 2 AND exertype = 2 THEN intterm = 5 ;
IF group = 2 AND exertype = 3 THEN intterm = 6 ;
RUN;
PROC DISCRIM METHOD = NORMAL POOL = TEST ;
CLASS intterm ;
VAR pulse1 pulse2 pulse3 ;
RUN;
You will be interested in that part of the DISCRIM output that is labeled "Test of Homogeneity of Within Covariance Matrices". This output includes the chi-square value, degrees of freedom, and p-value produced by the test of the null hypothesis that the cell covariance matrices are homogeneous.
If you have a large number of between-subjects cells to create, consider using a single ARRAY statement rather than the multiple IF statements as shown above.
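For instance, when both factors are coded as consecutive integers starting at 1, as they are here, the six IF statements can be replaced by one assignment. This is a sketch of an arithmetic shortcut rather than of the ARRAY approach, and it assumes that exertype has exactly three levels.
DATA discrim ;
SET repeated ;
* Cells are numbered 1 to 6 in the same order as the IF statements above ;
intterm = (group - 1) * 3 + exertype ;
RUN ;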
If you have more questions about this test, see the SAS/STAT User's Guide, Version 6, Fourth Edition, pp. 677-772. Example 3, and the output shown on p. 749, is particularly relevant. You can also click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation for more information.
In general, you may want to consider using the MIXED procedure to conduct repeated measures ANOVA. MIXED features a wide array of covariance structures you can use to fit a more appropriate model to your particular dataset. You can also use MIXED to test sphericity and the homogeneity of covariance matrices. You can test sphericity by comparing the model fit criteria for a model with TYPE=HF (Huynh-Feldt) versus a model with covariance structure TYPE=UN (unstructured) specified on the REPEATED statement. Similarly, you can test homogeneity of covariance matrices across groups by testing a model with GROUP=group*exertype versus a model with no GROUP= option on the REPEATED statement.


 


 

Repeated measures ANOVA with SAS PROC GLM

Question:

I would like to conduct a repeated measures ANOVA with five levels of a single repeated factor, and no between-subjects factors, using SAS.

Answer:

You can use PROC GLM. Since you have no between-subjects factors in your design, do not specify a CLASS statement, and do not specify any terms on the right-hand side of the MODEL statement.
The following example supposes that your five repeated measures variables have been labeled DV1 through DV5. The syntax for a within-subjects ANOVA on these five repeated-measures variables would be:
PROC GLM ;
MODEL dv1 dv2 dv3 dv4 dv5 = /NOUNI;
REPEATED reptdfac /PRINTE ; RUN ;
PROC GLM invokes SAS's general linear models procedure. The MODEL statement tells SAS to analyze the five repeated measures variables. Notice that the equal sign is included, but no terms are specified on its right-hand side. The NOUNI option tells SAS not to perform separate univariate tests on each dependent variable. Finally, the REPEATED statement tells SAS that the analysis is a repeated measures ANOVA. "Reptdfac" is a user-specified name for the single within-subjects factor in this analysis. The PRINTE option specifies that the assumption of sphericity be tested.
For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.
You may also perform repeated measures ANOVAs using PROC MIXED. MIXED allows you to fit a number of different covariance structures and perform analyses that PROC GLM cannot do appropriately. See a consultant for more details about PROC MIXED.
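As a rough sketch of the PROC MIXED route, note that MIXED expects one record per measurement occasion rather than one record per subject. The program below restructures the data and fits the same one-factor within-subjects design with an unstructured covariance matrix. The data set names wide and long and the variables id, time, and dv are hypothetical, and wide is assumed to contain one row per subject.
DATA long ;
SET wide ;
ARRAY dvs(5) dv1-dv5 ;
id = _N_ ;
* Write one output record for each of the five repeated measures ;
DO time = 1 TO 5 ;
dv = dvs(time) ;
OUTPUT ;
END ;
KEEP id time dv ;
RUN ;
PROC MIXED DATA=long ;
CLASS id time ;
MODEL dv = time ;
REPEATED time / SUBJECT=id TYPE=UN ;
RUN ;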


 


 

Hierarchical Regression Using SAS

Question:

I am trying to run a hierarchical regression using SAS. I have a dependent variable Y and four independent (predictor) variables X1 through X4. I want to enter X2 and X3 on the first step, and then enter X1 and X4 on the second step.

Answer:

There are several ways to do hierarchical regression using SAS. Perhaps the clearest approach is to use PROC REG and its TEST statement. Consider the following code.
PROC REG;
MODEL y = x2 x3;
TEST x3=0, x2=0;
MODEL y = x1 x2 x3 x4;
TEST x1=0, x4=0;
RUN ;
The first TEST statement produces a test of the null hypothesis that the regression coefficients for x2 and x3 are both equal to zero. This is equivalent to the hypothesis that these two variables add no predictive ability to the model (in this case, to the null model consisting only of the grand mean). The second TEST statement tests whether the variables x1 and x4 add any predictive ability to the model containing x2 and x3 (and the grand mean). Each TEST statement must be interpreted relative to the MODEL statement preceding it.


 


 

Confidence intervals for cross-tabulated frequencies in SAS

Question:

Using a chi-square test, I've found that I've got dependence among rows and columns in a contingency table. I now want to establish confidence intervals for the frequencies. Can I do this in SAS?

Answer:

To construct the confidence interval for a parameter in a dependent model, use the PRED = FREQ option in the MODEL statement in PROC CATMOD to obtain the necessary standard error. (Multiplying the standard error by the appropriate z-value for the confidence interval gives you the half-width of the interval, which is centered at the predicted parameter value).
For example:
PROC CATMOD ;
WEIGHT w ;
MODEL a*b=_RESPONSE_ / PRED=FREQ;
LOGLIN a|b ;
RUN;
Here the variable W contains the number of observations for each combination of the categorical variables A and B. The expression "a*b=_RESPONSE_" in the MODEL statement defines the model to be loglinear. The PRED=FREQ option requests information for both the obtained and expected frequencies. The LOGLIN statement defines the model. The syntax used here defines the model to be the full model (containing all main effects and interactions). The inclusion of the interaction term allows dependency to be modelled.
The standard error for each frequency will appear on the output in the PREDICTED VALUES table. Note that for a full (saturated) model, the observed and predicted frequencies are equal.
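As a sketch of the arithmetic, suppose the PREDICTED VALUES table reports a predicted frequency of 25.0 with a standard error of 4.2 for one cell (both numbers are made up for illustration). A 95% interval can then be computed in a DATA step, using the PROBIT function to obtain the z-value. The data set name ci and the variable names are hypothetical.
DATA ci ;
pred = 25.0 ;
se = 4.2 ;
* z-value for a two-sided 95% interval ;
z = PROBIT(.975) ;
lower = pred - z * se ;
upper = pred + z * se ;
RUN ;
PROC PRINT DATA=ci ;
RUN ;
Here z is approximately 1.96, so the interval runs from about 16.8 to about 33.2.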


 


 

One-way ANOVA from summary statistics

Question:

Can I do a one-way ANOVA when I only have the summary stats (ns, means, and standard deviations of each group)? I'd like the SAS code for this.

Answer:

Yes, since the sufficient statistics for a one-way ANOVA are the means, standard deviations, and ns of each group, a one-way ANOVA is possible with only this information.
This topic was covered by David A. Larson in the May 1992 issue of the American Statistician (v. 46, pp. 151-152). He supplied the following SAS code. Note that you must replace the given values with your values: the first column contains the ns, the second column the means, and the third the standard deviations, of each group in the single factor.
The code generates a surrogate data set which produces the same output, including multiple comparison output, as the original data set. In the surrogate data set, each group will have n-1 identical values 'yis' (equal to the group mean plus the group standard error) and a final value 'yns' which forces the group mean and variance to the supplied values.
DATA surogate;
INPUT nj ybarj stdj;
yis = ybarj + sqrt((stdj**2)/nj);
yns = nj*ybarj - (nj-1)*yis;
group + 1;
y = yis; freq = nj-1; OUTPUT;
y = yns; freq = 1; OUTPUT;
CARDS; 
3 7.8333 .4041
5 8.3800 .4147
4 6.6250 .1708
3 6.9000 .4583
;
RUN;
PROC GLM DATA=surogate;
CLASS group;
FREQ freq; 
MODEL y = group;
MEANS group / TUKEY CLDIFF;
LSMEANS group / STDERR E; 
RUN;
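To confirm that the surrogate data reproduce the summary statistics you supplied, you can run a quick check such as the sketch below. It assumes the data set surogate created above. The N, MEAN, and STD columns should match the values entered after the CARDS statement, up to rounding.
PROC MEANS DATA=surogate N MEAN STD ;
CLASS group ;
FREQ freq ;
VAR y ;
RUN ;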


 


Overlaid lines in SAS/GRAPH

Question:

How can I get four overlaid lines using SAS/GRAPH?

Answer:

To produce a graph with overlaid lines, use the following code:
PROC GPLOT DATA=mydata;
PLOT yvar*xvar=group;
RUN ;
In this case, yvar and xvar are the variables that will appear on the y-axis and x-axis of the graph. One line will appear for each value in the group variable.
The following options will make the plot easier to understand; the options must precede the PROC GPLOT statements in the SAS code.
GOPTIONS;
SYMBOL1 VALUE = dot 
COLOR =red
HEIGHT = .8 
INTERPOL =join ;
SYMBOL2 FONT =marker VALUE =C
COLOR =blue
HEIGHT = .8 
INTERPOL =join ;
SYMBOL3 FONT =marker VALUE =D
COLOR =green
HEIGHT = .8
INTERPOL =join ;
SYMBOL4 FONT =marker VALUE =M
COLOR =purple
HEIGHT = .8 
INTERPOL =join ;
The SYMBOL statement defines the characteristics of the symbols used to create different lines for each group. The VALUE option creates a different symbol for each group; the COLOR option makes each group's symbol and line a different color; the HEIGHT option defines the size of the symbols. The INTERPOL option joins the symbols to create a line for each group.
For more information on SAS/GRAPH and detailed examples, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation. Under the "Index" tab, type "SAS/GRAPH", then double-click on "SAS/GRAPH" in the scrolldown menu.


 


 

Superimposed SAS/GRAPH plot

Question:

I have a dataset that contains two grouping variables and two outcome variables. For each level of the first grouping variable, I want a plot of the two outcome variables across the levels of the second grouping variable. I want the plots to have two superimposed horizontal lines that indicate normal values for the lab test being plotted. I would like to use SAS on a UNIX platform and would like to print to a postscript laser printer. How can I do this?

Answer:

Here is an example:
LIBNAME test "~/test";
DATA test.new;
INPUT grp1 grp2 var1 var2;
CARDS;
1 1 2.3 3.2
1 2 1.9 2.5
1 3 2.2 3.5
2 1 2.8 4.0
2 2 2.2 3.6
2 3 1.5 3.8
3 1 3.2 4.4
3 3 2.3 3.9
4 1 1.6 3.2
4 2 1.8 2.3
;
RUN;
FILENAME grafout pipe 'lpr -Ptay_lw';
GOPTIONS RESET=global
DEVICE=applelw
GUNIT=in HTITLE=0.333 HTEXT=0.125 FTEXT=swiss
COLORS=(black)
HSIZE=10in
VSIZE=6.5in
ROTATE=landscape
GACCESS=sasgaedt
GPROLOG='25210d0a'x
GSFLEN=132
GSFMODE=replace
GSFNAME=grafout;
TITLE1 'Plot of Lab Values';
AXIS1 OFFSET=(5,5)pct;
SYMBOL1 VALUE=1 HEIGHT=.10;
SYMBOL2 VALUE=2 HEIGHT=.10;
SYMBOL3 VALUE=3 HEIGHT=.10;
PROC GPLOT DATA=test.new; 
PLOT var1*var2=grp2 /HAXIS=AXIS1 FRAME VREF=2.002 2.030;
BY GRP1;
RUN ;


 


 

PC SAS/GRAPH device names

Question:

What's the correct device name when using PC SAS/GRAPH?

Answer:

A device is the physical object (a monitor, a file, or a pen plotter), used to display your graph. Each different type of display device requires a different type of input to display a graph. SAS provides a large number of device drivers; however, you must know what SAS has named the device driver for your type of display.
SAS will display an alphabetized list of printer driver device names which you can use in your GOPTIONS statement if you run the following program:
PROC GDEVICE ;
RUN ;
Once you have identified the SAS name for your device, specify it in the GOPTIONS statement preceding your SAS/GRAPH procedure. The syntax is:
GOPTIONS DEVICE = devicename ;
One way to check that you have chosen the right device name is to run the GTESTIT procedure, which produces three test graphs if the graphics device has been properly specified. For example, suppose that you want to display information from SAS/GRAPH on your EGA monitor. You would submit the following code:
GOPTIONS DEVICE = EGA ;
PROC GTESTIT ;
RUN ;
Note that you must change the DEVICE= name when you switch from previewing your graph on screen to printing it, since monitors and printers use different device drivers. Nonetheless, it is recommended that you preview your graph on screen before you print it.


 


 
 

Printing color output from SAS/GRAPH

Question:

How can I print my SAS/GRAPH output in color?

Answer:

The Computation Center Student Microcomputer Facility (SMF) located in FAC 212 has a color printer which can print postscript files.
To minimize the costs to you of color printing, before you create a color graph, first print it in black and white on a standard postscript laser printer. Add color options (e.g., the LEGEND, AXIS, and LABEL statements) only after you are sure that your graph looks right in black and white.
The following code will produce a color postscript file named "graph1" (this file name should meet the specifications of your system).
FILENAME post 'graph1' ;
GOPTIONS
DEVICE = devicename
GSFNAME = post ;
where "devicename" refers to the appropriate SAS/GRAPH device name for the color printer. Submit the file "graph1" to the SMF color printer using the LPR command with the printer queue name "facsmf_clw".


 


 

SAS test of marginal homogeneity

Question:

How do I generate the marginal homogeneity test for categorical repeated measures using SAS?

Answer:

You can use PROC CATMOD to test marginal homogeneity. In the following code, the variables "r" and "c" define the levels of one categorical variable measured at two time points, while the variable "w" contains the counts falling into each cell of the resulting two-way table:
PROC CATMOD ;
WEIGHT w;
RESPONSE MARGINALS ;
MODEL r*c=_RESPONSE_ / FREQ ;
REPEATED time 2;
RUN;
The RESPONSE statement specifies the dependent measure MARGINALS as the random component of the model. The MODEL statement requests analysis of the variables as a main-effects plus interaction log-linear model. The REPEATED statement requests that the margins be tested for equality.
For more help with this topic, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation. Open the SAS Products folder, then the SAS/STAT folder, then the SAS/STAT User's Guide folder, then the CATMOD procedure folder. Example 22.7 is particularly relevant.
In addition, you may want to consider using the GENMOD procedure to perform repeated measures analysis of categorical data. GENMOD allows you to fit different working correlation structures among your repeated measures and to choose among response distributions (such as Poisson and binomial) and link functions (such as log and logit) to most appropriately test your hypotheses of interest.


 


 

Debugging SAS code

Question:

I have a huge SAS program that isn't working. The results I get are not right but there are no errors or warnings in the SAS log. How can I figure out where I went wrong?

Answer:

To debug a SAS program that produces no syntax errors, follow these six steps:
1. Check to see that your original data input is correct for all variables.
2. If the data is input to SAS correctly, go to the other end of the program. Select a variable or a small set of variables involved in the analyses where you get the wrong results. Use PROC FREQ, PROC MEANS, and/or PROC PRINT to examine these variables. There should be a problem with at least one; identify exactly how these variables are incorrect.
3. Now follow these variables back through each operation you performed, always looking at the characteristics in question. In this way you can narrow down the exact step where an error occurs. Prior to the questionable step, the variable characteristics will be appropriate; after the step they will be inappropriate.
4. Look carefully at the code for that step. Continue using PROC PRINT, PROC FREQ, and PROC MEANS to examine the effect of each statement. In this way, you can identify the exact statement or statement group that is not working as you expect.
5. Next, get a clear understanding of how the statement is working (as opposed to how you think it should work) by consulting the SAS Help function; click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation; then, search for the particular statement or procedure. The results in hand should help you interpret the documentation.
6. Finally, determine the appropriate code for your needs. Remember to check for other statements that involve this mistake.
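Steps 2 through 4 can be sketched as follows. The data set name step3 and the variable names id, score, and group are hypothetical; substitute the data set produced by the step you are checking and the variables you are tracing.
* Look at the first few records ;
PROC PRINT DATA=step3 (OBS=10) ;
VAR id score group ;
RUN ;
* Check ranges and missing-value counts of numeric variables ;
PROC MEANS DATA=step3 N NMISS MIN MAX MEAN ;
VAR score ;
RUN ;
* Check the values of categorical variables, including missing values ;
PROC FREQ DATA=step3 ;
TABLES group / MISSING ;
RUN ;
Inside a DATA step, a statement such as PUT _N_= score= group= ; writes the values of the named variables to the SAS log for each observation, which can help isolate a single faulty statement.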




 

Mean substitution for missing values in SAS

Question:

How can I replace missing values with a mean value in SAS? Is there an easy way to do this for many variables?

Answer:

SAS has a procedure called PROC STANDARD that can be used to standardize some or all of the variables in a SAS data set to a given mean and/or standard deviation and produce a new SAS data set that contains the standardized values. In addition, there is a REPLACE option that substitutes all missing values with the variable mean. If the MEAN=mean-value option is also specified, missing values are set instead to the user-specified mean-value.
The following SAS code demonstrates the use of PROC STANDARD for mean substitution.
DATA raw ;
INPUT v1-v10 ;
CARDS;
1 1 1 1 1 . 1 1 1 1
2 2 2 . 2 . 2 2 2 2
3 3 3 3 3 3 . . 3 3
4 4 4 . . 4 4 4 4 4
5 5 5 5 5 5 5 5 . .
;
PROC STANDARD DATA=raw OUT=stnd REPLACE PRINT;
VAR v1-v10;
RUN;
The following SAS code demonstrates another way of substituting mean values for missing values.
DATA raw ;
INPUT v1-v10;
CARDS;
1 1 1 1 1 . 1 1 1 1
2 2 2 . 2 . 2 2 2 2
3 3 3 3 3 3 . . 3 3
4 4 4 . . 4 4 4 4 4
5 5 5 5 5 5 5 5 . .
;
PROC MEANS NOPRINT;
VAR v1-v10;
OUTPUT OUT=meandat(DROP=_TYPE_ _FREQ_) MEAN=m1-m10;
RUN ;
PROC PRINT DATA=meandat;
RUN ;
DATA meansub (DROP=m1-m10 i);
IF _N_ = 1 THEN SET meandat;
SET raw;
ARRAY old(10) v1-v10;
ARRAY means(10) m1-m10;
DO i = 1 TO 10;
IF old(i) EQ . THEN old(i) = means(i);
END;
RUN;
In the first DATA step, the data set raw is created with 10 variables, v1 through v10. Notice that there are one or more missing values (periods) for each observation in the data records.
PROC MEANS is used to produce a new dataset meandat which has variables m1 through m10 holding the means for the variables v1 through v10. PROC PRINT is used to verify this.
The second DATA step performs the substitution, creating a final data set called meansub. It defines two arrays: old represents v1 through v10 and means represents m1 through m10. A DO loop moves through the array variables, checking each value of array old to see if it is missing. If it is missing, then the value is set to the corresponding value from the array means; this is the mean substitution.
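To verify the substitution, a check such as the following sketch can be run on the data set meansub created above. After mean substitution, NMISS should be 0 for every variable, and each mean should equal the corresponding mean printed from meandat.
PROC MEANS DATA=meansub N NMISS MEAN ;
VAR v1-v10 ;
RUN ;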
For more information on handling missing data, please see General FAQ: Handling missing or incomplete data.


 


 
 

Placement of IF statements in SAS

Question:

I have done some IF statements in my SAS programs and then some PROCs. Now I want to do more IF statements, but SAS won't let me. Why not?

Answer:

SAS IF statements can only be used in a DATA step. Structure your program so that all IF statements are embedded in DATA steps. This may mean you need many DATA steps; if space is a problem, create temporary datasets. DATA steps begin with the DATA statement and require the input of data. An example follows.
DATA old ;
INFILE 'filename' ;
INPUT x y z ;
IF x = y THEN xy=x*y;
RUN;
PROC FREQ ;
RUN;
DATA new ;
SET old ; 
IF x GT z THEN newvar = y/x ;
RUN;
PROC PRINT ;
RUN;
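If the IF statement is only meant to restrict which observations a procedure analyzes, an extra DATA step may not be needed, because most procedures accept a WHERE statement. The sketch below uses a small hypothetical data set entered with DATALINES rather than an external file. Note that WHERE can only select observations; creating a new variable such as newvar still requires a DATA step.

```sas
* Hypothetical data for illustration ;
DATA old ;
INPUT x y z ;
DATALINES ;
1 1 2
3 2 1
5 4 2
;
RUN ;
* WHERE restricts the observations that the procedure reads ;
PROC PRINT DATA=old ;
WHERE x GT z ;
RUN ;
```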


SAS cell chi-square test

Question:

How can I get significance values for the chi-square values of individual cells in a contingency table, using SAS?

Answer:

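To obtain the individual cell chi-square values, use PROC FREQ and include the CHISQ and CELLCHI2 options after the slash in the TABLES statement. CELLCHI2 prints each cell's contribution to the overall Pearson chi-square statistic.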
PROC FREQ ;
TABLES varnames*varnames /CHISQ CELLCHI2 ;
RUN;
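As a concrete illustration, the sketch below applies these options to a small hypothetical 2 x 2 table entered as cell counts; the data set name survey and the variable names are assumptions made for this example. The EXPECTED and DEVIATION options are added so that the expected count and the observed-minus-expected difference appear next to each cell's chi-square contribution. The contributions are descriptive; the CHISQ option reports the significance test for the table as a whole.

```sas
* Hypothetical cell counts for a 2 x 2 table ;
DATA survey ;
INPUT group $ answer $ count ;
DATALINES ;
A yes 30
A no 10
B yes 15
B no 25
;
RUN ;
PROC FREQ DATA=survey ;
WEIGHT count ; * each record represents COUNT observations ;
TABLES group*answer / CHISQ CELLCHI2 EXPECTED DEVIATION ;
RUN ;
```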


Overlay of histogram with Normal probability plot using SAS

Question:

I want to plot a histogram for a variable in my dataset and overlay a normal distribution that has a mean and sd equal to the mean and sd of the variable being plotted. I'd like to do this plot and overlay to visually inspect how closely my variable approaches a normal distribution. I'd like to be able to plot the results on an Apple laserwriter, I'd like to be able to use a UNIX workstation, and I'd like to use SAS. Can I do this?

Answer:

You can use PROC CAPABILITY. PROC CAPABILITY is part of the SAS/QC Software. QC stands for Quality Control. The CAPABILITY procedure is described in the SAS Help documentation located under the Help button in SAS; it can be found by typing CAPABILITY procedure under the Index tab. In the Capability procedure, you will use the HISTOGRAM statement. The HISTOGRAM statement is also described in the SAS Help documentation. The following code generates the desired plot.
********************************;
LIBNAME test '~/data';
FILENAME grafout PIPE 'lpr -Ptay_lw';
GOPTIONS
RESET=GLOBAL
DEVICE=APPLELW
GUNIT=in
HTITLE=0.333
HTEXT=0.125 
FTEXT=TRIPLEXU
HPOS=80 VPOS=80
COLORS=(BLACK) 
HSIZE=10in
VSIZE=6.5in
ROTATE=LANDSCAPE
GACCESS=SASGAEDT
GPROLOG='25210d0a'x
GSFLEN=132
GSFMODE=REPLACE
GSFNAME=grafout;
TITLE1 'Primary Title';
TITLE2 'Secondary Title';
PROC CAPABILITY DATA=test.userdata GRAPHICS;
HISTOGRAM var1 / NORMAL;
RUN;
**********************************;
The code is written for SAS on any ITS UNIX system which has the QC module installed on it.
The LIBNAME statement points to a subdirectory called '~/data', where the data are stored.
The FILENAME statement assigns the pseudodevice grafout to the UNIX pipe 'lpr'. The printer is specified following the -P qualifier.
The printer type is chosen in the DEVICE option of the GOPTIONS statement. The font is selected in the FTEXT option of the GOPTIONS statement. The HPOS and VPOS options control the size of the graph image on the sheet of paper or on the screen. The larger the values for HPOS and VPOS, the smaller the resulting plotted image.
The specification of the CAPABILITY procedure includes the GRAPHICS option and the HISTOGRAM statement with the NORMAL option. This will produce the desired graphics output.
Descriptions of the GOPTIONS options can be found in the SAS Help and Documentation in SAS by following this path under the Contents tab: SAS Products –> SAS/GRAPH –> SAS/GRAPH Reference –> SAS/GRAPH Concepts –> Graphics Options and Device Parameters Dictionary. The same information can also be found in the online SAS manual at http://support.sas.com/documentation/cdl/en/graphref/59607/HTML/default/gopdict-list.htm.
A full listing of available fonts can be found by using the SAS Help and Documentation in SAS and following this path under the Contents tab: SAS Products –> SAS/GRAPH –> SAS/GRAPH Reference –> SAS/GRAPH Concepts –> SAS/GRAPH Fonts –> Using SAS/GRAPH Software Fonts –> under the heading Font Lists. This information is also found in the online SAS manual at http://support.sas.com/documentation/cdl/en/graphref/59607/HTML/default/font-font-lists.htm.
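If SAS/QC is not available, the HISTOGRAM statement of PROC UNIVARIATE can produce a similar overlay. The following is a minimal sketch under stated assumptions: the data are simulated purely for illustration, the data set name userdata2 is hypothetical, and no device or printer options are set. MU=EST and SIGMA=EST ask for a normal curve whose mean and standard deviation are estimated from the plotted variable.

```sas
* Simulated data for illustration only ;
DATA userdata2 ;
CALL STREAMINIT(2024) ;
DO i = 1 TO 200 ;
var1 = RAND('NORMAL', 50, 10) ;
OUTPUT ;
END ;
DROP i ;
RUN ;
* NOPRINT suppresses the descriptive tables but still draws the histogram ;
PROC UNIVARIATE DATA=userdata2 NOPRINT ;
HISTOGRAM var1 / NORMAL(MU=EST SIGMA=EST) ;
RUN ;
```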


Customizing PROC PRINT output in SAS

Question:

I want SAS PROC PRINT to list variable labels out in a horizontal format rather than variable names in a vertical format. How can I do this?

Answer:

Use the LABEL option in the PROC PRINT command, along with at least one LABEL statement below the PROC PRINT to print out variable labels. For example, consider the following PROC PRINT statement and its attendant subcommands:
PROC PRINT LABEL ;
LABEL varname1 = 'Label1' varname2='Label2' varname3='Label3';
VAR varname1 varname2 varname3;
RUN ;
When the LABEL option is used and at least one variable has a label, PROC PRINT lists the column headings horizontally, so this PROC PRINT step will list in horizontal format all variable labels (or just names for unlabeled variables) for the variables listed in the VAR statement. To print the variable labels in vertical format, use the HEADING=VERTICAL option, as in the following example:
PROC PRINT LABEL HEADING=VERTICAL;


Generating random numbers with SAS

Question:

I need to generate a sample of random numbers from a normal distribution with a particular mean and standard deviation. How is this done with SAS?

Answer:

Here is how to generate a SAS dataset with the characteristics that you desire:
DATA random;
DO i = 1 TO 100;
randnum = 3.5 + .25 * RANNOR(1);
OUTPUT;
END;
RUN;
where 3.5 will be the mean and .25 is the standard deviation. For other types of random distributions, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation; type random numbers under the Index tab.
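In SAS 9 and later, the RAND function is an alternative to RANNOR that takes the mean and standard deviation directly. The sketch below is an assumption-laden variant of the program above: the data set name random2 and the seed value are arbitrary choices. PROC MEANS is included only to check that the sample mean and standard deviation are close to the requested values.

```sas
DATA random2 ;
CALL STREAMINIT(12345) ; * fix the seed so the stream is reproducible ;
DO i = 1 TO 100 ;
randnum = RAND('NORMAL', 3.5, 0.25) ; * mean 3.5, standard deviation 0.25 ;
OUTPUT ;
END ;
DROP i ;
RUN ;
* Check the sample mean and standard deviation ;
PROC MEANS DATA=random2 N MEAN STD ;
VAR randnum ;
RUN ;
```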


Three dimensional plotting with SAS

Question:

I would like to generate a scatterplot of my data in three dimensions. Also, I have two groups that I would like the 3D plot to differentiate between. I'm using SAS. How can I do this?

Answer:

You can use SAS/GRAPH to create the plot of your data. Initially, you will define and specify color and shape variables, which will allow SAS to print unique colors and symbols for each of your two subgroups. You then invoke the G3D procedure to plot the data points against a three-dimensional axis background. The following sample SAS program illustrates these steps. In this example, the two subgroups are defined by the values of the variable "emotsit". DATA two is created from DATA one, which for this example is the dataset with your original data.
* Begin sample program ;
DATA two ;
LENGTH shapeval $8. ;
LENGTH colorval $8. ;
SET one ;
IF emotsit = 2 THEN DO ;
shapeval = 'star' ; 
colorval = 'vip'; 
END ;
IF emotsit = 1 THEN DO ;
shapeval = 'diamond' ;
colorval = 'blue' ; 
END ;
RUN ;
* Set graphic device options and save graph information to an external file ;
FILENAME post 'system_file_name';
GOPTIONS DEVICE = phaser
GSFNAME= post
GSFMODE= replace noprompt
CBACK = white
CTEXT = black
FTEXT = zapf 
COLORS = (black blue vip)
RESET = global
GUNIT = pct noborder
;
PROC G3D ;
SCATTER dimens1 * dimens2 = dimens3
/ SHAPE = shapeval color = colorval
GRID caxis = black ctext = black ;
RUN ;
*End sample program ;
After setting the colors and symbols, one must also appropriately adjust the graphics environment by using the GOPTIONS statement. You should be aware that the parameters of this statement may change when you switch devices (e.g., when you switch from a monitor device to a printer device). Finally, you invoke the G3D procedure.


Displaying SAS output in Word

Question:

I want to print my SAS output using Word. However, when I save my output and then open it with Word, the formatting is not the same as it was in the SAS OUTPUT window. What can I do to make sure the format stays consistent between SAS and Word?

Answer:

SAS (in selected versions) comes with a global font called SAS Monospace. In Word, select the text you wish to format and change its font to SAS Monospace. With this font, the output displays and prints the same way as it does in the SAS Output window.


Changing SAS output delimiter

Question:

I'm having trouble determining when the output from one SAS procedure ends and another begins. I'd like to add a row of hyphen (-) indicators to help me separate out the outputs from each procedure. How can I do this using SAS?

Answer:

To accomplish this task, use the OPTIONS FORMDLIM option.
In your case, you would place an OPTIONS FORMDLIM ahead of your procedures in your SAS command file with the following syntax:
OPTIONS FORMDLIM = '-' ;
The output from each procedure will now be separated from the next by a blank line, a line of hyphens, and another blank line.
You may also use this syntax to override the normal SAS page eject defaults and have SAS place output from multiple procedures on a single page of output.
If you wish to return your SAS output to its initial appearance, reset FORMDLIM to its default, a null value, which restores the normal page ejects:
OPTIONS FORMDLIM = '' ;


Generating factorial designs in SAS

Question:

How can I generate some n by p (n levels, p factors) design matrices with orthogonal effect encoding using SAS?

Answer:

PROC FACTEX, part of SAS/QC, is available through the UT site license program.
For information on PROC FACTEX, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation. Click on the Contents tab, then on SAS Products and scroll down to SAS/QC. Click SAS/QC User's Guide, then the FACTEX procedure. This contains examples of how to use the program to create full or fractional factorial designs.


Formatting SAS data

Question:

How do I get SAS to take the number:
0001
and to output it EXACTLY as it looks without stripping off the leftmost zeros?

Answer:

If you want SAS to be able to use this value as a number, use the Zw.d format, where "w" is the total width of the output field to be used including any decimal point, and "d" is the number of decimal places. This format will right justify the data in the field and pad any left side blanks with zeros.
For example:
DATA showform;
INPUT x ;
FORMAT x Z5.;
CARDS;
00001
2
000003
;
RUN ;
PROC PRINT;
RUN;
Since the "w" value specified was 5, the output is printed with as many leading zeros as are required to fill a five-character-wide output field. Thus the first value is printed exactly as it was input, the second value is printed with four new leading zeros, and the third value is printed with one fewer leading zero.
If you don't need this value treated as a number, input it as a character value -- in this case whatever form it has in input is stored for output. However, you will not be able to use this value in any numeric procedures or calculations.
For example:
DATA showform;
INPUT x $;
CARDS;
00001
;
RUN ;
PROC PRINT;
RUN;
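A third possibility keeps the numeric variable and also stores a zero-padded character copy, which can be handy when the padded value is needed as an identifier or for concatenation. This is a minimal sketch; the names showform2 and xchar are hypothetical. The PUT function applies the Z5. format to x and returns the result as a character string.

```sas
DATA showform2 ;
INPUT x ;
xchar = PUT(x, Z5.) ; * character copy of x, padded with leading zeros ;
DATALINES ;
1
2
3
;
RUN ;
PROC PRINT DATA=showform2 ;
RUN ;
```

Here x remains available for numeric calculations, while xchar holds the padded text.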


Computing an Intraclass Correlation using SAS

Question:

I want to compute an intraclass correlation using SAS, but I can't find a procedure that will calculate it for me. What should I do?

Answer:

The SAS macro program, %INTRACC, located at http://support.sas.com/kb/25/031.html, will calculate six different types of intraclass correlation coefficients. The Details tab of the web page explains how to use the macro, describes the six types of ICC output, and provides references for the statistics used in the macro. The Results tab displays sample data and output using the %INTRACC macro. The Download tab allows you to download the macro and save it to a specified location, for use with your own data.


Counting if a condition is satisfied

Question:

I am analyzing stock market dividend payments by month. The problem is that while most dividend payments occur quarterly, this isn't always the case (I have a large dataset, so there are bound to be some oddball cases). I need to calculate an average dividend payment, spread out over the months, until another dividend payment occurs, and then start the process over again. If the time interval was equal between dividend payments, I'd use a DO loop, but since dividends aren't necessarily paid every three months for every stock in my dataset, I'm stuck. What should I do?

Answer:

You can use the SUM statement to create a grouping variable which indicates which dividend payment is in effect for each month in your dataset. Then, once you have this information, you can run a PROC MEANS broken down by group and then merge the means with the original dataset. Like so:
* Create a sample dataset ;
DATA one ;
INFILE CARDS ; 
INPUT month payment ; 
CARDS ; 
01 60 
02 0 
03 0 
04 0 
05 40
06 0 
;
* We create six months worth of
* data here, with one sixty dollar
* dividend payment in January and
* one forty dollar payment in May;
DATA two ;
SET one ;
IF payment NE 0 THEN group+1 ;
PROC SORT ;
BY group ;
PROC MEANS ; 
VAR payment ;
BY group ; 
OUTPUT OUT=mout MEAN=meanout ; 
DATA three ; 
MERGE mout two ;
BY group ;
PROC PRINT ; 
RUN ;
This program will display the mean of each group alongside the appropriate month. Notice that the GROUP variable, which is created by the SUM statement, is not incremented unless the condition (payment NE 0) is satisfied, so every month keeps the group number of the most recent dividend payment. The SUM statement is not the same as the SAS SUM function, which is used to add separate variables together.


Duplicating observations using SAS

Question:

I have a variable called COUNT in my dataset. I want to have the number of observations in SAS reflect that number. In other words, right now I have
OBS ORIGIN DEST MODE COMMODIT COUNT VALUE STATMOYR
1 TX DF 5 84 3 10000 595
2 TX ML 5 84 1 1000 595 
3 TX TM 6 85 2 2000 695
What I want to have SAS print out is:
OBS ORIGIN DEST MODE COMMODIT NEWCOUNT VALUE STATMOYR
1 TX DF 5 84 1 10000 595
2 TX DF 5 84 1 10000 595
3 TX DF 5 84 1 10000 595
4 TX ML 5 84 1 1000 595
5 TX TM 6 85 1 2000 695 
6 TX TM 6 85 1 2000 695
Can you tell me how to do this using SAS?

Answer:

The following SAS program demonstrates how to first read your data in one SAS DATA step and then, in the second DATA step, process the existing data to set the number of observations equal to the number of the COUNT variable. Notice that you could do all of this in one single DATA step; we split the data reading and data processing portions here for illustrative purposes.
The variable "newcount" is included so that each observation has a new count value of 1.
OPTIONS LS = 72 ;
DATA one ;
INFILE cards ;
INPUT origin $ dest $ mode commodit count value statmoyr ;
CARDS ;
TX DF 5 84 3 10000 595
TX ML 5 84 1 1000 595
TX TM 6 85 2 2000 695
;
RUN ;
PROC PRINT ;
RUN ;
DATA two ;
newcount = 1 ;
SET one ;
DO i = 1 TO count ;
OUTPUT ;
END ;
DROP i ;
RUN ;
PROC PRINT ;
VAR origin dest mode commodit newcount value statmoyr ; 
RUN ;
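For comparison, here is a sketch of the single DATA step version mentioned above. It reads each record and immediately writes it COUNT times, so no intermediate data set is needed. The data set name two_single is hypothetical; the data lines are the same three records used above.

```sas
DATA two_single ;
INPUT origin $ dest $ mode commodit count value statmoyr ;
newcount = 1 ;
* Write each input record COUNT times ;
DO i = 1 TO count ;
OUTPUT ;
END ;
DROP i count ;
DATALINES ;
TX DF 5 84 3 10000 595
TX ML 5 84 1 1000 595
TX TM 6 85 2 2000 695
;
RUN ;
PROC PRINT DATA=two_single ;
VAR origin dest mode commodit newcount value statmoyr ;
RUN ;
```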


Testing equality of slopes for two repeated measures variables in SAS

Question:

I have two Y variables, Y1 and Y2. These are functions of two independent variables, X1 and X2, in a repeated measures context. I want to see if the slope of the regression of Y1 on X1 is equal to the slope of the regression of Y2 on X2. If the null hypothesis of equal slopes is not rejected (i.e., the data are consistent with equal slopes), I'd like to compare the Y-intercepts of each regression line. Can I do this using SPSS?

Answer:

Unfortunately, this type of specialized test is not available in SPSS at this time (March, 1997). However, SAS can perform both tests using the SYSLIN procedure with the ITSUR option.
You can read an SPSS portable data file into SAS with only a few lines of SAS code (this procedure is described elsewhere in the FAQ database). Once you have done this step, you can run PROC SYSLIN in SAS to test your hypotheses.
PROC SYSLIN ITSUR ;
model1: MODEL y1 = x1 ;
model2: MODEL y2 = x2 ;
STEST model1.x1 = model2.x2 ;
SRESTRICT model1.x1 = model2.x2 ;
STEST model1.intercept = model2.intercept ;
RUN ;
The first STEST statement tests the equality of the two regression slopes. The SRESTRICT statement then imposes the constraint of equal regression slopes onto the model. The second STEST statement then tests the equality of the two intercept values given the constraint that the two regression slopes are equal.


Numbering lines in the SAS program editor

Question:

Is there a way to automatically number SAS syntax lines and have those numbers visible in the Enhanced editor window?

Answer:

Yes. With the Editor window as the active window, the number lines option is found under the Tools menu as follows:

Tools
Options
Enhanced Editor
Click the General tab and then click Show line numbers.


Converting SAS multivariate repeated measures data to univariate format

Question:

I've run a repeated measures ANOVA using the SAS GLM procedure. My dataset has a single between-subjects grouping factor with two levels and I have four dependent variables that comprise my repeated measures effect. For follow-up analyses using the MIXED procedure, I need to rearrange my dataset so that there is a single dependent variable column and then a second column that refers to the measurement occasion of the dependent variable (e.g., 1, 2, 3, or 4). What's the best way for me to rearrange my dataset using SAS?

Answer:

There are a number of ways you can use SAS to transform your data from multivariate to univariate form. Here is one approach you can use.
DATA one ;
INFILE cards ;
INPUT a b1 b2 b3 b4 ;
subjid+1 ;
CARDS ;
1 3 4 7 7
1 6 5 8 8
1 3 4 7 9
1 3 3 6 8
2 1 2 5 10
2 2 3 6 10
2 2 4 5 9
2 2 3 6 11
;
RUN ;
PROC SORT DATA = one ;
BY a subjid ;
RUN ;
PROC TRANSPOSE DATA=one OUT=two NAME=measure PREFIX=y_all_;
VAR b1-b4 ;
BY a subjid ;
RUN ;
This example features a between-subjects grouping factor "a" and four within-subjects dependent variables, b1 through b4. The SAS syntax shown above first creates a counter variable called "subjid" to denote subject ID number using the syntax SUBJID+1. The program then sorts the dataset by subject ID number within each level of group using the SORT procedure.
The TRANSPOSE procedure then produces a single dependent variable, "y_all_1", a character variable called "measure" that indicates the measurement occasion of the dependent variable, and it retains the correct group specification "a".
Notice that the TRANSPOSE procedure appends a number to the dependent variable name, which is specified by the PREFIX keyword. The prefix "y_all_" is joined to the number "1" to create the new dependent variable, "y_all_1". The new single dependent variable, the measure variable, and the group variable "a" are written to the temporary SAS dataset work.two by the TRANSPOSE procedure.
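Another approach uses an array in a DATA step, which yields a numeric occasion variable (1 through 4) of the kind described in the question. The following is a minimal sketch under stated assumptions: the data set names wide and long and the variable names occasion and y are hypothetical, and only two subjects are entered to keep the example short.

```sas
* Two hypothetical subjects in multivariate (wide) form ;
DATA wide ;
INPUT a b1 b2 b3 b4 ;
subjid+1 ;
DATALINES ;
1 3 4 7 7
2 1 2 5 10
;
RUN ;
* Write one record per subject per measurement occasion ;
DATA long ;
SET wide ;
ARRAY b(4) b1-b4 ;
DO occasion = 1 TO 4 ;
y = b(occasion) ;
OUTPUT ;
END ;
KEEP a subjid occasion y ;
RUN ;
PROC PRINT DATA=long ;
RUN ;
```

Each subject contributes four records to the data set long, one for each measurement occasion.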
For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.


Multilevel model ICC using PROC MIXED

Question:

I am using SAS PROC MIXED to perform a multilevel or hierarchical linear model (HLM) analysis. Before I start, I want to compute the intraclass correlation to assess the proportion of variance in my dependent variable that can be attributed to cluster membership. My dependent variable is written test scores while my cluster membership variable is learning skills center.

Answer:

This FAQ assumes you are familiar with the basics of multilevel models theory and PROC MIXED syntax.
You can fit what is known as the unconditional means model using PROC MIXED to obtain the intraclass correlation coefficient. More specifically, the following PROC MIXED syntax produces the necessary variance component estimates that make up the intraclass correlation; once you have the variance components it is a simple matter to calculate the ICC value by hand.
Consider the following PROC MIXED syntax:
PROC MIXED DATA = sci METHOD = ML COVTEST ;
CLASS center ;
MODEL written = / SOLUTION ;
RANDOM INT / TYPE = UN SUBJECT = center ;
TITLE 'Unconditional Means Model';
RUN ;
The PROC MIXED statement lists the SAS dataset used in the analysis, sci. It also uses the ML or maximum likelihood estimation method. The COVTEST option requests covariance parameter estimates and associated test statistics be printed on the output.
The CLASS statement tells PROC MIXED that "center" is a classification variable. The MODEL statement tells PROC MIXED that the dependent variable, "written", is a function of the intercept or grand mean estimate only. Like all SAS regression and general linear model procedures, PROC MIXED assumes the presence of an intercept in the model unless the user explicitly specifies an option (NOINT) telling PROC MIXED to fit a no intercept model. The SOLUTION option has PROC MIXED print out regression parameter estimates, standard error estimates, and associated test statistics in tabular form.
The RANDOM statement features the keyword INT, which tells PROC MIXED to estimate separate intercept values for each center. The TYPE = UN option tells PROC MIXED to use an unstructured covariance matrix for the random effects and the SUBJECT = center option tells PROC MIXED that the clustering variable is the learning center variable. Finally, the procedure ends with a TITLE statement and a RUN statement.
The relevant portion of the output from this analysis is shown below.
Covariance Parameter Estimates (MLE)
Cov Parm Subject Estimate Std Error Z Pr > |Z|
UN(1,1) CENTER 129.18742744 25.47622036 5.07 0.0001 
Residual 321.55581213 10.63125905 30.25 0.0001
The variance component for center is 129.19 while the variance component that is left over after variance due to center has been explained is 321.56. The intraclass correlation is thus:
variance due to the clustering variable / (variance due to the clustering variable + variance remaining)
For this example, the intraclass correlation is:
129.18742744 / (129.18742744 + 321.55581213) = .29
Roughly 29 percent of the total variance can be explained by cluster membership. This finding indicates that a multilevel or hierarchical model would provide substantial benefits over a standard fixed effects model for the analysis of these data.
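The same arithmetic can be done in a short DATA step rather than by hand. In the sketch below, the two estimates are typed in from the output above so that the example runs on its own; the data set names covparms and icc and the variable names between and within are hypothetical. In practice, an ODS OUTPUT CovParms= statement placed before PROC MIXED can save the covariance parameter table to a data set with a similar layout.

```sas
* Covariance parameter estimates copied from the output above ;
DATA covparms ;
INPUT CovParm $ Estimate ;
DATALINES ;
UN(1,1) 129.18742744
Residual 321.55581213
;
RUN ;
* ICC = between-cluster variance / (between-cluster + residual variance) ;
DATA icc ;
SET covparms END=last ;
RETAIN between within ;
IF CovParm = 'Residual' THEN within = Estimate ;
ELSE between = Estimate ;
IF last THEN DO ;
icc = between / (between + within) ;
OUTPUT ;
END ;
KEEP between within icc ;
RUN ;
PROC PRINT DATA=icc ;
RUN ;
```

The printed icc value rounds to .29, matching the hand calculation.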
Note that you could also compute an intraclass correlation coefficient when you had additional predictors included in the model, though its meaning would be different than the intraclass correlation coefficient shown in this example.
For more information on specifying models using PROC MIXED, examples, and interpretation of PROC MIXED output, see the SAS Institute publications Advanced General Linear Models with an Emphasis on Mixed Models, The SAS System for Mixed Models, and SAS/STAT Software: Changes and Enhancements through Release 6.12.
You can also review PROC MIXED syntax using the SAS System on-line help facility by clicking on the Help button, then scrolling to SAS Help and Documentation. Then enter the keyword MIXED PROCEDURE under the Index tab. SAS will then show a list of topics and options associated with the MIXED procedure; choose the most relevant topic and then click the DISPLAY button to view the contents of that topic area.
You can download a copy of the paper Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models written by Judith Singer at Harvard University to learn more about using PROC MIXED to fit multilevel models to normally distributed outcome variables.


Fitting a multilevel model using PROC MIXED

Question:

I would like to use SAS PROC MIXED to perform a multilevel or hierarchical linear model (HLM) analysis. Before I start, I want to fit a regular fixed effects regression analysis where I regress my dependent variable onto my independent variable. My dependent variable is written test scores while my cluster membership variable is learning skills center. The independent variable is teacher. How can I fit the fixed effects model and then the random effects model that takes into account learning skills center variation using PROC MIXED?

Answer:

This FAQ assumes you are familiar with the basics of multilevel models theory and PROC MIXED syntax.
The following PROC MIXED syntax produces the fixed effects model.
PROC MIXED DATA = sci METHOD = ML COVTEST INFO IC UPDATE ;
CLASS center ;
MODEL written = teacher / SOLUTION ;
TITLE 'Fixed Effects Model';
RUN ;
The PROC MIXED statement lists the SAS dataset used in the analysis, "sci". It also uses the ML or maximum likelihood estimation method. The COVTEST option requests covariance parameter estimates and associated test statistics be printed on the output. The INFO and IC options request that SAS print additional model fitting and design information on the output. The UPDATE option tells SAS to write the results of each iteration step as notes in the LOG window or file as PROC MIXED completes that iteration. This information can be useful for diagnostic purposes when models converge very slowly, or not at all.
The CLASS statement tells PROC MIXED that "center" is a classification variable. The MODEL statement tells PROC MIXED that the dependent variable, "written", is a function of the intercept or grand mean estimate and the teacher variable. Like all SAS regression and general linear model procedures, PROC MIXED assumes the presence of an intercept in the model unless the user explicitly specifies an option (NOINT) telling PROC MIXED to fit a no intercept model. The SOLUTION option has PROC MIXED print out regression parameter estimates, standard error estimates, and associated test statistics in tabular form.
By contrast, the following PROC MIXED syntax produces the random effects model where teachers are nested within the higher level variable, center. The syntax is the same as the previous fixed effects model syntax, except that a RANDOM statement is included in this new analysis.
PROC MIXED DATA = sci METHOD = ML COVTEST INFO IC UPDATE ;
CLASS center ;
MODEL written = teacher / SOLUTION ;
RANDOM INT teacher / TYPE = UN SUBJECT = center ;
TITLE 'Mixed Effects Model';
RUN ;
The RANDOM statement features the keyword INT, which tells PROC MIXED to estimate separate intercept values for each center. The teacher variable is also included so that PROC MIXED will estimate separate slope deviations for each center from the grand slope estimated across all centers. The TYPE = UN option tells PROC MIXED to use an unstructured covariance matrix for the random effects and the SUBJECT = center option tells PROC MIXED that the clustering variable is the learning center variable. Finally, the procedure ends with a TITLE statement and a RUN statement.
For more information on specifying models using PROC MIXED, examples, and interpretation of PROC MIXED output, see the SAS Institute publications Advanced General Linear Models with an Emphasis on Mixed Models, The SAS System for Mixed Models, and SAS/STAT Software: Changes and Enhancements through Release 6.12.
You can also review PROC MIXED syntax using the SAS System on-line help facility by clicking on the Help button, then scrolling to SAS Help and Documentation. Then enter the keyword MIXED PROCEDURE under the Index tab. SAS will then show a list of topics and options associated with the MIXED procedure; choose the most relevant topic and then click the DISPLAY button to view the contents of that topic area.
You can download a copy of the paper Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models written by Judith Singer at Harvard University to learn more about using PROC MIXED to fit multilevel models to normally distributed outcome variables.


Fitting a three level multilevel model using PROC MIXED

Question:

I would like to use SAS PROC MIXED to perform a multilevel or hierarchical linear model (HLM) analysis. I want to fit a three level model where the first level is student or individual research participant, the second level is classroom, and the third level is school. My dependent variable is math test scores while my cluster membership variables are classroom ID and learning skills center. The independent variable is the English language test score for each student. How can I fit a three level model that takes into account classroom and learning skills center variation using PROC MIXED?

Answer:

This FAQ assumes you are familiar with the basics of multilevel models theory and PROC MIXED syntax.
The following PROC MIXED syntax produces the three level model.
PROC MIXED DATA = sasdata METHOD = ML COVTEST ;
CLASS school classrm ;
MODEL math = english / SOLUTION ;
RANDOM INT / TYPE = UN SUBJECT = school ;
RANDOM INT / TYPE = UN SUBJECT = classrm(school) ;
TITLE 'Three-level Junior School Project model';
RUN ;
The PROC MIXED statement lists the SAS dataset used in the analysis, sasdata. It also uses the ML or maximum likelihood estimation method. The COVTEST option requests covariance parameter estimates and associated test statistics be printed on the output.
The CLASS statement tells PROC MIXED that school and classrm are classification variables. The MODEL statement tells PROC MIXED that the dependent variable, math, is a function of the intercept or grand mean estimate and the English test score variable. Like all SAS regression and general linear model procedures, PROC MIXED assumes the presence of an intercept in the model unless the user explicitly specifies an option (NOINT) telling PROC MIXED to fit a no intercept model. The SOLUTION option has PROC MIXED print out regression parameter estimates, standard error estimates, and associated test statistics in tabular form.
There are two RANDOM statements shown in the PROC MIXED syntax. The first RANDOM statement features the keyword INT, which tells PROC MIXED to estimate separate intercept values for each school. The TYPE = UN option tells PROC MIXED to use an unstructured covariance matrix for the random effects and the SUBJECT = school option tells PROC MIXED that the clustering variable is the school. You may also include a SOLUTION option on the RANDOM statement to obtain parameter estimates for the individual classrooms and schools.
The second RANDOM statement is identical to the first, except that instead of using school as the clustering variable, as in the first RANDOM statement, we now use classrm(school) as the clustering variable. SAS interprets classrm(school) as "classroom within school". With the inclusion of both RANDOM statements, PROC MIXED estimates one variance for the intercepts at the school level and another at the classroom within school level. Random effects specified in separate RANDOM statements are treated as independent of one another, so no covariance between the school and classroom intercepts is estimated. These variance components represent the amount of variance attributable to school and classroom membership.
Finally, the procedure ends with a TITLE statement and a RUN statement.
For more information on specifying models using PROC MIXED, examples, and interpretation of PROC MIXED output, see the SAS Institute publications Advanced General Linear Models with an Emphasis on Mixed Models, The SAS System for Mixed Models, and SAS/STAT Software: Changes and Enhancements through Release 6.12.
You can also review PROC MIXED syntax using the SAS System on-line help facility by clicking on the Help button, then scrolling to SAS Help and Documentation. Then enter the keyword MIXED PROCEDURE under the Index tab. SAS will then show a list of topics and options associated with the MIXED procedure; choose the most relevant topic and then click the DISPLAY button to view the contents of that topic area.
You can download a copy of the paper Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models written by Judith Singer at Harvard University to learn more about using PROC MIXED to fit multilevel models to normally distributed outcome variables.


Simple versus weighted kappa from SAS PROC FREQ

Question:

I've compared the judgments of two raters using PROC FREQ in SAS. The printout includes both a simple and a weighted Kappa. It doesn't appear that one is simply more conservative than the other. Can you explain the differences between these measures?

Answer:

This FAQ assumes you are familiar with the basics of PROC FREQ syntax.
The following test data set is provided as an illustration. In this data set, there are two raters and three values for the scale. The TABLE statement in PROC FREQ uses the AGREE option to produce the Kappa estimates. In general, the weighted Kappa is used if you want to give partial credit to close misses (e.g., 1-2) rather than treating them as being as serious as misses that are further apart (e.g., 1-3). By default, SAS calculates the agreement weights from the distance between the column scores relative to the range of the scores (the Cicchetti-Allison form), so with three categories a one-point miss receives a weight of 0.5 and a two-point miss receives a weight of 0. The weighted Kappa makes sense if the ratings are on an ordinal scale where a 1 point difference could be considered less of an incorrect rating than a 2 point difference. Although typically the weighted Kappa is a higher estimate than the simple Kappa, this is not necessarily true if there are more larger differences (e.g., 1-3) than smaller differences (e.g., 1-2). Weighted Kappa is typically not appropriate for purely categorical variables where there is no ordering of the values.
**Creating test data set for illustration purposes**; 
DATA test ; 
INPUT rater1 rater2 count; 
DATALINES; 
1 1 5 
1 2 2 
1 3 1 
2 1 1 
2 2 7 
2 3 2 
3 1 1 
3 2 2 
3 3 9 
;
PROC FREQ DATA = test ;
WEIGHT count;
TABLE rater1*rater2 / AGREE ;
TITLE1 'Agreement statistics with weighted data';
RUN;
The output from running this syntax is shown below. In this case, assuming that the ratings are on an ordinal scale, the weighted Kappa is the more appropriate estimate if we wish to take into account how close the misses were. We would also expect it to be higher than the simple Kappa, since there are more 1 point differences (1-2 and 2-3; 7 ratings) than 2 point differences (1-3; 2 ratings).
Kappa Statistics
Statistic       Value   ASE     95% Confidence Bounds
------------------------------------------------------
Simple Kappa    0.5424  0.1271  0.2932  0.7915
Weighted Kappa  0.5714  0.1295  0.3176  0.8253
Sample Size = 30
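As a check on the output above, both statistics can be reproduced by hand from the nine cell counts. The worked sketch below assumes the default Cicchetti-Allison agreement weights, which for three categories are 1 for exact agreement, 0.5 for a one-point miss, and 0 for a two-point miss. The row totals for rater1 are 8, 10, and 12; the column totals for rater2 are 7, 11, and 12.

```latex
\begin{align*}
P_o &= \frac{5 + 7 + 9}{30} = 0.7000 \\
P_e &= \frac{8\cdot 7 + 10\cdot 11 + 12\cdot 12}{30^2} = \frac{310}{900} \approx 0.3444 \\
\hat{\kappa} &= \frac{P_o - P_e}{1 - P_e} = \frac{0.7000 - 0.3444}{1 - 0.3444} \approx 0.5424 \\[6pt]
w_{ij} &= 1 - \frac{|i - j|}{3 - 1} \\
P_{o(w)} &= \frac{21 + 0.5\,(2 + 1 + 2 + 2)}{30} = \frac{24.5}{30} \approx 0.8167 \\
P_{e(w)} &= \frac{310 + 0.5\,(8\cdot 11 + 10\cdot 7 + 10\cdot 12 + 12\cdot 11)}{900} = \frac{515}{900} \approx 0.5722 \\
\hat{\kappa}_w &= \frac{P_{o(w)} - P_{e(w)}}{1 - P_{e(w)}} = \frac{0.8167 - 0.5722}{1 - 0.5722} \approx 0.5714
\end{align*}
```

The weighted estimate is higher here because the seven one-point misses earn half credit in the observed agreement, while only two ratings are two-point misses.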
So, the answer to which one is best depends on the type of ratings made (categorical or ordinal) and whether you want to weight the misses differentially. Please let us know if you have any other questions.
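If you want larger disagreements penalized more steeply than the default weights do, PROC FREQ also offers Fleiss-Cohen (quadratic) weights. The sketch below reuses the test data set from above; the choice of Fleiss-Cohen weights and the added TEST statement (which requests tests that each Kappa equals zero) are illustrative suggestions, not part of the original analysis.

```sas
PROC FREQ DATA = test ;
  WEIGHT count ;
  TABLES rater1*rater2 / AGREE (WT = FC) ;   /* Fleiss-Cohen agreement weights */
  TEST KAPPA WTKAP ;                         /* tests of simple and weighted Kappa */
  TITLE1 'Agreement statistics with Fleiss-Cohen weights' ;
RUN ;
```

The simple Kappa is unaffected by the weighting scheme; only the weighted Kappa changes.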
For more information, click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation.


Plotting regression lines of best fit for multiple groups using SAS

Question:

I have just run an analysis of covariance design using SAS PROC GLM where I included a group by covariate interaction. My dependent variable is called y, my covariate is called x1, and my grouping variable is called group. My GLM syntax is:
PROC GLM DATA = sas data set name ;
CLASS group ;
MODEL y = x1 group x1*group ;
RUN ;
My printed output shows a statistically significant group by covariate interaction effect. It's been suggested that one way I can interpret this interaction is to treat the situation as a regression problem and plot the regression of the dependent variable on the covariate, allowing separate lines of best fit for each of my groups. How can I produce such a plot using SAS?

Answer:

You can use SAS/GRAPH to produce the plot. Sample SAS code is shown below. The key is to use a separate SYMBOL statement for each group so that each group gets its own plotting symbol and color. The example assumes you have three groups. The I= option in the SYMBOL statement specifies the type of interpolation between data points; in this instance RL, the linear regression interpolation, is specified, so SAS draws a fitted regression line through each group's points.
**Create format to help the legend identify the groups;
proc format;
value fgroup 1 = 'Low Group'
2 = 'Medium Group'
3 = 'High Group';
**Create test data set;
data test ;
do group = 1 to 3 ;
do subject = 1 to 100 ;
y = rannor(0) + 2*group ;
x1 = rannor(0) + .3*y ;
output;
end;
end;
run;
**Creating the different symbols;
symbol1 value = circle color = red i = rl ;
symbol2 value = square color = blue i = rl ;
symbol3 value = star color = green i = rl ;
**Creating the plot itself and assigning the format;
proc gplot data = test ;
format group fgroup. ;
label y = 'Dependent Measure';
label x1 = 'Covariate';
plot y*x1 = group ;
run;
quit;
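If you are running SAS 9.2 or later, a similar plot can probably be produced without SYMBOL statements by using PROC SGPLOT, whose REG statement fits and draws one regression line per level of the GROUP= variable. This is a sketch of an alternative rather than part of the original answer; it reuses the test data set and the fgroup format created above.

```sas
proc sgplot data = test ;
  format group fgroup. ;
  label y = 'Dependent Measure' x1 = 'Covariate' ;
  reg x = x1 y = y / group = group ;   /* one fitted line per group */
run ;
```

Here the marker symbols and colors come from the active ODS style instead of SYMBOL statements.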


Rotating Axis labels in PROC GPLOT

Question:

How do I rotate the tick labels (for example, placing each label at a 45 degree angle) in PROC GPLOT?

Answer:

This answer assumes that you want to rotate the entire label and not just the letters within the label. The SAS code below does this. The key is to define an AXIS statement with the appropriate option (ANGLE = in this case) and then refer to the axis in the PLOT statement using an HAXIS = option.
**Create test data set;
data test ;
do time = 1 to 4 ;
do subj= 1 to 10 ;
y = rannor(0) + time ;
output;
end;
end;
run;
**Create a format to apply to the x axis (time);
proc format ;
value ftime 1='Time 1'
2='Time 2'
3='Time 3'
4='Time 4';
run;
**Plot a regression line through data;
symbol1 color=blue value=star i= rl ;
**Tell SAS to rotate entire word--you can substitute
'rotate' instead of angle if you just want to rotate
the letters instead of the entire word;
axis1 value=(angle=45);
**Specify gplot and the 'haxis' option to tell gplot
to use axis1--you could also specify an axis2 statement
and assign it to the y axis using a 'vaxis=' statement;
proc gplot data = test ;
format time ftime. ;
plot y*time / haxis = axis1 ;
run;
quit;
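The comments in the code above mention that a second AXIS statement can be assigned to the vertical axis. A minimal sketch of that variation follows, reusing the test data set, ftime format, and SYMBOL1 statement from above; the label text and the 90 degree angle are illustrative choices.

```sas
axis1 value = (angle = 45) ;                       /* rotate x-axis tick labels */
axis2 label = (angle = 90 'Dependent Measure') ;   /* rotate the y-axis label */
proc gplot data = test ;
  format time ftime. ;
  plot y*time / haxis = axis1 vaxis = axis2 ;
run ;
quit ;
```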


Custom within-subjects contrasts using PROC GLM

Question:

I'd like to perform some custom within-subjects contrasts. I have one between-subjects factor, agecat3, with three levels, and two within-subjects factors, feedback (two levels) and task type (three levels). I want to compare the first two levels of task type collapsing across all other variables. I'm using PROC GLM in SAS. Should I use the CONTRAST statement to set up my contrast of interest? Here's my current PROC GLM syntax.
PROC GLM DATA = sasuser.prednes ;
CLASS agecat3 ;
MODEL p_san p_sfn p_dafn p_sapf p_sfp p_dafp = agecat3 / NOUNI;
CONTRAST 'sa vs sf tasks' ? ? ? ? ? ?..... ;
REPEATED feedback 2 , tasktype 3 / PRINTE SUMMARY ;
RUN ;
QUIT ;

Answer:

The GLM procedure divides between-subjects and within-subjects effects into separate contrast matrices. This means that the CONTRAST statement can only be used to specify and test between-subjects effects. To specify and test custom within-subjects effects, you should use the MANOVA statement. The following SAS PROC GLM syntax illustrates the use of the MANOVA statement, first to replicate the omnibus multivariate hypothesis tests reported by PROC GLM and then to fit your custom hypothesis test.
PROC GLM DATA = sasuser.prednes ;
CLASS agecat3 ;
MODEL p_san p_sfn p_dafn p_sapf p_sfp p_dafp = agecat3 / NOUNI;
MANOVA H=INTERCEPT M=(-1 0 1 -1 0 1, /* Tasktype Main Effect */
0 -1 1 0 -1 1);
MANOVA H=INTERCEPT M=(1 1 1 -1 -1 -1); /* Feedback Main Effect */
MANOVA H=INTERCEPT M=(-1 0 1 1 0 -1, /* Tasktype*Feedback Interaction */
0 -1 1 0 1 -1);
MANOVA H=AGECAT3 M=(-1 0 1 -1 0 1, /* Agecat3*Tasktype Interaction */
0 -1 1 0 -1 1);
MANOVA H=AGECAT3 M=(1 1 1 -1 -1 -1); /* Agecat3*Feedback Interaction */
MANOVA H=AGECAT3 M=(-1 0 1 1 0 -1, /* Agecat3*Tasktype*Feedback Interaction */
0 -1 1 0 1 -1);
MANOVA H=INTERCEPT M=(-1 1 0 -1 1 0); /* SA versus SF*/
REPEATED feedback 2 , tasktype 3 / PRINTE SUMMARY ;
RUN ;
QUIT ;
The MANOVA statement contains an H= term that specifies the name of the between-subjects effect(s) of interest. If you want to collapse across between-subjects effects (i.e., not consider their interactive effect on the within-subjects contrast you define), use the SAS keyword INTERCEPT for this statement.
The next term is the M= specification. You specify the within-subjects contrast of interest using this term. For instance, in the PROC GLM syntax shown above, the first MANOVA statement replicates the task type main effect. The first row of the contrast matrix compares p_san + p_sapf against p_dafn + p_dafp. The second row of the matrix compares p_sfn + p_sfp against p_dafn + p_dafp. Together these two contrast rows form a two degree of freedom main effect contrast that replicates the omnibus multivariate repeated measures test of the task type main effect. The next two MANOVA statements replicate the feedback main effect and the task type by feedback interaction omnibus tests, respectively.
Notice that the fourth MANOVA statement is identical to the first, except that in the fourth statement the keyword INTERCEPT has been replaced by agecat3. Instead of testing the within-subjects main effect of task type collapsed across all three age categories, as is the case in the first MANOVA contrast, this contrast incorporates age category into the model as a predictor, so the resulting test captures the interaction between age category membership and task type level.
The last MANOVA statement performs the specified contrast of interest. In this contrast the two sa variables are compared to the two sf variables. Since you want to see if this effect is present across all age categories simultaneously, the keyword INTERCEPT is specified for the H= term.
The output will appear in multivariate test statistic form. This is a nice feature of PROC GLM because the multivariate tests are not sensitive to violations of the repeated measures ANOVA sphericity assumption.
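To make the M= specification concrete, the last MANOVA statement can be read as forming one new score per subject from the six dependent variables, using the coefficients (-1 1 0 -1 1 0) in the order the variables appear on the MODEL statement. The symbol d below is introduced only for illustration. With H=INTERCEPT, the test asks whether the mean of this score, averaged over the age categories, is zero.

```latex
\begin{align*}
d &= (-1)\,\mathrm{p\_san} + (1)\,\mathrm{p\_sfn} + (0)\,\mathrm{p\_dafn}
   + (-1)\,\mathrm{p\_sapf} + (1)\,\mathrm{p\_sfp} + (0)\,\mathrm{p\_dafp} \\
  &= (\mathrm{p\_sfn} + \mathrm{p\_sfp}) - (\mathrm{p\_san} + \mathrm{p\_sapf}),
  \qquad H_0 : \mu_d = 0
\end{align*}
```

The same reading applies to each row of the other M= matrices: every row defines one linear combination of the repeated measures.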


Decomposing interactions using SAS

Question:

I have obtained a significant interaction effect using PROC GLM or another similar procedure in SAS and now I want to further decompose the interaction. In other words, I want to try to identify significant differences between levels of one variable within each level of the other variable or variables. My design has a two level between-subject factor in which one level represents an experimental group and the other represents a control group. There is also a repeated measures variable that is measured on four separate measurement occasions. Thus, the design is a 2 (experimental vs. control group) by 4 (measurement occasion: trial 1 vs. trial 2 vs. trial 3 vs. trial 4). After obtaining a significant interaction between the experimental condition and the measurement occasions, I want to know within which measurement occasions there were differences between experimental groups and I want to know which measurement occasions differed from each other within each level of the experimental groups.

Answer:

A typical method used to address your question is called the analysis of simple main effects (Winer, Brown, & Michels, 1991). While you can obtain significant interaction effects using several different procedures in SAS, such as PROC GLM and PROC ANOVA, the presence of a within-subjects effect in this example means you will need to use the MIXED procedure to obtain simple main effects. If your design contains only between-subjects effects, you may use the LSMEANS statement described below in PROC GLM to generate tests of simple main effects.
If you used PROC GLM or PROC ANOVA to conduct the omnibus tests, you will first have to transform your data from a multivariate to a univariate format. In multivariate form, your data are arranged so that there is one row per subject and each row contains data for all measurement occasions of the repeated measures variable. To analyze your data in PROC MIXED, you should transform your data into a univariate format in which there is one row per measurement occasion. Thus, if you have four measurement occasions, you will need four rows per subject. For details on how to convert multivariate data to a univariate format, examine SAS FAQ 75: Converting SAS multivariate repeated measures to univariate format.
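The restructuring described above can be sketched with a DATA step. The names used here are assumptions for illustration only: a one-row-per-subject data set called wide that holds ID, anxiety, and four scores named t1 through t4. The resulting data set has one row per subject per trial, which is the layout PROC MIXED expects.

```sas
DATA one ;
  SET wide ;                  /* hypothetical wide data set: ID anxiety t1-t4 */
  ARRAY scores{4} t1-t4 ;
  DO trial = 1 TO 4 ;
    y1 = scores{trial} ;      /* score for this measurement occasion */
    OUTPUT ;                  /* write one row per subject per trial */
  END ;
  DROP t1-t4 ;
RUN ;
```

In this sketch trial is coded 1 through 4; the output tables shown later label the occasions TRIAL1 through TRIAL4, which only affects how the levels are displayed.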
The syntax shown below illustrates the use of PROC MIXED to obtain contrasts between levels of each variable within all other levels of each of the other variables in an interaction. In this example, there are four levels of the repeated measures variable, trial, and two levels of the between-subjects variable, anxiety. The variable ID is a subject identification number associated with each participant in the study. The variable y1 is the dependent variable representing participants' scores on the experimental task.
The classification variables anxiety, trial, and ID are named in the CLASS statement. The MODEL statement that follows requests main effects for anxiety and trial and the interaction between these variables. The REPEATED statement then identifies trial as the repeated measures variable. Its SUBJECT = ID option tells PROC MIXED which rows of data are associated with an individual subject, and its TYPE = UN option requests an unstructured covariance matrix for the repeated measurements; if TYPE = is omitted, PROC MIXED uses its default variance components structure.
The first two LSMEANS statements use the SLICE = option to obtain simple main effects tests. The first, LSMEANS anxiety*trial / slice = anxiety, produces a three degree of freedom test of the hypothesis that the four trials are equal within each level of the between-subjects factor, anxiety. For example, one of the tests obtained from this statement is a test of the hypothesis that trial1 = trial2 = trial3 = trial4 within the experimental condition of the anxiety variable. The second, LSMEANS anxiety*trial / slice = trial, compares the levels of the between-subjects factor, anxiety, within each level of the within-subjects factor, trial. For example, one of the hypotheses tested by this statement is the contrast between the two levels of the anxiety variable, the control and experimental groups, within trial1. The final statement, LSMEANS anxiety*trial / PDIFF, requests the pairwise contrasts between levels of the variables in the interaction.
PROC MIXED DATA = one;
CLASS anxiety trial ID ;
MODEL y1 = anxiety trial anxiety*trial ;
REPEATED trial /SUBJECT = ID TYPE = UN ;
LSMEANS anxiety*trial / slice = anxiety ;
LSMEANS anxiety*trial / slice = trial ;
LSMEANS anxiety*trial /PDIFF;
RUN;
Running the above syntax will produce several pieces of output including iteration history, convergence status, and fit statistics, as well as simple main effects and contrasts between the levels of the variables within each level of the other variables in the interaction. A logical place to begin examining your output is with the tests of simple main effects, which appear at the end of the output derived from the sample syntax above. This output is produced by the slice option in the LSMEANS statement and appears in the following table:
Tests of Effect Slices
Effect         Trial   Anxiety  Num DF  Den DF  F Value  Pr > F
ANXIETY*trial          1        3       10        47.97  <.0001
ANXIETY*trial          2        3       10        36.16  <.0001
ANXIETY*trial  TRIAL1           1       10         0.29  0.6008
ANXIETY*trial  TRIAL2           1       10         0.48  0.5025
ANXIETY*trial  TRIAL3           1       10         0.01  0.9115
ANXIETY*trial  TRIAL4           1       10         1.85  0.2038
The first two lines of the above table are tests of the hypothesis that all of the repeated measurements are equal within each level of the between-subjects variable. For example, for the test of the hypothesis that trial1 = trial2 = trial3 = trial4 within level 1 of anxiety, an F value of 47.97 and a significance level of < .0001 are obtained, indicating that the four trial means are very unlikely to be equal. The last four lines of the table test the hypothesis that the levels of the between-subjects factor, anxiety, are equal at each trial. For example, the comparison between levels of the between-subjects factor within trial1 produces an F value of .29 and a significance level of .6008, which provides no evidence of a difference between levels of anxiety within trial1.
After examining the Tests of Effect Slices table, you can examine contrasts between levels of variables within levels of other variables. The contrasts appear in a table labeled Differences of Least Squares Means, similar to the one shown below:
Differences of Least Squares Means
Trial   Anxiety  Trial   Anxiety   Estimate  Standard Error  DF  t Value  Pr > |t|
TRIAL1  1        TRIAL2  1           5.1667  0.9804          10     5.27  0.0004
TRIAL1  1        TRIAL3  1           8.3333  1.1702          10     7.12  <.0001
TRIAL1  1        TRIAL4  1          13.0000  1.3006          10    10.00  <.0001
TRIAL1  1        TRIAL1  2          -0.6667  1.2338          10    -0.54  0.6008
TRIAL1  1        TRIAL2  2           4.1667  1.3396          10     3.11  0.0111
TRIAL1  1        TRIAL3  2           8.5000  1.3530          10     6.28  <.0001
TRIAL1  1        TRIAL4  2          10.8333  1.4250          10     7.60  <.0001
TRIAL2  1        TRIAL3  1           3.1667  0.5798          10     5.46  0.0003
TRIAL2  1        TRIAL4  1           7.8333  0.6852          10    11.43  <.0001
TRIAL2  1        TRIAL1  2          -5.8333  1.3396          10    -4.35  0.0014
TRIAL2  1        TRIAL2  2          -1.0000  1.4376          10    -0.70  0.5025
TRIAL2  1        TRIAL3  2           3.3333  1.4501          10     2.30  0.0444
TRIAL2  1        TRIAL4  2           5.6667  1.5175          10     3.73  0.0039
TRIAL3  1        TRIAL4  1           4.6667  0.5578          10     8.37  <.0001
TRIAL3  1        TRIAL1  2          -9.0000  1.3530          10    -6.65  <.0001
TRIAL3  1        TRIAL2  2          -4.1667  1.4501          10    -2.87  0.0166
TRIAL3  1        TRIAL3  2           0.1667  1.4625          10     0.11  0.9115
TRIAL3  1        TRIAL4  2           2.5000  1.5293          10     1.63  0.1332
TRIAL4  1        TRIAL1  2         -13.6667  1.4250          10    -9.59  <.0001
TRIAL4  1        TRIAL2  2          -8.8333  1.5175          10    -5.82  0.0002
TRIAL4  1        TRIAL3  2          -4.5000  1.5293          10    -2.94  0.0147
TRIAL4  1        TRIAL4  2          -2.1667  1.5934          10    -1.36  0.2038
TRIAL1  2        TRIAL2  2           4.8333  0.9804          10     4.93  0.0006
TRIAL1  2        TRIAL3  2           9.1667  1.1702          10     7.83  <.0001
TRIAL1  2        TRIAL4  2          11.5000  1.3006          10     8.84  <.0001
TRIAL2  2        TRIAL3  2           4.3333  0.5798          10     7.47  <.0001
TRIAL2  2        TRIAL4  2           6.6667  0.6852          10     9.73  <.0001
TRIAL3  2        TRIAL4  2           2.3333  0.5578          10     4.18  0.0019
There are four columns in the above table that you use to determine which pairs of means are being compared. These are the two columns labeled Trial and the two columns labeled Anxiety. Using these columns, you can find a particular comparison in which you are interested. For example, to examine the contrast between trial1 and trial4 in level 1 of Anxiety, you would find the row where both Anxiety columns contain the value 1 and where one of the Trial columns contains trial1 and the other contains trial4. This corresponds to the third row in the table above. Examining this row, the t value of 10.00 and its associated significance level of < .0001 indicate that there is a significant difference between these two trials within level 1 of Anxiety.
In addition to comparing the repeated measures within a level of the experimental condition, you can also examine whether there is a difference between level 1 and level 2 of the between-subjects variable, Anxiety. For example, you could examine the difference between levels of Anxiety within trial1 in the same manner as the repeated measures comparison. Find the row where both Trial columns contain trial1 and where one of the Anxiety columns contains the value 1 and the other contains the value 2. This comparison can be found on the fourth row. Examining this row, the t value of -.54 and its associated significance level of .6008 provide no evidence of a difference between the two levels of Anxiety within trial1.
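Because the PDIFF option produces 28 comparisons with unadjusted p-values, you may want to control the familywise error rate before interpreting individual rows. One option, offered here as a suggestion rather than as part of the original analysis, is the ADJUST = option on the LSMEANS statement, which adds a column of adjusted p-values to the Differences of Least Squares Means table. The sketch uses the Bonferroni adjustment; other adjustments are available.

```sas
PROC MIXED DATA = one ;
  CLASS anxiety trial ID ;
  MODEL y1 = anxiety trial anxiety*trial ;
  REPEATED trial / SUBJECT = ID TYPE = UN ;
  LSMEANS anxiety*trial / PDIFF ADJUST = BON ;   /* Bonferroni-adjusted p-values */
RUN ;
```

Note that the adjustment is applied across all 28 pairwise comparisons, including those that cross both trial and anxiety level and that you may not intend to interpret.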
For more information on specifying models using PROC MIXED, examples, and interpretation of PROC MIXED output, see the SAS Institute publications Advanced General Linear Models with an Emphasis on Mixed Models, The SAS System for Mixed Models, and SAS/STAT Software: Changes and Enhancements through Release 6.12. For more information about simple main effects tests see Statistical Principles in Experimental Design, by B.J. Winer, Donald R. Brown, and Kenneth M. Michels. You can also click on the Help button in the SAS menu bar and scroll to SAS Help and Documentation for additional information on SAS procedures.


Post-hoc tests in a crosstabulation table using SAS

Question:

I have performed a two column by four row crosstabulation analysis using SAS PROC FREQ. I obtained a significant chi-square value for the overall test of the independence of rows and columns. I would now like to see where those differences occur using a method similar to a post-hoc analysis in ANOVA. Can this be done and, if so, how do I do it?

Answer:

According to SAS Technical Support, you can do this using the GENMOD and MULTTEST procedures. The sample program below, courtesy of SAS Technical Support, inputs the frequency values of cancer for each type of cancer and the location on the body where the cancer was detected.
PROC GENMOD computes all possible pairwise comparisons among the levels of type. PROC MULTTEST then provides Bonferroni-adjusted probability values as well as the original, unadjusted probability values for the paired comparisons.
Running this self-contained program and examining the output should prove helpful to understanding how the program works.
/* Multiple Comparisons on a Contingency Table Using */
/* LSMEANS statement in GENMOD */
/* After all pairwise adjustments are made, p-values */
/* are output to MULTTEST for Adjustment */
DATA melanoma;
INPUT type :$13. site :$11. count; /* informats keep the full labels; the default $ length of 8 would truncate them */
CARDS;
Hutchinson's Head&Neck 22
Hutchinson's Trunk 2
Hutchinson's Extremities 10
Superficial Head&Neck 16
Superficial Trunk 54
Superficial Extremities 115
Nodular Head&Neck 19
Nodular Trunk 33
Nodular Extremities 73
Indeterminate Head&Neck 11
Indeterminate Trunk 17
Indeterminate Extremities 28
;
RUN ;
** Run original crosstabulation analysis ;
PROC FREQ DATA = melanoma;
WEIGHT count;
TABLES type*site/chisq;
TITLE1' Original Table';
RUN;
** Run PROC GENMOD to obtain unadjusted pairwise comparisons ;
PROC GENMOD DATA = melanoma;
CLASS type site;
MODEL count=type site/d=POISSON LINK=LOG TYPE3 WALD;
LSMEANS type / DIFF;
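/* Note: later SAS releases may name this table Diffs and the p-value column Probz; */
/* run ODS TRACE ON before the procedure to confirm the names in your release */
ODS OUTPUT lsmeandiffs=p_vals;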
TITLE1 'Log-Linear Model to get Multiple Comparisons';
TITLE2 'All pairwise comparisons for TYPE';
RUN;
** Rename the unadjusted chi-square p-values as "raw_p" so that **
** the distinction between unadjusted and adjusted p-values is clear **
** on the output ;
DATA p_vals;
SET p_vals;
RENAME probchisq=raw_p ;
RUN;
** Run PROC MULTTEST to obtain adjusted p-values as "bon_P" ;
PROC MULTTEST PDATA=p_vals BON OUT=adjust;
RUN;
** Print resulting unadjusted and adjusted p-values ;
PROC PRINT DATA = adjust;
TITLE1 'Multiple Comparison P-values with Bonferroni Adjustment';
FORMAT bon_p pvalue6.3;
RUN ;
** End sample program ;
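For reference, the Bonferroni adjustment reported by PROC MULTTEST is simple to state. With four melanoma types there are six pairwise comparisons, and each adjusted p-value is the raw p-value multiplied by the number of comparisons, capped at 1. The symbols m, p_raw, and p_bon are notation introduced here to match the raw_p and bon_p columns in the output.

```latex
m = \binom{4}{2} = 6, \qquad
p_{\mathrm{bon}} = \min\left(1,\; m \cdot p_{\mathrm{raw}}\right)
```

So a raw p-value must be below 0.05 / 6, or about 0.0083, for the adjusted value to fall below 0.05.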


Printing RTF files in landscape mode

Question:

I would like to print an RTF (Rich Text Format) file in landscape (horizontal) orientation on my printer. How can I do this using SAS?

Answer:

You can use a combination of Output Delivery System statements to accomplish this goal. Suppose you want to display the results of a PROC FREQ in landscape mode. You first specify the following OPTIONS statement:
OPTIONS ORIENTATION = landscape nonumber nodate ;
The landscape keyword tells SAS to write the RTF output in landscape (horizontal) format. The nonumber and nodate options tell SAS to omit the printing of the date and page numbers on the output.
Next, specify the statement:
ODS RTF FILE = "my_rtf_file.rtf" style=styles.rtf ;
where my_rtf_file.rtf refers to the name and directory location of the RTF file you want SAS to create. Then write your PROC FREQ syntax. Be sure to end the PROC FREQ with a RUN ; statement.
After the PROC FREQ finishes, you can reset the display options back to the default portrait format by specifying the statement:
OPTIONS ORIENTATION = portrait ;
Finally, do not forget to include the ODS RTF CLOSE ; statement following the PROC FREQ or other SAS procedures you are using to generate RTF output.
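Putting the statements above in order, a complete run might look like the following sketch. The data set sashelp.class (a sample table shipped with SAS) and the TABLES request are stand-ins for your own data and analysis.

```sas
/* Set the page orientation before opening the RTF destination */
OPTIONS ORIENTATION = landscape NONUMBER NODATE ;
ODS RTF FILE = "my_rtf_file.rtf" STYLE = styles.rtf ;

PROC FREQ DATA = sashelp.class ;   /* stand-in data set for illustration */
  TABLES sex*age ;
RUN ;

ODS RTF CLOSE ;                    /* finish writing the RTF file */
OPTIONS ORIENTATION = portrait ;   /* restore portrait orientation */
```

The file is not complete until the ODS RTF CLOSE ; statement runs, so open it in your word processor only after that point.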


Plotting multilevel model data

Question:

I would like to plot regression lines of fit for a fixed effects model, a random intercepts model, a random slopes model, and a random coefficients model that contains both random intercepts and random slopes. My outcome is the score on a written achievement test; my predictor is the amount of teacher training in writing skills. I have students nested within 73 writing centers. How can I do this using SAS?

Answer:

You can use PROC MIXED to fit the multilevel models and have it create output SAS tables for each of the four fitted models. Then you can use the GPLOT procedure in SAS/GRAPH to fit regression lines for each writing center, subject to the assumptions contained in each model (e.g., random intercepts, but not random slopes). The following SAS syntax illustrates how this is done.
PROC MIXED DATA = schools ;
CLASS center ;
MODEL written = teacher / SOLUTION OUTPRED = outf ;
TITLE 'Fixed Effects Only';
RUN ;
PROC MIXED DATA = schools ;
CLASS center ;
MODEL written = teacher / SOLUTION OUTPRED = outint ;
RANDOM int / TYPE = un SUBJECT = center SOLUTION ;
TITLE 'Random Intercepts';
RUN ;
PROC MIXED DATA = schools ;
CLASS center ;
MODEL written = teacher / SOLUTION OUTPRED = outslope ;
RANDOM teacher / TYPE = un SUBJECT = center SOLUTION ;
TITLE 'Random Slopes';
RUN ;
PROC MIXED DATA = schools ;
CLASS center ;
MODEL written = teacher / SOLUTION OUTPRED = outall;
RANDOM int teacher / TYPE = un SUBJECT = center SOLUTION ;
TITLE 'Random Intercepts and Slopes';
RUN ;
SYMBOL1 I=RL COLOR=black REPEAT=73 ; /* Define SAS/GRAPH symbols for plots: a regression line for each of the 73 centers */
PROC GPLOT DATA = outf ;
TITLE 'Fixed Effects';
PLOT pred*teacher = center ;
RUN ;
QUIT ;
PROC GPLOT DATA = outint ;
TITLE 'Random Intercepts';
PLOT pred*teacher = center ;
RUN ;
QUIT ;
PROC GPLOT DATA = outslope ;
TITLE 'Random Slopes';
PLOT pred*teacher = center ;
RUN ;
QUIT ;
PROC GPLOT DATA = outall ;
TITLE 'Random Intercepts and Slopes';
PLOT pred*teacher = center ;
RUN ;
QUIT ;
There are four PROC MIXED sections of SAS syntax. These fit your four models of interest, respectively: fixed effects only, random intercepts, random slopes, and random intercepts and slopes. Each section of PROC MIXED syntax features an OUTPRED = option on the MODEL statement that names an external SAS table where SAS writes the predicted values of written (labeled pred in the newly-created SAS table), along with the original values of teacher and center.
PROC GPLOT then uses the values created by PROC MIXED to plot the separate model-based regression lines for each center. The SYMBOL statement that appears before the PROC GPLOTs tells SAS/GRAPH to use the same plot symbol, a black regression line, for all 73 centers in the SAS table output by PROC MIXED.
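If you prefer the ODS Graphics procedures (SAS 9.2 or later), one of the four plots could be sketched with PROC SGPLOT as shown below. This is an alternative offered for illustration, not part of the original answer. It assumes the outall table created above, with its pred, teacher, and center columns, and it draws each center's line by connecting that center's predicted values after sorting.

```sas
PROC SORT DATA = outall ;
  BY center teacher ;              /* order points within each center */
RUN ;

PROC SGPLOT DATA = outall NOAUTOLEGEND ;
  SERIES X = teacher Y = pred / GROUP = center
         LINEATTRS = (COLOR = black PATTERN = solid) ;
  TITLE 'Random Intercepts and Slopes' ;
RUN ;
```

Because the predicted values for a center fall on a straight line in teacher, connecting them reproduces that center's model-based regression line. The same steps can be repeated for outf, outint, and outslope.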
