Wednesday 9 January 2013

NOTE: The OPEN Function (reading data sets in macros)

In my previous post on this subject, I showed how to use the DATA step's OPEN function to get information about data sets and make that information available within a DATA step or as a macro variable. In today's post I will take it further and show how to retrieve rows and columns of information in a) a DATA step, independent of SET/MODIFY/UPDATE statements, and b) macro code.

Here's a simple sample:

data _null_;
  /* Open the data set, keeping only the row(s) that match the WHERE clause */
  dsid = open('SASHELP.CARS(where=(left(model) eq "XJR 4dr"))');
  rc = fetch(dsid);                /* Fetch the first row */
  msrp_vn = varnum(dsid,'MSRP');   /* Get identifier for MSRP (price) column */
  msrp = getvarn(dsid,msrp_vn);    /* Get the price value */
  rc = close(dsid);                /* Close the data set */
  put _all_;
run;


In the log, we see:

dsid=1 rc=0 msrp_vn=6 msrp=74995 _ERROR_=0 _N_=1

We've used the FETCH, VARNUM and GETVARN functions in addition to OPEN and CLOSE that I showed in my previous post.

By default, the values from the opened data set do not become available for use in the DATA step. We need to use FETCH to make the first row accessible (qualified by the WHERE clause), and we need to use GETVARN to get the value of the specified column from the current row. However, GETVARN doesn't allow us to pass the name of the column; we have to get an ID for it from the VARNUM function and then pass that ID to GETVARN. It's a bit convoluted, but it works! Use GETVARN to get values from numeric columns; use GETVARC for character columns.

Here's how it looks in macro code:

%let dsid = %sysfunc(open(SASHELP.CARS(where=(left(model) eq "XJR 4dr"))));
%let rc = %sysfunc(fetch(&dsid));
%let msrp_vn = %sysfunc(varnum(&dsid,MSRP));
%let msrp = %sysfunc(getvarn(&dsid,&msrp_vn));
%let rc = %sysfunc(close(&dsid));
%put DSID=&dsid MSRP_VN=&msrp_vn MSRP=&msrp;

In the log, we see:

DSID=4 MSRP_VN=6 MSRP=74995
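
For a character column, the same pattern applies with GETVARC in place of GETVARN. Here's a minimal sketch, assuming we want the ORIGIN column for the same car. The macro variable names origin_vn, origin_type and origin are my own choices for illustration, and VARTYPE is used only to confirm the column's type (C for character, N for numeric) before choosing between GETVARC and GETVARN:

%let dsid = %sysfunc(open(SASHELP.CARS(where=(left(model) eq "XJR 4dr"))));
%let rc = %sysfunc(fetch(&dsid));                        /* Fetch the first row */
%let origin_vn = %sysfunc(varnum(&dsid,ORIGIN));         /* Get identifier for ORIGIN column */
%let origin_type = %sysfunc(vartype(&dsid,&origin_vn));  /* C = character, N = numeric */
%let origin = %sysfunc(getvarc(&dsid,&origin_vn));       /* Character column, so GETVARC */
%let rc = %sysfunc(close(&dsid));                        /* Close the data set */
%put ORIGIN_VN=&origin_vn ORIGIN_TYPE=&origin_type ORIGIN=&origin;

Had VARTYPE returned N, we would have called GETVARN instead.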


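This works nicely, but if we have a large number of columns, it would involve an equally large number of calls to VARNUM and GETVARN/C. An alternative is to use CALL SET (invoked as %SYSCALL SET in macro code), which links every column to a variable of the same name, so each FETCH makes all of the columns' values available at once. Here's the macro code: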

%let dsid = %sysfunc(open(SASHELP.CARS(where=(left(model) eq "XJR 4dr"))));
%syscall set(dsid);
%let rc = %sysfunc(fetch(&dsid));
%let rc = %sysfunc(close(&dsid));
%put DSID=&dsid MSRP=&msrp ORIGIN=&origin ENGINESIZE=&enginesize;

In the log, we see:

DSID=4 MSRP=74995 ORIGIN=Europe ENGINESIZE=4.2


Now, that's a lot simpler!
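
The %SYSCALL SET approach also extends to more than one row. FETCH returns zero each time it successfully reads a row, and a non-zero value once there are no more rows, so it can drive a loop. Here's a rough sketch, assuming a hypothetical macro named list_cars (wrapping the loop in a macro keeps %DO %WHILE out of open code) and a WHERE clause of my own choosing:

%macro list_cars;   /* hypothetical macro name, for illustration only */
  %local dsid rc;
  %let dsid = %sysfunc(open(SASHELP.CARS(where=(origin eq "Europe"))));
  %syscall set(dsid);   /* Link columns to same-named macro variables */
  /* FETCH returns 0 for each row read, non-zero when the rows run out */
  %do %while (%sysfunc(fetch(&dsid)) = 0);
    %put MODEL=&model MSRP=&msrp;
  %end;
  %let rc = %sysfunc(close(&dsid));
%mend list_cars;

%list_cars

Each pass through the loop refreshes the macro variables with the values from the newly fetched row.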

However, when we use the same technique in a DATA step, we must also declare *all* of the data set's variables (with a LENGTH statement, for example), else the FETCH function will fail. Even for a relatively simple data set like sashelp.cars that's a lot of "unnecessary" length information that we must add to our DATA step (and maintain if the data set's attributes are subsequently changed). Here's a trick to avoid this hassle. Add this statement in lieu of a long LENGTH statement: if 0 then set sashelp.cars;. You'll also need to add a STOP at the end of the DATA step.

The never-true IF statement won't execute the SET statement at run-time, and hence no observations will be read from sashelp.cars, but prior to the execution of the DATA step, the compilation phase will automatically add all of the data set's columns to the PDV. Hey presto! Here's the DATA step code:

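data _null_;
  if 0 then set sashelp.cars;   /* Never executes; adds the columns to the PDV at compile time */
  dsid = open('SASHELP.CARS(where=(left(model) eq "XJR 4dr"))');
  call set(dsid);               /* Link the data set's columns to the same-named DATA step variables */
  rc = fetch(dsid);             /* Fetch the first row */
  rc = close(dsid);             /* Close the data set */
  put _all_;
  stop;
run;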

Make=Jaguar Model=XJR 4dr Type=Sedan Origin=Europe DriveTrain=Rear MSRP=$74,995 Invoice=$68,306 EngineSize=4.2 Cylinders=8 Horsepower=390 MPG_City=17 MPG_Highway=24 Weight=3948 Wheelbase=119 Length=200 dsid=1 rc=0 _ERROR_=0 _N_=1
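
Looping over several rows in a DATA step works the same way. Here's a minimal sketch, assuming a WHERE clause of my own choosing that selects the European cars. FETCH returns zero for each row it reads successfully, so the loop ends when the rows run out:

data _null_;
  if 0 then set sashelp.cars;   /* Adds the columns to the PDV at compile time */
  dsid = open('SASHELP.CARS(where=(origin eq "Europe"))');
  call set(dsid);               /* Link columns to the same-named variables */
  do while (fetch(dsid) = 0);   /* One iteration per row that matches the WHERE clause */
    put model= msrp=;
  end;
  rc = close(dsid);
  stop;
run;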


So, there we have it, a means of reading a data set in macro code, and a means of referencing data set values in a DATA step without using a SET/UPDATE/MODIFY statement.

Further reading: William C Murphy wrote an excellent SAS Global Forum paper on using %SYSCALL SET back in 2007. William showed how to load

In my next (final) post on this topic, I'll highlight a few more functions related to OPEN, and I'll discuss error-trapping.

DATA SET FUNCTIONS:

1. NOTE: The OPEN Function (getting data set information in macros)
2. NOTE: The OPEN Function (reading data sets in macros)
3. NOTE: The OPEN Function (error trapping)