Wednesday 13 February 2013

NOTE: DS2, SQL Within a SET Statement

Earlier this week, I suggested taking a look at DS2 ("the DATA step evolved") as a learning opportunity for the New Year. I listed the advantages that DS2 offers over the traditional DATA step. One of those was the ability to use SQL within a SET statement, thereby combining the benefits of both languages. There follows a simple example that combines SQL's familiar aggregation capabilities with SAS's ability to produce more than one output data set from a single pass of the input data.

In fact, DS2 takes a number of ideas from SQL:
  • Apart from using ANSI SQL data types in addition to the traditional Numeric and Character types, DS2's DATA steps takes a leaf out of SQL's book when no output data set name is specified. Traditional data steps would create output data sets with names like WORK.DATA1; DS2 returns the result to the Output window, much like PROC SQL would do.
  • Another DS2/SQL similarity is that DS2 will not automatically replace or over-write an existing table; you must specify the OVERWRITE=YES data set option.
  • And, finally, PROC DS2 uses RUN and QUIT statement in a similar manner to PROC SQL. PROC DS2 uses the RUN statement to indicate that preceding code is complete and should be executed, whilst the QUIT statement tells the procedure that its work is fully complete and it can be unloaded from memory.
One interesting point to note is that DS2 does not respect the SASHELP library. If you reference SASHELP (on a SET statement, for example) you'll get an error message informing you that the "schema name SASHELP was not found". This is a shame because SASHELP's example data sets (such as CLASS and PRDSALE) are so useful when demonstrating and experimenting with code and ideas.

17 proc copy inlib=sashelp outlib=work;
18   select prdsale;
19 run;

NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set WORK.PRDSALE has 1440 observations and 10 variables.

21 proc ds2;
22   data high(overwrite=yes) low(overwrite=yes);
23     drop count;
24     method run();
25       set {select  country
26                   ,year
27                   ,quarter
28                   ,sum(actual) as actual
29                   ,sum(predict) as predict
30              from work.prdsale
31              group by country,year,quarter};
32       if actual lt 5000 then output low;
33       else output high;

34     end;
35   enddata;
36   run;

NOTE: Execution succeeded. 24 rows affected.
37 quit;

Clearly, you could do all of this with a PROC SUMMARY (or PROC MEANS) followed by a conventional DATA step, but the input/output activity is much reduced with the above technique, and use of a SQL SELECT statement within a SET statement is particuarly useful for those who are less familiar with some SAS procedures.

However, observe how DS2's NOTE statement offers little information about the numbers of rows written to each of its output tables. Maybe greater information will be offered in a future version (I'm using V9.2 TS2M3).

And you should be aware too that DS2 does not (currently) incorporate all functions and statement that you'd expect in traditional DATA steps. For instance, we have SET but we don't have MERGE or UPDATE or MATCH. We must assume that these will come in-time too. However, the ability to provide some of these capabilities may, in part, conflict with DS2's ability to perform threaded execution, so some traditional DATA step capabilities may never make their way into DS2.

In the next DS2 article, I'll discuss high performance data access using DS2's threading capabilities.


NOTE: DS2. Data Step Evolved?
NOTE: DS2, Learn Something New!
NOTE: DS2, SQL Within a SET Statement
NOTE: DS2, Threaded Processing
NOTE: DS2, Final Comments
NOTE: DS2, Final, Final Comments