Monday, 11 February 2013

NOTE: DS2, Learn Something New!

There's never a bad time to learn something new. How about DS2?

In May of last year, upon returning home from SAS Global Forum, I wrote about DS2, a new alternative to the DATA step. PROC DS2 will become generally available in V9.4; in the meantime it is available in SAS V9.3 as an experimental technology (and in V9.2 TS2M3 too).

DS2 focuses on high-performance data manipulation and data analysis, and it incorporates threading. But it offers a lot more besides: DS2 gives you, the programmer, far more flexibility in your coding and a far greater ability to structure your code.

The DS2 syntax includes the ability to specify SQL within a SET statement (combining the benefits of both languages), additional data types including ANSI SQL types, programming structure elements, and user-defined methods and packages. In advance of the 9.4 launch, you can find a Getting Started manual and a Language Reference manual in the product documentation.

Here's a simple example of some DS2 code and its output:

proc ds2;
  data _null_;
    method init();
      dcl varchar(16) str;
      str = 'Hello World!';
      put str;
    end;
  enddata;
run;
quit;

Hello World!

NOTE: Execution succeeded. No rows affected.

As you can see, DS2 is actually a PROC, but it takes DATA step-like code as statements within the PROC. Within the DATA step, code is arranged into methods (sub-routines, if you like, similar to functions in Java or functions created by PROC FCMP). The most basic methods are called INIT, RUN, and TERM. INIT and TERM each run once (at the beginning and the end of the DATA step, as their names imply), and RUN is executed once for each input row, just like the implicit loop of a conventional DATA step.
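To see the three methods side by side, here's a minimal sketch of my own (it's not from the Getting Started manual). It assumes the SASHELP.CLASS sample table is available, as it is in a standard SAS installation. INIT and TERM should each write a single line to the log, while RUN writes one name for every row that the SET statement reads.

```sas
proc ds2;
  data _null_;
    /* INIT: runs once, before any rows are read */
    method init();
      put 'INIT: starting';
    end;

    /* RUN: runs once per input row; SET reads the next row */
    method run();
      set sashelp.class;
      put name;
    end;

    /* TERM: runs once, after the last row has been read */
    method term();
      put 'TERM: finished';
    end;
  enddata;
run;
quit;
```

Because the methods are separate blocks, one-off setup and wrap-up code no longer needs the IF _N_=1 and END= tricks of the conventional DATA step.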

The Getting Started manual takes you through some of the basics of the syntax. Let's move on and see some of what I find most exciting about DS2.

First, in this article, I'll look at structured programming with packages and methods. In subsequent articles I'll look at SQL in the SET statement and at high-performance data analysis with threading.

So, let's look at packages and methods, a great means of structuring our code. Take a look at the code and log output below. Note how we create a package named pythagorus that contains two methods (gethyp and getside). The methods can contain any amount of code, and the package is saved (as a table in a SAS library) so that it can be reused in later steps. There is a large overlap with the abilities of the macro language here, but DS2 brings the advantage of one coherent language with many different data types (not just character!).

proc ds2;
  package pythagorus/overwrite=yes;
    method gethyp(double a, double b)
                 returns double;
      dcl double a_sq b_sq;   /* local to this method */
      a_sq = a**2;
      b_sq = b**2;
      return sqrt(a_sq + b_sq);
    end;
    method getside(double hyp, double sidea)
                  returns double;
      return sqrt(hyp**2 - sidea**2);
    end;
  endpackage;
  run;

NOTE: Execution succeeded. No rows affected.
  data demo(overwrite=yes);
    dcl double short long hyp;
    method init();
      /* one unknown value per row */
      short=3; long=4; hyp=.; output;
      short=4; long=5; hyp=.; output;
      short=.; long=4; hyp=5; output;
      short=3; long=.; hyp=5; output;
    end;
  enddata;
  run;

NOTE: Execution succeeded. 4 rows affected.
  data results(overwrite=yes);
    /* create an instance of the package */
    dcl package pythagorus pyth();
    method run();
      set demo;
      /* fill in whichever value is missing */
      select;
        when (missing(hyp))
          hyp=pyth.gethyp(short,long);
        when (missing(short))
          short=pyth.getside(hyp,long);
        when (missing(long))
          long=pyth.getside(hyp,short);
      end;
    end;
  enddata;
  run;

NOTE: Execution succeeded. 4 rows affected.
quit;

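After the package and its methods are defined, we create some test data in a small DATA step, and then we generate our results in a final DATA step. Notice that we need to declare the package in our results DATA step before we use it: the DCL PACKAGE statement creates an instance named pyth, and we then call its methods with dot notation (pyth.gethyp and pyth.getside).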
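Since DS2 saves the package definition (in the WORK library here, because no library was named), a later PROC DS2 step in the same session should be able to reuse it without redefining it. The sketch below is my own illustration and assumes the pythagorus package from above still exists in WORK. It calls both methods once from an INIT method on a 3-4-5 right triangle, so the log should show the values 5 and 3.

```sas
proc ds2;
  data _null_;
    /* create an instance of the previously saved package */
    dcl package pythagorus pyth();

    method init();
      dcl double h s;
      h = pyth.gethyp(3, 4);   /* hypotenuse from the two shorter sides */
      s = pyth.getside(5, 4);  /* missing side from hypotenuse and one side */
      put h= s=;
    end;
  enddata;
run;
quit;
```

No SET statement is needed here; the methods are simply called like functions.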

The ability to create packages of methods will allow SAS code to be written in a clearer, more structured form than was previously possible. I look forward to discovering the full benefits of packages and methods as I use these new capabilities more and more.

DS2 still performs type conversions (one of my bêtes noires in the DATA step; I wrote about it in September 2009 and in a follow-up article), but the rules are more complicated because DS2 introduces so many different types. Indeed, there's a whole chapter on them in the Language Reference.

Not quite sure where to start? Let PROC DSTRANS help you by translating a subset of your DATA step code into DS2 code. Then, if necessary, you can revise your program to take advantage of DS2 features before submitting your program using PROC DS2.

In my next DS2 post, I'll show how to use SQL statements in your SET statements.


NOTE: DS2. Data Step Evolved?
NOTE: DS2, Learn Something New!
NOTE: DS2, SQL Within a SET Statement
NOTE: DS2, Threaded Processing
NOTE: DS2, Final Comments
NOTE: DS2, Final, Final Comments