Wednesday 2 May 2012

NOTE: DS2. Data Step Evolved? #sasgf12

One of the other "futures" sessions I attended at SAS Global Forum was "The New SAS Programming Language: DS2" with SAS's Jason Secosky. Jason was at pains to point out that DS2 is not intended as a replacement for the good old DATA step. DS2 is an alternative to DATA step and is more narrowly focused than the general-purpose DATA step.

PROC DS2 is currently available in SAS 9.3 as an experimental technology and is due to become generally available in 9.4. Its focus is on high performance for data manipulation and data analysis, and it incorporates threading.

DATA steps are in control of their data; they specify the source of their input data, and they specify the location of their output data. In contrast, DS2 is simply a node in a flow; DS2 uses data streams rather than specific data objects. So, DS2 is not a DATA step replacement; it's new technology.

DS2's syntax is similar in parts to DATA step, with DATA and SET statements, if/then/else statements, expressions and functions. However, DS2 adds structure to code. Some of its syntax will be familiar to SAS/AF SCL coders; it includes methods (including init, term, and run). It has many more variable types than DATA step, e.g. integer and varchar. DS2 integrates with other languages (such as R, C, C++, IML, and SAS FCMP functions) through the concept of a package. Interestingly, we'll be able to edit our DS2 code in the Eclipse editor, which will include a debugger.
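To give a feel for that structure, here's a minimal sketch of a DS2 program using the init, run, and term methods along with integer and varchar variables. It wasn't shown in the session; it's my own illustration, and it assumes the syntax of the released PROC DS2, which may differ in detail from the 9.3 experimental version. The variable names row_count and last_name are made up for the example, and sashelp.class is used only as a convenient input table.

```sas
proc ds2;
  /* _null_ means no output table is written; results go to the log */
  data _null_;
    /* Declared variables with types that DATA step does not offer */
    dcl integer row_count;
    dcl varchar(40) last_name;
    /* Keep the values from one iteration of run() to the next */
    retain row_count last_name;

    /* init() runs once, before any rows are read */
    method init();
      row_count = 0;
    end;

    /* run() is executed once per input row, like the implicit DATA step loop */
    method run();
      set sashelp.class;
      row_count = row_count + 1;
      last_name = name;
    end;

    /* term() runs once, after the last row has been processed */
    method term();
      put row_count= last_name=;
    end;
  enddata;
  run;
quit;
```

The familiar DATA and SET statements are still there, but the work that a DATA step would scatter across "if _n_ = 1" and "end=" logic is split into three named methods.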

In essence, DS2 is the means of taking code to data (ref: big data) and promises linear scalability.