Wednesday 2 May 2012

NOTE: DS2. Data Step Evolved? #sasgf12

One of the other "futures" sessions I attended at SAS Global Forum was The New SAS Programming Language: DS2 with SAS's Jason Secosky. Jason was at pains to point out that DS2 is not intended as a replacement for the good old DATA step. DS2 is an alternative to the DATA step, with a narrower focus than the general-purpose DATA step.

PROC DS2 is currently available in SAS 9.3 as an experimental technology, with general availability planned for 9.4. Its focus is on high performance for data manipulation and data analysis, and it incorporates threading.

DATA steps are in control of their data; they specify the source of their input data, and they specify the location of their output data. In contrast, DS2 is simply a node in a flow; DS2 uses data streams rather than specific data objects. So, DS2 is not a DATA step replacement; it's new technology.

DS2's syntax is similar in parts to the DATA step's, with DATA and SET statements, if/then/else statements, expressions and functions. However, DS2 adds structure to code. Some of its syntax will be familiar to SAS/AF SCL coders; it includes methods (including init, term, and run). It has many more variable types than the DATA step, e.g. integer and varchar. DS2 integrates with other languages (such as R, C, C++, IML, and SAS FCMP functions) through the concept of a package. Interestingly, we'll be able to edit our DS2 code in the Eclipse editor, which will include a debugger.
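To give a flavour of that structure, here is a minimal sketch of a DS2 program with the three methods named above. It assumes DS2 syntax as documented for later releases, so the experimental 9.3 version may differ in detail. The table work.people and the variable years_to_18 are hypothetical names invented for this illustration.

```sas
/* Hypothetical input table, created with an ordinary DATA step */
data work.people;
  input name $ age;
  datalines;
Ann 34
Bob 17
;
run;

proc ds2;
  data _null_;
    /* DS2 variables are declared with an explicit type */
    dcl int years_to_18;

    method init();
      /* called once, before any rows are read */
      put 'init: starting';
    end;

    method run();
      /* called once per input row, much like the implicit DATA step loop */
      set work.people;
      if age < 18 then years_to_18 = 18 - age;
      else years_to_18 = 0;
      put name= age= years_to_18=;
    end;

    method term();
      /* called once, after the last row has been processed */
      put 'term: finished';
    end;
  enddata;
run;
quit;
```

The SET and if/then/else statements look just as they would in a DATA step; the differences are the explicit declaration and the method blocks that wrap the logic.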

In essence, DS2 is the means of taking code to data (ref: big data) and promises linear scalability.

Tuesday 1 May 2012

NOTE: Libnames, Who Needs 'Em?

My team received what turned out to be an interesting call for help from one of our clients today. We resolved the client's coding error, but it also served as a reminder of a little-used feature of Base SAS, namely the ability to specify directory names in code rather than bother with libnames. There are pros and cons to doing this. I'll discuss these below after I explain the feature.

We're used to specifying data sets on DATA statements in the "libname.dataset" style. However, instead of using a data set name, you can specify the physical pathname to the file, using syntax that your operating system understands. The pathname must be enclosed in single or double quotation marks. Here's an example:

data "c:\mydata\mydataset";

In the foregoing example, the DATA step would create a SAS data set file named mydataset.sas7bdat in the c:\mydata directory.

There's more information in the section titled "Accessing Permanent SAS Files without a Libref" in the SAS 9.3 Language Reference: Concepts. You will see that we can use the same naming technique in almost any situation where a library and data set name are expected, e.g. a SET statement, a MERGE statement, an UPDATE statement, a MODIFY statement, the DATA= option of a SAS procedure, and the OPEN function.
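As a rough sketch of a few of those situations, the following assumes that the c:\mydata directory from the earlier example already exists and is writable. The data set name work.copy and the variable x are hypothetical.

```sas
/* Create a permanent data set by quoted physical path, with no LIBNAME */
data "c:\mydata\mydataset";
  x = 1;
run;

/* Read it back on a SET statement */
data work.copy;
  set "c:\mydata\mydataset";
run;

/* Pass it to a procedure through the DATA= option */
proc print data="c:\mydata\mydataset";
run;
```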

My client's coding error arose because they had passed a macro parameter intended as a data set name and had surrounded it with quotes. The call %demo("name") resulted in a DATA statement like this: data "name". As a result, SAS tried to create a file named name.sas7bdat in the SAS session's current directory. That directory was the root directory of the SASApp server, the user didn't have permission to write to it, and hence the code failed. The intention was to create a data set named "name" in the WORK library; the actuality was significantly different. It was all caused by a common mistake: using quotes around character strings in macros.
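The client's macro isn't reproduced here, so the following is only a sketch of how such a problem can arise. The macro body, the parameter name dsname and the variable x are assumptions made for illustration; only the call %demo("name") comes from the actual incident.

```sas
/* Hypothetical macro: the parameter is used directly as a data set name */
%macro demo(dsname);
  data &dsname;
    x = 1;
  run;
%mend demo;

/* Quotes are part of the macro value, so this generates: data "name";   */
/* SAS treats "name" as a physical path in the current directory         */
%demo("name")

/* Without quotes this generates: data name; which creates WORK.NAME     */
%demo(name)
```

The macro language treats everything as text, so the quotation marks are passed through unchanged and end up in the generated DATA statement.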

So, we understand how we can dispense with LIBNAME statements, but should we take advantage of this capability? Well, I can't see too many advantages, but I can see plenty of disadvantages!

The disadvantages include i) the need to specify directory paths accurately throughout the program (rather than eight-character libnames), ii) the inability to change a directory location quickly and easily (as can be useful when testing), and iii) the inability to specify an engine for the library.
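By way of contrast, here is a sketch of the conventional approach. The libref mylib and the c:\testdata directory are hypothetical, and both directories are assumed to exist.

```sas
/* The path (and, optionally, the engine) is stated once, in one place */
libname mylib v9 "c:\mydata";

/* For testing, switch location by changing only the LIBNAME statement: */
/* libname mylib v9 "c:\testdata"; */

/* The rest of the program refers only to the libref */
data mylib.mydataset;
  x = 1;
run;

proc print data=mylib.mydataset;
run;
```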

Can you think of any advantages? Let us know your suggestions in a comment.