Monday 20 June 2011

NOTE: With SYSTASK, Even Men Can Multi-Task!

I've been doing a lot of file manipulation recently (hence my observations on INFILE's FILEVAR), and I've become a great fan of SYSTASK for executing operating system commands. The key to SYSTASK's usefulness is that it can execute commands asynchronously, i.e. in parallel with each other and with your SAS session. So, if you have a number of large files that need time-consuming processing (such as compression or a word count), SYSTASK can run those commands side by side and you'll get your results quicker (provided your system has multiple processors and/or cores, and decent I/O performance).

Here's a simple (Unix) example that zips two files in parallel:

systask command "gzip /user/home/andy/alpha.csv" nowait taskname=alpha;

systask command "gzip /user/home/andy/beta.csv" nowait taskname=beta;

waitfor _all_ alpha beta;

%put Both files are now zipped;


Note the NOWAIT keyword on each SYSTASK statement; this instructs SAS to continue execution rather than waiting for the command to finish. The WAITFOR statement (as its name implies) forms a synchronisation point in your code. In the example above, it will wait for "all" of the tasks named on the WAITFOR statement before allowing execution to continue beyond the WAITFOR statement.
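If you also want to know whether each command succeeded, one possible extension is sketched below. It assumes the STATUS= option on SYSTASK COMMAND (which names a macro variable to receive the task's status) and the TIMEOUT= option on WAITFOR; the macro variable names alpha_rc and beta_rc and the 600-second limit are my own illustrative choices, not anything SAS requires.

```sas
/* Sketch only: alpha_rc, beta_rc and the timeout value are illustrative. */

/* Start both compressions in the background, capturing each task's status */
systask command "gzip /user/home/andy/alpha.csv" nowait taskname=alpha status=alpha_rc;
systask command "gzip /user/home/andy/beta.csv" nowait taskname=beta status=beta_rc;

/* Synchronisation point: wait for both tasks, but give up after 600 seconds */
waitfor _all_ alpha beta timeout=600;

/* SYSRC reports on the WAITFOR itself; the status variables report on each command */
%put NOTE: WAITFOR finished with SYSRC=&sysrc;
%put NOTE: alpha status=&alpha_rc, beta status=&beta_rc;
```

A status of zero normally means the command completed successfully. If the WAITFOR times out, a task may still be running and its status variable may not have been set yet, so treat the values with care in that case.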

In SAS 9.1 there's a restriction whereby you cannot use a tilde (~) or a wildcard (*) in the command. Aside from that, SYSTASK is a terrific means of speeding up your SAS code and making greater use of your computing resources.