Monday 28 January 2013

NOTE: Getting More for Less With SPDE

Would you like to read or write your data more quickly? SAS V9.1 introduced Scalable Performance Data Engine (SPDE, "Speedy Engine"), for free. Are you using it? Speedy Engine is the little cousin of Speedy Server (Scalable Performance Data Server, a chargeable product), but its potential benefits are not so little. With a simple set of changes to one LIBNAME statement, your code could run a lot faster.

Put simply, when you read data from a traditional SAS data set, the bits and bytes are read one-at-a-time from the disk. The same is true when you write data. However, if you use the Speedy Engine on your LIBNAME statement then your data will be written across (or read from) multiple disk drives, allowing SAS to send more data at the same time (because it can send the first 10 observations of your table to disk #1 and then send the second ten observations to disk #2 whilst waiting for disk #1 to finish writing the data). The parallel writing and reading of data means that the processor needn't spend most of its time waiting for disk operations to be completed.

Using Speedy Engine is simple, so long as you have multiple disks drives *and* your processor has multiple channels with which it communicate with the disk drives. Without multiple channels, the channel will become a bottleneck and you won't see the improvements you'd hoped for.

Here's a simple example LIBNAME statement:

libname sales spde "f:\alphasales\metadata" 
                   datapath=("g:\alphasales\datapart1" 
                             "h:\alphasales\datapart1" )
                   indexpath=("j:\alphasales\index" )
                   ;

Note how the LIBNAME statement largely follows the standard syntax, but we specify the SPDE engine after the "sales" libref, and we specify datapaths and an indexpath. What you need to know is that the Speedy Engine writes three different kinds of data: metadata, the index, and data. It simple needs to know where to put each kind of data. The first location specified in the LIBNAME statement (the F: drive in our example) is where the metadata will be stored. You don't need to worry about what the metadata contains, just be aware that it needs to go somewhere! For quick searching and retrieval, Speedy Engine will want to store index files somewhere, hence the indexpath. Finally, Speedy Engine will store your data in the multiple locations specified in the datapath. I've used two datapath directories in the example, but you can specify more.

If you look in these locations, after writing some data to the sales libref, you'll see files with unfamiliar suffixes to their names, i,e. not sas7bdat. This need not worry us, but it does emphasise a key point: you have to write data in Speedy Engine format before you can read it with Speedy Engine.

To get the most from Speedy Engine, make sure you have multiple processors, multiple channels, and multiple disk drives. If you don't know the detailed architecture of your SAS environment, and are not able to discover more information from your local support team, you can still use an empirical approach by trying different combinations of disk drives and see which combinations give best results. However, Speedy Engine offers features over-and-above performance improvements. For example, it offers implicit sorting (whereby you needn't pre-sort your input data when you use a BY statement), WHERE clause optimisation with OR operators, and more than 32k columns in a table(!).

Speedy Server (the big cousin) offers further advantages such as an Access Control List–based security model, support for parallel BY-group processing, and processing of implicit pass-through.

For more information on the Speedy Engine, take a look at the SPDE section of the SAS Support Focus Areas, including the Quick Set-Up Guide and Tips and Tricks. The NESG 2006 paper An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine by Russ Lavery and Ian Whitlock is a little dated, but it is useful and reliable nonetheless.