Thursday 15 April 2010

NOTE: Go Green and Save Data Miles

Whilst attendees of SAS Global Forum (SGF) in Seattle, USA have been hearing about further SAS developments in in-database processing, I've been spending time back home in Blighty enlightening one of my clients' junior staff in the benefits of SQL pass-through. If your data is held outside of SAS (in Oracle or Teradata for example), you need to understand pass-through.

It's all about doing things in the best place. Typically, the channel that passes data from non-SAS data storage to SAS software is relatively slow when compared with the non-SAS data storage's ability to process data. Expressed more simply, you're better off asking Oracle to apply a where-clause and send the results to SAS than if you ask Oracle to send all records to SAS and allow SAS to apply the filtering. SAS may be able to filter data quickly, but you're better off not sending all that data down the pipe in the first place.

Pass-through is a SAS technology that passes SQL instructions to the data storage server for execution. There are two types of pass-through: implicit and explicit. Implicit pass-through is the cleverest. Explicit pass-through is where you offer statements to SAS that specifcialy (explicitly) instruct SAS to pass instructions to the data server; with implicit pass-through, you rely on SAS intelligence to look at standard SAS code and deduce that it could interpret some of it into SQL and pass it through to the data server. Steve Feder's NESUG 2007 paper succinctly describes alternative techniques and their respective merits.

In-database execution of SAS merely takes pass-through a step further. Pass-through allows execution of SQL within the database; in-database execution of SAS allows far more work to be done within the database server and thereby further reduce the amount of data sent down the long and slow pipe from database to SAS software.

Beginning with Teradata in 2008, SAS began partnering with database vendors to develop and deliver in-database execution of SAS. At this year's SGF, SAS announced the engagement of BlueCross Blue Shield of Tennessee in a pilot program embedding SAS Analytics within the company’s IBM DB2 enterprise data warehouse. SAS currently has plans to provide in-database execution of SAS software within Teradata, Netezza, IBM DB2, HP Neoview, Aster Data and Green Plum. These are exciting initiatives that will take the speed of SAS analytics to a new level.