Thursday, 31 January 2013

NOTE: Ampersands Again

My recent post on double and triple ampersands in macros ended with some doubt. Laurie and Dave immediately took up the challenge to resolve the doubt. You can see the whole painful saga in the comments of the original post, but I've repeated Dave's summary of the conclusions here and I offer my thanks to Laurie and Dave for their tenacious work!
The macro parser performs two basic processes on each pass. If it finds a double ampersand , it converts it into a single one and then proceeds with the rest of the statement; and if it finds a string &xyz that it recognises as a macro variable, it resolves it and then proceeds with the rest of the statement. At the end of the statement, it checks to see if it has been left with any ampersands in the parsed output. If there were, it runs through the process again.

That's exactly how we understood it to work.

The detail we had missed is in what happens if it finds a string &abc that it DOESN'T recognise as a macro variable name. If on the same pass, in the head of the same text string, it also performs a successful double ampersand resolution, it does not regard the unrecognised variable as an error, but allows the string &abc to go through unchanged to the next iteration. But if there were no double ampersand resolutions, it knows that it has had its last chance at a successful resolution and so it will throw a warning - I think it's not an error but I can't check as I don't have SAS installed here.

NOTE: SEUGI Proceedings

Did I miss this? The proceedings of the SAS European User Group International (SEUGI, the European equivalent of SUGI and SGF) are now all online on the SAS Community web site. Papers from 1983 to the last event in 2007 are there.

Most of the pages were created in January 2013, so I think this is relatively fresh news.

A fabulous opportunity for European SAS practitioners of a certain age (like me!) to reminisce.

Monday, 28 January 2013

NOTE: Getting More for Less With SPDE

Would you like to read or write your data more quickly? SAS V9.1 introduced Scalable Performance Data Engine (SPDE, "Speedy Engine"), for free. Are you using it? Speedy Engine is the little cousin of Speedy Server (Scalable Performance Data Server, a chargeable product), but its potential benefits are not so little. With a simple set of changes to one LIBNAME statement, your code could run a lot faster.

Put simply, when you read data from a traditional SAS data set, the bits and bytes are read one-at-a-time from the disk. The same is true when you write data. However, if you use the Speedy Engine on your LIBNAME statement then your data will be written across (or read from) multiple disk drives, allowing SAS to send more data at the same time (because it can send the first 10 observations of your table to disk #1 and then send the second ten observations to disk #2 whilst waiting for disk #1 to finish writing the data). The parallel writing and reading of data means that the processor needn't spend most of its time waiting for disk operations to be completed.

Using Speedy Engine is simple, so long as you have multiple disks drives *and* your processor has multiple channels with which it communicate with the disk drives. Without multiple channels, the channel will become a bottleneck and you won't see the improvements you'd hoped for.

Here's a simple example LIBNAME statement:

libname sales spde "f:\alphasales\metadata" 
                   datapath=("g:\alphasales\datapart1" 
                             "h:\alphasales\datapart1" )
                   indexpath=("j:\alphasales\index" )
                   ;

Note how the LIBNAME statement largely follows the standard syntax, but we specify the SPDE engine after the "sales" libref, and we specify datapaths and an indexpath. What you need to know is that the Speedy Engine writes three different kinds of data: metadata, the index, and data. It simple needs to know where to put each kind of data. The first location specified in the LIBNAME statement (the F: drive in our example) is where the metadata will be stored. You don't need to worry about what the metadata contains, just be aware that it needs to go somewhere! For quick searching and retrieval, Speedy Engine will want to store index files somewhere, hence the indexpath. Finally, Speedy Engine will store your data in the multiple locations specified in the datapath. I've used two datapath directories in the example, but you can specify more.

If you look in these locations, after writing some data to the sales libref, you'll see files with unfamiliar suffixes to their names, i,e. not sas7bdat. This need not worry us, but it does emphasise a key point: you have to write data in Speedy Engine format before you can read it with Speedy Engine.

To get the most from Speedy Engine, make sure you have multiple processors, multiple channels, and multiple disk drives. If you don't know the detailed architecture of your SAS environment, and are not able to discover more information from your local support team, you can still use an empirical approach by trying different combinations of disk drives and see which combinations give best results. However, Speedy Engine offers features over-and-above performance improvements. For example, it offers implicit sorting (whereby you needn't pre-sort your input data when you use a BY statement), WHERE clause optimisation with OR operators, and more than 32k columns in a table(!).

Speedy Server (the big cousin) offers further advantages such as an Access Control List–based security model, support for parallel BY-group processing, and processing of implicit pass-through.

For more information on the Speedy Engine, take a look at the SPDE section of the SAS Support Focus Areas, including the Quick Set-Up Guide and Tips and Tricks. The NESG 2006 paper An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine by Russ Lavery and Ian Whitlock is a little dated, but it is useful and reliable nonetheless.

NOTE: Quick SAS Keyword Search in Chrome

Did you know that the Google Chrome browser allows you to specify multiple search engines, accessible by a short keyword? Well, I didn't until last week when my mention of Visuwords (graphical dictionary & thesaurus) made me wonder if there was some means of clicking a word on a page and taking it straight to Visuwords to look it up. Well, it seems there is, and I found a good use of the facility for SAS lookups too.

In the SAS context, I often find that searching for SAS words or phrases in a search engine just gives me too much material, so I will more often do my search at SAS's Global Forum Proceedings web page. It's not too tiresome to go to SAS's site (if I remember the URL), wait for it load, and then search, but with a search engine keyword I can now do it quicker by just searching for "sgf spde tips" if I'm looking for SPDE tips (SGF is my search engine keyword).

It's a pretty simple thing to do, but remember that this tip is for Chrome. As far as I can see, Internet Explorer is less flexible and accommodating.

Firstly, you need to know the right search URL to get a "sgf" result. This is most easily explained by demonstrating it for Visuwords first (all will become apparent in due course). A Visuword search URL is simply: http://www.visuwords.com/<word>.

So, we launch Chrome, go to the Customise and Control button (three horizontal lines, top right of the Chrome window), and select Settings. On the settings page you'll see an area headlined Search, and within that area you'll see a button labelled Manage Search Engines. Click it and you'll see a dialog window like that shown alongside this paragraph. Scroll to the bottom of the Other Search Engines and you'll see a blank set of options. Enter the following values.

  • Add a new search engine. Enter a meaningful descriptive name, e.g. Visuwords
  • Keyword. Enter a neat, short word to trigger your search, e.g. vw
  • URL. Enter the search URL, with %s in place of a search word, i.e. http://www.visuwords.com/%s
Once you've clicked OK, you're all set.

So, let's look-up "note". We begin by typing "vw" in the Chrome address bar (known as the omnibox); all looks normal until we press space; now we see a special prompt that says "Search Visuwords:"; finally we continue typing "note", press Enter, and we get taken to the Visuwords web site and we get shown the definition of "note". Hey presto! 

Okay, we can do the same for the SGF proceedings, but what is the search string? Well, go to the SAS Global Forum Proceedings web page and enter a search. See the URL that shows the results? It's long, isn't it?! But somewhere in there you'll find your search string. Copy the whole URL. Paste that into the Google settings dialog window, just replace your search string with %s, and hey presto again!

I've done the same for Lex Jansen's site too because his site searches across a number of different conferences, not just SAS Global Forum (and SUGI). So, I can search for "lex spde tips" and find SPDE-related conference papers from SAS Global Forum, SUGI, PharmaSUG, NESUG, SESUG, PhUSE, WUSS, MWSUG, PNWSUG and SCSUG.

You can do this with Internet Explorer too, but I've not found a simple route to add something that's not in IE's gallery of search engines. The easiest seems to be Enhance IE.

Can you think of any other SAS-related sites for which you'd appreciate a quick search mechanism? Let me know in a comment.

Sunday, 27 January 2013

NOTE: The Dreaded Double and Triple Ampersands

Aside from Chris Hemedinger's comment on SAS V9.3's PROC SQL INTO enhancements, my Macro Arrays, Straight From a Data Set article last week also caught the attention of another friend and regular correspondent - LeRoy Bessler. LeRoy emailed me to suggest that an explanation of why I used a triple ampersand in my macro code (see below) might be of value to many (especially since he is used to successfully using double ampersands in this situation).

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


LeRoy asked for an explanation. So here it is...

We're familiar with the single ampersand indicating that the name of a macro variable follows the ampersand, e.g. &guide instructs the SAS macro processor to substitute &guide with the value of the macro variable named guide. Thus, the value of guide gets passed to the SAS compiler.

The process of tokenisation and the processing that is done by the SAS macro processor includes a rule whereby two consecutive ampersands (a double ampersand) get resolved to a single ampersand. Not only that but the resulting single ampersand will be put to one side whilst the remainder of the token is parsed in the same manner. Once the first pass of the token is complete, it is all parsed again with the same rules. Here are some examples that illustrate what happens (I've used colour to highlight the separate clusters of characters that get processed):

%let guide = enterprise;
%let enterprise = starship;


Submitted by User After Pass #1 After Pass #2 Sent to SAS Compiler
guide n/a n/a guide
&guide enterprise n/a enterprise
&&guide &guide enterprise enterprise
&&&guide &enterprise starship starship

Let's use this information to decypher my original piece of SAS macro code:

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


In this case, we're looping around all values from 1 to whatever number &name0 resolves to. I want the code to produce the following output in the log:

I=1, NAME1=Alfred, AGE1=14
I=2, NAME2=Alice, AGE2=13
I=3, NAME3=Barbara, AGE3=13
[snip]


In the put statement, we print "I=" following by the value of &i, i.e. the index variable of the loop; then we print ", NAME", followed by the value of &I, followed by an equals sign. So far we've got "I=1, NAME1=". This is followed by &&&name&i. The first pass of &&&name&i resolves to a single ampersand followed by.... oh dear, there's a bug in my program!

a) There's no value for &name; my code had not previously created a macro variable called name

b) Moreover, I had not intended to use a macro variable named &name, I had wanted my code to merely resolve to name at this point

I should have used a double ampersand like LeRoy is used to. If I had specified &&name&i then the first pass would have produced &name2; and the next pass would have resolved to a student name (the array of macro variables had been loaded from sashelp.class).

So, I offer my thanks to LeRoy for politely highlighting my error, and inspiring this blog post.

But, one question remains: we understand how triple ampersands are the correct technique to be used in the first examples, but why did my code from the previous blog post work when the rules suggest it shouldn't? And my answer is: I don't know! Try it for yourself; SAS silently does what I wish, despite my code having too many ampersands.

Answers on a postcard please...

Saturday, 26 January 2013

NOTE: SQL INTO, Revisited

In Macro Arrays Straight From a Data Set last week I wrote about using SQL's INTO in order to create macro variables (and arrays of macro variables) from PROC SQL. I said that we had to use an arbitrarily large number to define the maximum size of the macro array. Well, Chris Hemedinger was quick to post a comment and point out that with SAS 9.3 there are a few improvements in the SELECT INTO range options.

Not only that but Chris highlighted his Improving on a SAS Programming Pattern post from March last year in which Chris showed the new SAS 9.3 syntax where we can now specify an open-ended range for macro variable names. Apparently, SAS will allocate as many macro variables as are needed to store each value, and the values will be TRIMMED of whitespace by default (but you can control this by specifying NOTRIM before the FROM keyword).

I've reproduced Chris's example below.

/* SAS 9.3 approach */ 
/* Create macro vars with values & count of distinct values */ 
proc sql noprint;
  select distinct TYPE into :varVal1- 
    from SASHELP.CARS; 
quit;
%let varCount = &SQLOBS.; quit; 

Thanks for the tip Chris.