Sunday 27 January 2013

NOTE: The Dreaded Double and Triple Ampersands

Aside from Chris Hemedinger's comment on SAS V9.3's PROC SQL INTO enhancements, my Macro Arrays, Straight From a Data Set article last week also caught the attention of another friend and regular correspondent - LeRoy Bessler. LeRoy emailed me to suggest that an explanation of why I used a triple ampersand in my macro code (see below) might be of value to many (especially since he is used to successfully using double ampersands in this situation).

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


LeRoy asked for an explanation. So here it is...

We're familiar with the single ampersand indicating that the name of a macro variable follows the ampersand, e.g. &guide instructs the SAS macro processor to substitute &guide with the value of the macro variable named guide. Thus, the value of guide gets passed to the SAS compiler.

The process of tokenisation and the processing that is done by the SAS macro processor includes a rule whereby two consecutive ampersands (a double ampersand) get resolved to a single ampersand. Not only that but the resulting single ampersand will be put to one side whilst the remainder of the token is parsed in the same manner. Once the first pass of the token is complete, it is all parsed again with the same rules. Here are some examples that illustrate what happens (I've used colour to highlight the separate clusters of characters that get processed):

%let guide = enterprise;
%let enterprise = starship;


Submitted by User   After Pass #1   After Pass #2   Sent to SAS Compiler
-----------------   -------------   -------------   --------------------
guide               n/a             n/a             guide
&guide              enterprise      n/a             enterprise
&&guide             &guide          enterprise      enterprise
&&&guide            &enterprise     starship        starship
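
If you'd like to check the table for yourself, the following is a minimal sketch that can be submitted in open code. The SYMBOLGEN system option is my addition here (it was not part of the original program); it asks SAS to write a note to the log each time a macro variable reference is resolved, so you can watch the separate passes happen.

```sas
/* Optional: log each step of macro variable resolution */
options symbolgen;

%let guide = enterprise;
%let enterprise = starship;

/* No ampersand: plain text, nothing for the macro processor to resolve */
%put guide;

/* Single ampersand: resolved in one pass to the value of guide */
%put &guide;

/* Double ampersand: pass 1 gives &guide, pass 2 gives enterprise */
%put &&guide;

/* Triple ampersand: pass 1 gives &enterprise, pass 2 gives starship */
%put &&&guide;

/* Switch the extra log messages off again */
options nosymbolgen;
```

The four %put statements should write guide, enterprise, enterprise and starship to the log, matching the final column of the table.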

Let's use this information to decipher my original piece of SAS macro code:

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


In this case, we're looping around all values from 1 to whatever number &name0 resolves to. I want the code to produce the following output in the log:

I=1, NAME1=Alfred, AGE1=14
I=2, NAME2=Alice, AGE2=13
I=3, NAME3=Barbara, AGE3=13
[snip]


In the %put statement, we print "I=" followed by the value of &i, i.e. the index variable of the loop; then we print ", NAME", followed by the value of &i, followed by an equals sign. So far we've got "I=1, NAME1=". This is followed by &&&name&i. The first pass of &&&name&i resolves the leading double ampersand to a single ampersand, followed by the value of &name.... oh dear, there's a bug in my program!

a) There's no value for &name; my code had not previously created a macro variable called name

b) Moreover, I had not intended to use a macro variable called name at all; I had wanted my code to merely resolve to the text name at this point

I should have used a double ampersand like LeRoy is used to. If I had specified &&name&i then the first pass would have produced &name2 (when &i is 2), and the next pass would have resolved that to a student name (the array of macro variables had been loaded from sashelp.class).
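
For completeness, here is a sketch of the loop with double ampersands. The PROC SQL step that loads the macro array is only an assumption for illustration (it may not match the code in my earlier article exactly), and the macro name show_students and the upper bound of 999 are hypothetical, chosen simply to make the example self-contained.

```sas
/* Assumed loading step: one macro variable per row of sashelp.class */
proc sql noprint;
  select name, age
    into :name1-:name999, :age1-:age999
    from sashelp.class;
quit;

/* SQLOBS holds the number of rows the SELECT returned */
%let name0 = &sqlobs;

/* Hypothetical wrapper macro: %do loops are not allowed in open code */
%macro show_students;
  %do i=1 %to &name0;
    /* Pass 1: &&name&i becomes &name1 (when &i is 1)      */
    /* Pass 2: &name1 becomes the first student's name     */
    %put I=&i, NAME&i=&&name&i, AGE&i=&&age&i;
  %end;
%mend show_students;

%show_students
```

With the double ampersands in place, each reference needs exactly two passes, and the log should show the lines listed earlier, starting with I=1, NAME1=Alfred, AGE1=14.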

So, I offer my thanks to LeRoy for politely highlighting my error, and inspiring this blog post.

But, one question remains: we understand how triple ampersands are the correct technique to be used in the first examples, but why did my code from the previous blog post work when the rules suggest it shouldn't? And my answer is: I don't know! Try it for yourself; SAS silently does what I wish, despite my code having too many ampersands.

Answers on a postcard please...