Sunday, 27 January 2013

NOTE: The Dreaded Double and Triple Ampersands

Aside from Chris Hemedinger's comment on SAS V9.3's PROC SQL INTO enhancements, my Macro Arrays, Straight From a Data Set article last week also caught the attention of another friend and regular correspondent - LeRoy Bessler. LeRoy emailed me to suggest that an explanation of why I used a triple ampersand in my macro code (see below) might be of value to many (especially since he is used to successfully using double ampersands in this situation).

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


LeRoy asked for an explanation. So here it is...

We're familiar with the single ampersand indicating that the name of a macro variable follows the ampersand, e.g. &guide instructs the SAS macro processor to substitute &guide with the value of the macro variable named guide. Thus, the value of guide gets passed to the SAS compiler.

The SAS macro processor's tokenisation rules include one whereby two consecutive ampersands (a double ampersand) get resolved to a single ampersand. Not only that, but the resulting single ampersand is put to one side whilst the remainder of the token is parsed in the same manner. Once the first pass of the token is complete, and if that pass resolved a double ampersand, the result is parsed again with the same rules. Here are some examples that illustrate what happens:

%let guide = enterprise;
%let enterprise = starship;


Submitted by User   After Pass #1   After Pass #2   Sent to SAS Compiler
guide               n/a             n/a             guide
&guide              enterprise      n/a             enterprise
&&guide             &guide          enterprise      enterprise
&&&guide            &enterprise     starship        starship
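
If you'd like to confirm the table for yourself, a minimal sketch is to write each form to the log with %put. It assumes nothing beyond the two %let statements above; the labels in front of each reference are just illustrative text.

```sas
/* The two macro variables from the example above */
%let guide = enterprise;
%let enterprise = starship;

/* One, two and three ampersands in front of the same name */
%put One ampersand: &guide;       /* writes: One ampersand: enterprise */
%put Two ampersands: &&guide;     /* writes: Two ampersands: enterprise */
%put Three ampersands: &&&guide;  /* writes: Three ampersands: starship */
```

The third line is the only one that reaches starship, because its first pass leaves &enterprise behind for the second pass to resolve.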

Let's use this information to decipher my original piece of SAS macro code:

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


In this case, we're looping around all values from 1 to whatever number &name0 resolves to. I want the code to produce the following output in the log:

I=1, NAME1=Alfred, AGE1=14
I=2, NAME2=Alice, AGE2=13
I=3, NAME3=Barbara, AGE3=13
[snip]


In the %put statement, we print "I=" followed by the value of &i, i.e. the index variable of the loop; then we print ", NAME", followed by the value of &i, followed by an equals sign. So far we've got "I=1, NAME1=". This is followed by &&&name&i. The first pass of &&&name&i resolves to a single ampersand followed by.... oh dear, there's a bug in my program!

a) There's no value for &name; my code had not previously created a macro variable called name

b) Moreover, I had not intended to use a macro variable named &name, I had wanted my code to merely resolve to name at this point

I should have used a double ampersand, as LeRoy is used to doing. If I had specified &&name&i then, with &i equal to 1, the first pass would have produced &name1, and the next pass would have resolved that to a student name (the array of macro variables had been loaded from sashelp.class).
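
For completeness, here is a sketch of the loop with double ampersands, wrapped in a macro so that it can be run on its own. The loading step is an assumption on my part for illustration: it uses PROC SQL's INTO clause with an arbitrary upper bound of 999, and the macro name show_class is hypothetical.

```sas
%macro show_class;
  /* Assumed loading step: one macro variable per row of sashelp.class */
  proc sql noprint;
    select name, age
      into :name1-:name999, :age1-:age999
      from sashelp.class;
  quit;

  /* SQLOBS holds the number of rows the SELECT returned */
  %let name0 = &sqlobs;

  /* Double ampersands: &&name&i becomes &name1 on the first pass, */
  /* and &name1 becomes the student's name on the second pass      */
  %do i=1 %to &name0;
    %put I=&i, NAME&i=&&name&i, AGE&i=&&age&i;
  %end;
%mend show_class;

%show_class
```

The only change to the loop itself is the removal of one ampersand from each of the two array references.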

So, I offer my thanks to LeRoy for politely highlighting my error, and inspiring this blog post.

But, one question remains: we understand how triple ampersands are the correct technique to be used in the first examples, but why did my code from the previous blog post work when the rules suggest it shouldn't? And my answer is: I don't know! Try it for yourself; SAS silently does what I wish, despite my code having too many ampersands.
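
One way to investigate is to switch on the SYMBOLGEN system option, which writes a note to the log for each resolution step. The sketch below is a cut-down test case rather than my original program: the two %let statements are hand-made stand-ins for the loop index and the first element of the macro array.

```sas
options symbolgen;    /* log each step of macro variable resolution */

%let i = 1;           /* stand-in for the loop index */
%let name1 = Alfred;  /* stand-in for the first array element */

%put Double: &&name&i;
%put Triple: &&&name&i;

options nosymbolgen;  /* switch the extra log notes off again */
```

Comparing the SYMBOLGEN notes for the two %put statements shows where the extra ampersand changes the sequence of passes.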

Answers on a postcard please...

8 comments:

  1. 2-5 ampers works. 6 tries to resolve &Alfred. 7-9 is fine, 10-12 isn't. 13-17 is fine, 18-24 isn't. 25 is fine, and from then on I had to get back to work.

    The answer lies somewhere in this:
    MLOGIC(NOTECOLON): %DO loop beginning; index variable I; start value is 1; stop value is 1; by value is 1.
    MLOGIC(NOTECOLON): %PUT I=&i, NAME&i=&&&&&&&&&&&&&&&&&&&&&&&&&name&i
    SYMBOLGEN: Macro variable I resolves to 1
    SYMBOLGEN: Macro variable I resolves to 1
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Unable to resolve the macro variable reference &name
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Unable to resolve the macro variable reference &name
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Unable to resolve the macro variable reference &name
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable I resolves to 1
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable NAME1 resolves to Alfred

  2. And looking from the bottom to the top:
    1 ampersand; 2 ampersands; 3 ampers; 6; 12. I spot a pattern.

    Replies
    1. And from the pattern, do you have an explanation?...

  3. So the macro interpreter in effect mutters to itself, "I can't resolve this - not yet. I'll let that bit of text through unchanged, and see what happens on the next pass." But we only hear it muttering if SYMBOLGEN is turned on.
    It will only produce an error if it knows there is no next pass available.

    Dave

  4. Dave - you've got it. The trick is looking between 4 and 5 ampersands.

    4 (&&&&name&i)
    && resolves to & (twice) (&&(name&i))
    (&)i resolves to 1
    && resolves to & (one more time) => Alfred

    5 (&&&&&name&i)
    && resolves to & (twice): (&&(&name)&i) (Unable to resolve &name)
    && resolves to & (once): (&(&name)&i) (Unable to resolve &name)
    Oh, says parser, I get it:
    && resolves to &: (&(name&i)) Alfred - I thought I recognised you.

  5. To sum up:

    The macro parser performs two basic processes on each pass. If it finds a double ampersand, it converts it into a single one and then proceeds with the rest of the statement; and if it finds a string &xyz that it recognises as a macro variable, it resolves it and then proceeds with the rest of the statement. At the end of the statement, it checks to see if it has been left with any ampersands in the parsed output. If it has, it runs through the process again.

    That's exactly how we understood it to work.

    The detail we had missed is in what happens if it finds a string &abc that it DOESN'T recognise as a macro variable name. If on the same pass, in the head of the same text string, it also performs a successful double ampersand resolution, it does not regard the unrecognised variable as an error, but allows the string &abc to go through unchanged to the next iteration. But if there were no double ampersand resolutions, it knows that it has had its last chance at a successful resolution and so it will throw a warning - I think it's not an error but I can't check as I don't have SAS installed here.

    The double ampersand &&name&i is parsed in two steps:

    && (I'll turn that into one &)
    name (plain text, pass it through unchanged)
    & (next thing should be a macro variable)
    i (yes, I recognise that - resolve it to 1)
    (so I now have) &name1 (and there is still an ampersand so I'll go round again)

    &name1 is parsed:

    & (next thing should be a macro variable)
    name1 (yes, I recognise that - resolve it to Alfred)
    (so I now have) Alfred (and there is no ampersand now so I've finished)

    The triple ampersand &&&name&i is parsed in three steps:

    && (I'll turn that into one &)
    & (next thing should be a macro variable)
    name (no, I don't recognise that, but I just resolved && so I'll pass that unrecognised &name through)
    & (next thing should be a macro variable)
    i (yes, I recognise that - resolve it to 1)
    (so I now have) &&name1 (and there is still an ampersand so I'll go round again)

    &&name1 is parsed:

    && (I'll turn that into one &)
    name1 (plain text, pass it through unchanged)
    (so I now have) &name1 (and there is still an ampersand so I'll go round again)

    &name1 is parsed:

    & (next thing should be a macro variable)
    name1 (yes, I recognise that - resolve it to Alfred)
    (so I now have) Alfred (and there is no ampersand now so I've finished)

    As Laurie observed, 6 ampersands will result after three steps in the parser trying and failing to resolve &Alfred:

    && (I'll turn that into one &)
    && (I'll turn that into one &)
    && (I'll turn that into one &)
    name (plain text, pass it through unchanged)
    & (next thing should be a macro variable)
    i (yes, I recognise that - resolve it to 1)
    (so I now have) &&&name1 (and there is still an ampersand so I'll go round again)

    &&&name1 is parsed:

    && (I'll turn that into one &)
    & (next thing should be a macro variable)
    name1 (yes, I recognise that - resolve it to Alfred)
    (so I now have) &Alfred (and there is still an ampersand so I'll go round again)

    &Alfred is parsed:

    & (next thing should be a macro variable)
    Alfred (no, I don't recognise that, and I haven't resolved && so it's a problem)
    (so the final result is) &Alfred (and throw a warning message)

    Replies
    1. Nice one. Good work from you and Laurie :)
