Saturday 31 December 2011

NOTE: The Hits of 2011

To join all of the others on the internet, here's an "end of year" list... Blogger tells me that the most popular posts on the NOTE: blog this year were:

1. NOTE: SAS Global Forum 2011 - Update #3, 1,069 page views
2. NOTE: SAS Global Forum 2011 - Update #1, 929 page views
3. NOTE: SAS Global Forum 2011 - Update #2, 826 page views
4. NOTE: SAS Global Forum 2011 - Back-Ups, and The Last Word, 770 page views
5. NOTE: SAS Professionals Convention 2011 - Registration is Open, 753 page views

It's interesting to see that all of the posts relate to conferences - SGF in particular. I guess that's a reflection of our shared interest in the latest SAS news. I have every intention of being at SGF 2012 in Florida, USA between April 22 -25. And I'll endeavour to post all the latest news as soon as I find it.

Thanks for your continued support of NOTE:. Here's hoping we all have a healthy and happy New Year.

Tuesday 27 December 2011

NOTE: Decoding the _TYPE_ Variable (Golden Oldies)

Our mini-series of hints and tips from past decades continues with 1999...

Appearing in the same issue of The Missing Semicolon (TMS) was i) an advertisement for Year 2000 Support (do you remember "Y2K"?), and ii) an article entitled "Decoding the _TYPE_ Variable (in PROC SUMMARY and PROC MEANS)" on page 5.

The issue in question was June, 1999: Volume 2, Number 2. The Missing Semicolon, published by Systems Seminar Consultants, has been informing SAS practitioners since June 1998 (just a few months after the first edition of VIEWS News was put to print and posted to its subscribers). It continues to this day and offers free subscriptions to all SAS practitioners.

Contemporary users of PROC MEANS and SUMMARY have the benefit of the CHARTYPE option, but the principles of understanding the values of the numeric _TYPE_ variable or the 1's and 0's of the character _TYPE_ are the same.

Tuesday 20 December 2011

NOTE: 180 Terrific Ideas for Using and Learning the SAS System (Golden Oldies)

Continuing my mini-series of hints and tips from previous decades, here's a paper from my first SAS Users Group International (SUGI) conference. The 18th SUGI (predecessor to today's SAS Global Forum (SGF)) was held in 1993 in New York. Helen and Ginger Carey's paper was boldly entitled "180 Terrific Ideas for Using and Learning the SAS System", and it didn't disappoint. Whilst not all of their ideas are useful today, a significant number of them are. Note in particular their ideas regarding a) work habits, b) testing and debugging, and c) programming style.

Thursday 15 December 2011

NOTE: BY Variables in Titles (Golden Oldies)

Whilst the capabilities of SAS software increase markedly from release to release, the core capabilities remain the same. And that means that many hints and tips from years gone by are still valid today. Starting today I'm running a mini-series of tips from past decades, starting with an article by myself in the first ever issue of VIEWS News (winter 1998).

VIEWS News was a printed and posted newsletter that was published quarterly by VIEWS - the UK's independent user group for SAS practitioners.

BY Variables in Titles

You can insert labels and/or values of BY variables into titles. For example, suppose you have a data set sorted by the variable TRTSEQ, with label "Group" and formatted values "Placebo", "Wonder Drug", etc.

options nobyline;

proc print;
  title "List for #byvar1 - #byval1";
  by trtseq;
  var var1-var10;
run;


For each BY group, you'll get the titles:

List for Group - Placebo
List for Group - Wonder Drug
etc.


The NOBYLINE option suppresses the normal printing of the BY lines, so you don't see "Group=Placebo." You can turn it back on with OPTIONS BYLINE.

Tuesday 13 December 2011

Testing - Peer Reviews

In my recent series on testing I concentrated on dynamic testing, i.e. executing some or all of your code to see if it works. The other type of testing is static testing. This was featured briefly in my "SAS Software Development With The V-Model" paper at this year's SAS Global Forum ( SGF).

The principle form of static testing is peer review. Like all tests and reviews, you have to have something to test or review against. For testing you will have a set of requirements or a set of specifications (ref: the V-Model). For peer review, a fellow programmer will take a look at your code and offer opinion on whether it meets the team’s coding guidelines and other standards (and whether it gives the impression of meeting the unit specification). Thus, to perform an effective peer review, you need some documented standards against which the review can be set.

I wrote a NOTE: post on peer review back in 2009, so I won't repeat those thoughts here; however, I did see a good post from Tricia on the BI Notes blog back on October. Tricia used the post to run through a number of potential pitfalls for those who are new to the process. In the article, Tricia offered valuable tips on how to keep the review focused & at the right level, and how to be properly prepared.

Done properly, peer reviews offer increased code quality, reduced cost of rework, increased amounts of shared knowledge of your applications, and increased sharing of code and design good practice.

NOTE: Programming Languages Can Be Copied...

... so says Yves Bot, advocate general at the Court of Justice of the European Union (CJEU). Mr Bot's guidance was issued in response to the July 2010 High Court case between SAS and World Programming Limited (WPL). The CJEU still needs to make a full judgement before handing things back to the High Court, but WPL will doubtless be pleased that Mr Bot appears to have sided with them (along with a US court in March 2011).

However, whilst the language itself cannot be protected, the implementation of the language *is* protected. So, the code that SAS's staff in Cary (and around the world) have written in order to implement the language cannot be copied. Anybody who wishes to produce software that can interpret the SAS language must write that interpreter all by themselves. As a result, you might expect the bigger company to be able to invest more time and effort into optimising the speed and efficiency of their product.

So, before you rush out hoping to find a cheaper, better solution, be sure to do your full due diligence on all of the candidates. Check that all elements of the language are implemented; be sure to perform some performance and reliability testing; and probe the support provisions of your potential suppliers. These things are less easy to duplicate. Caveat emptor.

Thursday 8 December 2011

NOTE: Hash Tables, An Introduction

In my recent series of articles on testing I used the hash table to provide a means to write/update a table from within a DATA step without demanding changes to the DATA statement, etc. I've had some very kind feedback on the series, plus some requests to explain what the hash table does and how it works. So here goes...

The hash table was introduced to the DATA step as a means of performing fast, in-memory lookups. The fact that the hash table is a "component object" means that there's a steeper than usual learning curve for this feature. The SAS Language Reference: Concepts manual provides an introductory section on the hash table, but for many DATA step coders the hash table remains unused. A better route to getting started with hash tables is the "Getting Started with the DATA Step Hash Object" paper in the DATA Step sub-section of R&D's SAS/BASE section of the SAS Support web site (follow the link to Component Objects and Dot Notation).

In a nutshell, the hash table is a lookup table that's stored (temporarily) in memory and allows you to search for values within it and thereby get associated values returned. Let's introduce ourselves to the hash table by taking a two step approach: firstly we'll create the hash table, secondly we'll use it for lookups against each of the rows in our input table. Our DATA Step will look like this:

data result;
  set input;
  if _n_ eq 1 then
  do; /* It's first time through, so let's create hash table */

    <create the hash table>
  end;
  /* For each row in input table, do a lookup in the hash table */

  <do a lookup>
run;


Let's make ourselves some test data and assume it contains the sales from our small car sales operation last week:

data SalesThisWeek;
  length make $13 model $40;
  infile cards dsd;
  input make model;
  cards;
Jaguar,XJR 4dr
Land Rover,Discovery SE
Jaguar,XKR convertible 2dr
;
run;


We have a price list for all of the cars we sell; it's in sashelp.cars and contains variables named make, model and invoice. Frustratingly, the MODEL column contains leading blanks, so we use a quick DATA Step to get rid of them, thereby creating work.cars.

data work.cars; set sashelp.cars; model=left(model); run;

We want to load the price list into a hash table, then lookup each of our sold cars to find its invoice value. Here's the code to <create the hash table>:

  DECLARE HASH notehash (DATASET:'work.cars');
  notehash.DEFINEKEY('make','model');
  notehash.DEFINEDATA('invoice');
  notehash.DEFINEDONE();


Woh! That code looks like no other SAS code we've ever seen!! That's because the hash table is a "component object" and the syntax for dealing with components objects differs from mainstream DATA Step syntax. It's called "dot notation". It quickly makes sense once you get over the fact that it's different.

The first line tells SAS that we want to create a new hash table. Hash tables only exist temporarily in memory for the duration of the DATA Step. We use the DECLARE statement to begin to create a new component object; the first parameter (HASH) says what kind of component object we want to create; the second parameter (notehash) is an arbitrary name that we have chosen for the hash table; within the brackets we have told SAS that we're going to use some of the columns of the work.cars table as our lookup table.

The following two lines tell SAS a bit more about how we'd like to load and use the hash table; the fourth line (with DEFINEDONE) tells SAS we've nothing more to tell it about the hash table.

When we use dot notation we type i) the name of a component object, ii) an action we want to perform on the object, and optionally iii) parameters for the action. Parts (i) and (ii) are separated by dots, and the parameters (iii) are enclosed in brackets.

When we create a hash table, we have to declare it, then we have to specify a) the key column(s), i.e. the column(s) that we'll use to find something in the hash table, and b) the data column(s), i.e. the column(s) whose values will be returned once the key values are found in the hash table. In our case, MAKE and MODEL are our key columns, and INVOICE is our data column.

After specifying our key and data columns (with the DEFINEKEY and DEFINEDATA actions) we tell SAS that we're done by performing the DEFINEDONE action on the hash table.

The dot notation is different to what we're used to, but it's not too tricky to get your head around.

Now that we've created our hash table in memory, for use during the DATA Step, all we need to do now is use it. We lookup things in the table by performing the FIND action on the hash table. If SAS finds the key value(s) in the hash table, it will automatically put the associated data value(s) into the data variable(s) in the DATA Step. So, in our case, we need a variable in the DATA Step named INVOICE. If we don't create that variable prior to using the hash table we'll get an error.

When we do a FIND, we're given a return code value that tells us whether SAS found the key(s) in the hash table. A return code value of zero tells us that all is well and the value was found; any other value tells us that SAS did not find the value. So, our code to <do a lookup> will look like this:

  length invoice 8;
  rc = notehash.FIND();
  if rc ne 0 then
    put "Oh dear, we sold something we can't identify";


Notice that there's no direct reference to INVOICE when we do the find. The fact that FIND will put a value into INVOICE is implicit from our preceding DEFINEDATA.

Put all together, our code looks like this:

/* Create our test data */
data SalesThisWeek;
  length make $13 model $40;
  infile cards dsd;
  input make model;
  put make $quote22. model $quote50.;
cards;
Jaguar,XJR 4dr
Land Rover,Discovery SE
Jaguar,XKR convertible 2dr
;
run;

/* Strip leading blanks from MODEL */
data work.cars; set sashelp.cars; model=left(model); run;

/* Add invoice values to sales by using lookup */
data result;
  set SalesThisWeek;
  keep make model invoice;
  if _n_ eq 1 then
  do; /* It's first time through, so let's create hash table */
    DECLARE HASH notehash (dataset:'work.cars');
notehash.DEFINEKEY('make','model');
notehash.DEFINEDATA('invoice');
notehash.DEFINEDONE();
  end;
  /* For each row in input table, do a lookup in the hash table */
  length invoice 8;
  rc = notehash.FIND();
  if rc ne 0 then
    put "Oh dear, we sold something we can't identify";
run;


Once you've got the basic hang of hash tables, the two best sources of reference information are:

a) The hash table tip sheet, available from R&D's SAS/BASE section of the SAS Support web site (see the link to the tip sheet over on the right hand side of the page)

b) Component Objects: Reference in the SAS Programmer's Bookshelf

There are many ways to perform lookups in SAS. Some examples would be i) formats, ii) the KEY= parameter of the SET statement, iii) table joins. The hash table is another option which can offer advantages in many cases. Have fun...

Tuesday 6 December 2011

NOTE: Testing Macros - Parameters Revisited

As my planned series on testing drew to a close last week, I got an email from Quentin McMullen with some very kind words about the NOTE: blog, but also some very erudite comments about my choice of parameters for my testing macros. Rather than paraphrase Quentin's comments, I decided to publish his email verbatim (with his permission). Here's the heart of Quentin's email, followed by a few brief comments from me.
Just a quick thought:

I have a similar macro to %assert_condition, but it only has one (main) parameter, &CONDITION, instead of three; &LEFT &OPERATOR &RIGHT.  So it looks like:

%macro assert_condition(condition,tag=);
 if &CONDITION then
   put "TESTING: &sysmacroname: TAG=&tag, OUTCOME=PASS";
 else
   put "TESTING: &sysmacroname: TAG=&tag, OUTCOME=FAIL";
%mend assert_condition;


So you can call it like:

%assert_condition(incount eq outcount)
or
%assert_condition (age > 0)
or
%assert_condition ( (incount=outcount) )

I tend to like the one parameter approach.

The only tricky part is if you have an equals sign in the condition, you have to put parentheses around the condition so the macro processor does not interpret the left side as a keyword parameter.  The nifty thing is that the parentheses also mask any commas,e.g.:

%assert_condition(gender IN ("M","F") )

Do you see benefits to the 3 parameter approach vs 1 parameter?
Yes, Quentin, I do very much see the benefits of your approach. Your example, using the IN operator, is particularly well chosen. Rest assured I'll be adapting the McMullen approach in future. Thanks for the comments. Keep reading NOTE:!

Thursday 1 December 2011

NOTE: Testing (Presenting the Results)

The preceding two articles in this series on testing presented a simple, generic macro for testing and recording test results. All that remains now is for us to tidy-up some loose ends.

Firstly, the macro assumes data set work.results already exists. And it also assumes that the data set contains appropriate variables named Tag and Result. We can quickly arrange that by being sure to include a call to the following macro in our testing code:

%macro assert_init(resultdata=work.results);
  data &resultdata;
    length Tag $32 Result $4;
    stop;
  run; 
%mend assert_init;

Finally, we want to present our results. We can do this easily with a simple call to PROC REPORT:

%macro assert_term(resultdata=work.results);
  title "Test Results";
  proc report data=&resultdata;
    columns tag result;
    define tag / order; 
  run; 
%mend assert_term;

Equipped thus, we can focus on our testing code, not the mechanics of collating and presenting results. For example, let's imagine we have some new code to test; the purpose of the code is to read a raw file (a.txt), create some computed columns, and write-out a SAS data set (perm.a). One of our tests is to check that the number of rows in the raw file matches the number of rows in the SAS data set. Here's our code to test this functionality:

%assert_init;
%include "code_to_be_tested.sas";
%assert_EqualRowCount(infile=a.txt,outdata=perm.a,tag=T01-1);
%assert_term;


We can make the results a tad more visual by colourising the pass/fail values:

%macro assert_term(resultdata=work.results);
  proc format;
    value $bkres 'PASS'='Lime'
                 'FAIL'='Red';
  run;

  title "Test Results";
  proc report data=&resultdata;
    columns tag result;
    define tag / order;
  define result / style(column)={background=$bkres.};
  run;
%mend assert_term;


This assumes you're using SAS Enterprise Guide. If not, you'll need to add some appropriate ODS statements around the PROC REPORT.

The downside of the macros as they stand at this point is that the results data set gets recreated every time we run the code. Maybe we don't want that because we want to collate test results from a number of separate bits of test code. So, finally, we can make the creation of the results data set conditional, i.e. if it doesn't exist we'll create, if it already exists then we'll leave it alone:

%macro assert_init(resultdata=work.results);
  %if not %sysfunc(exist(&resultdata)) %then
  %do;
    data &resultdata;
      length Tag $32 Result $4;
      stop;
    run;
  %end;
%mend assert_init;

NOTE: Advent Calendar 2011

Today is 1st December, and that can mean only one thing: time to turn to the traditional SAS Professionals advent calendar! There will be prizes every day throughout December, so make sure you visit http://www.sasprofessionals.net/ every day.

And if you're not already a member, sign-up, check-out the Expert Channel, the forums, and benefit from the ability to network with fellow SAS practitioners online! Attend next year's SAS Professionals Convention in Marlow (July 10 to 12) and learn from papers delivered by a mixture of SAS staff and customers.