Wednesday 18 November 2009

NOTE: CATS, MISSING and CSV (Thanks For Your Help - Not!)

I warmly welcomed the CAT functions (CAT, CATS, CATT and CATX) when they were introduced in V9. They introduced a much neater style to SAS programs by eliminating the need to LEFT and TRIM our strings prior to concatenation, and the CATX function saved repeated concatenation of separator characters. I've encouraged my clients to adopt these functions (they also work faster than combinations of LEFT and TRIM), but last week I got a call from a disappointed client who was trying to use CATX and getting unpredictable results.

In addition to tracing the source of their problems, I was able to advise them on better ways of doing what they were trying to do (creating a CSV file).

Here's a simplified version of their problem, showing their use of CATX to concatenate with commas as separators:

data survey;
  length company type $ 25;
  input company $ CompID region $ new $ type $ PrevSurveys ;
SuperWidget -247 SOUTH Y TECHNOLOGY .
WonderToy -958 NORTH N RETAIL 2
options missing='';
data output;
  set survey;
  row=cats(CompID, ',',region, ',', new, ',', type, ',', PrevSurveys);
proc print;run;

Here's the output from the PRINT:

Obs company     type       ID   region new Surveys row
1   SuperWidget TECHNOLOGY -247 SOUTH  Y           -247,SOU
2   WonderToy   RETAIL     -958 NORTH  N   2       -958,NORTH,N,RETAIL,2

Notice how ROW for WonderToy is fine, but ROW's value for SuperWidget has been truncated.

My clients were converting to CATX from:
  row=trim(left(CompID)) !! ','
      !! trim(left(region)) !! ','
      !! trim(left(new)) !! ','
      !! trim(left(type)) !! ','
      !! trim(left(PrevSurveys))

I had encouraged them to do this throughout their programs (when appropriate) and you can see the benefit in readability. But the old code worked and produced the output below, the new code didn't. They weren't so impressed with my advice!

Obs company     type       ID   region new Surveys row
1   SuperWidget TECHNOLOGY -247 SOUTH  Y           -247,SOUTH,Y,TECHNOLOGY,
2   WonderToy   RETAIL     -958 NORTH  N   2       -958,NORTH,N,RETAIL,2

I managed to bypass the problem by converting the numeric PREVSURVEYS to character prior to using CATX, but I found the root cause of the issue in SAS KnowledgeBase article 34315. The article states that the use of MISSING='' in conjunction with any of the CAT functions can cause truncation (with no warning message). It's fixed in V9.2. SAS advises using a value for the MISSING= option or using a different method to concatenate the values.

Having got to the bottom of the problem, I then addressed what the client was actually trying to do: create a CSV file. It's no longer necessary to use a DATA step to create a CSV file. The simplest method is to use PROC EXPORT, but that doesn't allow you to control the column order (it simply uses the order of the columns in the input table, which may or may not be what you want).

The better route to is to use ODS CSV. The following code illustrates how to do this:

ods csv body="c:\temp\MyData.csv";
proc print data=sashelp.class;
ods csv close;

Usage of PRINT's VAR statement allows control of column order.

In summary, I still encourage clients to move to use of the CAT functions (at appropriate times in the development cycle). They make the code more readable and they run faster than the equivalent usage of other functions. But also, attention to the design is required so that higher-level solutions (such as PROC EXPORT) can be adopted.