Thursday 24 November 2011

NOTE: Testing (With Basic Macros)

So, having stressed the importance of testing in my previous post on the subject, let me give you some hints on how I keep the test phase efficient and effective on my projects. This will be a short series across the next few days. Today I'll offer some tips for automating your tests, and I'll describe a simple macro that you can use to highlight test results in your log. As the series continues I'll describe how to enhance the macro and easily add a reporting system to summarise your test results in one place.

There is a degree of overlap between FUTS (mentioned yesterday) and the macros that I shall describe. I recommend FUTS, but some of my clients find its size intimidating, and others don't have authority to download anything (especially "code") from the internet.

Let's assume you have a set of test scripts, i.e. steps for the tester to take, accompanied by expected results for each step. Let's take a simplified example:

1) Run the customer load. Inspect the SAS log. Expect no error or warning messages.

2) Count the rows in the input customer file (CSV) and the warehouse customer table (SAS data set). Expect the number of rows in each to match. [I said it's a simplified example!]

We can automate the second test (the first too, but that's for another time). The benefit of automating is that we can re-run it quickly and effectively (after test failure, and for regression testing).

Our automated code might look like this:

%let infile=/a/b/cust.csv;
%let outdata=whouse.cust;
/* Pipe output from unix command via fileref */
filename in pipe "wc -l &infile";

data test1_result;
  /* Not interested in reading observations, just want nobs */
  set &outdata nobs=outcount;
  /* Get line count from unix command */
  length incount_c $20;
  infile in;
  input incount_c :$20.; /* first token of wc output is the line count */
  incount = input(incount_c,best.);
  /* Compare the two figures */
  if incount eq outcount then
    put 'Test 1 passed';
  else
    put 'Test 1 failed: ' incount= outcount=;
  stop;
run;


This code uses the unix "wc -l" command, via the piped fileref, to return a line count for the input file, and the NOBS= option on the SET statement to return a row count for the warehouse table. It then compares the two values and reports the success or failure of the comparison in the log.
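
If the two counts match, the DATA step writes a single, easily spotted line to the log:

Test 1 passed

If they don't match, the ELSE branch also reports the two values, along these lines (the counts here are invented purely for illustration):

Test 1 failed: incount=1042 outcount=1040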

You can see that we might do a number of similar comparisons across our whole test suite, so it would be useful to report them to the log in a consistent, easy-to-find fashion. We can turn the assertion into a macro as follows:

%macro assert_condition(left,operator,right,tag=);
  if &left &operator &right then
    put "TESTING: &sysmacroname: TAG=&tag, OUTCOME=PASS";
  else
    put "TESTING: &sysmacroname: TAG=&tag, OUTCOME=FAIL";
%mend assert_condition;


I've introduced the tag parameter for use as a unique identifier for each test. We can use the macro within our DATA step as follows:

data test1_result;
  /* Not interested in reading observations, just want nobs */
  set &outdata nobs=outcount;
  /* Get line count from unix command */
  length incount_c $20;
  infile in;
  input incount_c :$20.;
  incount = input(incount_c,best.);
  /* Compare the two figures */
  %assert_condition(incount,eq,outcount,tag=Test 1);
  stop;
run;


And we can create a higher-level testing macro that allows us to compare any raw file with a SAS data set and assert that their row counts are equal:

%macro assert_EqualRowCount(infile=,outdata=,tag=);
  /* Pipe output from unix command via fileref */
  filename in pipe "wc -l &infile";

  data _null_;
    /* Not interested in reading observations, just want nobs */
    set &outdata nobs=outcount;
    /* Get line count from unix command */
    length incount_c $20;
    infile in;
    input incount_c :$20.;
    incount = input(incount_c,best.);
    /* Compare the two figures */
    %assert_condition(incount,eq,outcount,tag=&tag);
    stop;
  run;
%mend assert_EqualRowCount;


Armed with our %assert_EqualRowCount macro we can easily automate a range of similar tests, and get consistent, easy-to-find results in the log.
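
For example, sticking with the customer file and table from earlier (and adding a purely hypothetical order file and table to show the pattern), each row count check boils down to a single call:

%assert_EqualRowCount(infile=/a/b/cust.csv, outdata=whouse.cust, tag=Customer row count);
%assert_EqualRowCount(infile=/a/b/order.csv, outdata=whouse.order, tag=Order row count);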

However, the log is not the most convenient place to put our results. Putting them into a data set would be far better - we'd have just one row per test result (no need to scan the log for messages), and we could easily print them out (with traffic lighting for pass/fail if we wanted).
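
Just to give a flavour of the traffic lighting, here's a minimal sketch. It assumes a hypothetical results data set named work.test_results, with character columns tag and outcome (holding PASS or FAIL), that we haven't built yet:

proc format;
  /* Map test outcomes to background colours (RGB hex codes) */
  value $outcbg 'PASS' = 'cxccffcc'   /* pale green */
                'FAIL' = 'cxffcccc';  /* pale red   */
run;

proc print data=work.test_results noobs;
  var tag;
  /* Traffic-light the outcome column in ODS output (HTML, PDF, etc.) */
  var outcome / style(data)={background=$outcbg.};
run;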

In the next testing post I'll describe how we can conveniently put our results into a data set.