Thursday 9 January 2014

NOTE: Expanding Our Use of MCOVERAGE for Coverage Analysis of our Macro Testing

Over the last few days I've been revealing the features and benefits of the MCOVERAGE system option introduced in SAS V9.3. This system option creates a log file to show which lines of our macro(s) were executed, e.g. during our tests.

Knowing that we tested all lines of code, or knowing that we tested 66% of all lines of code is important when judging whether we have tested sufficient amounts of our code to give sufficient confidence to put the new/updated system into production. This information relates back to our testing strategy (where we specified targets for the proportion of code lines tested). It also helps us spot dead lines of code, i.e. lines of code that will not ever be executed (perhaps due to redundant logic).

Yesterday I showed code to read an mcoverage log file and create a table to show which macro lines had been executed and which had not. My code was basic and only worked for one execution of the tested macro. Quite often we need to run our code mor than once to test all branches through our logic, so today I'll discuss upgrading my mcoverage processing code so that it handles multiple executions of the tested macro.

We might start by running our tested macro twice, with two different parameter values...

filename MClog "~/mcoverage2.log";

options mcoverage mcoverageloc=MClog;

%fred(param=1); /* Take a different path through the code */

filename MClog clear;
* BUT, see my note about closing MClog at 
  the end of my earlier blog post;

The mcoverage log produced from these two consecutive executions looks like this:

1 1 18 FRED
2 1 1 FRED
3 17 17 FRED
2 1 1 FRED
2 2 2 FRED
2 3 3 FRED
2 4 4 FRED
2 4 4 FRED
2 4 4 FRED
2 5 5 FRED
2 6 6 FRED
2 7 7 FRED
2 8 8 FRED
2 8 8 FRED
2 9 9 FRED
2 13 13 FRED
2 18 18 FRED
1 1 18 FRED
2 1 1 FRED
3 17 17 FRED
2 1 1 FRED
2 2 2 FRED
2 3 3 FRED
2 4 4 FRED
2 4 4 FRED
2 4 4 FRED
2 5 5 FRED
2 6 6 FRED
2 7 7 FRED
2 8 8 FRED
2 8 8 FRED
2 9 9 FRED
2 13 13 FRED
2 14 14 FRED
2 15 15 FRED
2 16 16 FRED
2 16 16 FRED
2 18 18 FRED

You will recall that type 1 records mark the beginning execution for a macro, type 3 records indicate non-compiled lines (such as blank lines), and type 2 records indicate executed lines of code.

Note how we now get two type 1 records. These each mark the start of a new execution of the %fred macro. Close inspection of the type 2 records shows different sets of line numbers for the first and second executions, reflecting different paths through the %fred macro code.

We're aiming to create an output that shows whether the lines of %fred macro code were executed in one or more tests, or not. So, given that non-executed rows of macro code don't create a record in the mcoverage log, we can process the mcoverage log quite simply by counting the number of type 2 records for each line of macro code. For simplicity, we'll count the type 3s too. The output that we get will look like this:

Recordnum RecordRectype Executions Analysis
1 %macro fred(param=2); 2 4 Used
2   * comment ; 2 2 Used
3   %put hello world: &param; 2 2 Used
4   %if 1 eq 1 %then %put TRUE; 2 6 Used
5   %if 1 eq 1 %then 2 2 Used
6   %do; 2 2 Used
7     %put SO TRUE; 2 2 Used
8   %end; 2 4 Used
9   %if 1 eq 0 %then 2 2 Used
10   %do; . . NOT used!
11     %put FALSE; . . NOT used!
12   %end; . . NOT used!
13   %if &param eq 1 %then 2 2 Used
14   %do; 2 1 Used
15     %put FOUND ME; 2 1 Used
16   %end; 2 2 Used
17 3 2 Not compiled
18 %mend fred; 2 2 Used

So, we can see that executing the %fred macro with two different values for param has resulted in all but three lines of code being executed. We might choose to add additional tests in order to exercise the remaining lines, or a closer inspection might reveal that they are dead lines of code.

The code to create the above output is included at the end of this post. The sequence followed by the code is as follows:
  • Read the mcoverage log file into a data set. Process the data set in order to i) remove type 1 records, and ii) count the number of rows for each line of macro code
  • Read the macro source into a data set, adding a calculated column that contains a line numbering scheme that matchers the scheme used by the mcoverage log. We are careful to preserve leading blanks in order to preserve indentation from the code
  • Join the two data sets and produce the final report. Use a monospace font for the code and be careful to preserve leading blanks for indentation
I'll wrap-up this series tomorrow with a summary of what we learned plus some hints and tips on additional features that could be added.

Here's the code:

/* This code will not cope reliably if the macro    */
/* source does not have a line beginning with the   */
/* %macro statement for the macro under inspection. */
/* This code expects a coverage log file from one   */
/* macro. It cannot cope reliably with log files    */
/* containing executions of more than one different */
/* macro.                                           */
/* Multiple different macros might be required if */
/* testing a suite of macros.                     */
filename MClog "~/mcoverage2.log"; /* The coverage log file (MCOVERAGELOC=) */
filename MacSrc "~/";      /* The macro source  */

/* Go get the coverage file. Create macro */
/* var NUMLINES with number of lines      */
/* specified in (first) type 1 record.    */
data LogFile;
  length macname $32;
  keep Macname Start End Rectype;
  infile MClog;
  input Rectype start end macname $;
  prevmac = compress(lag(macname));

  if _n_ ne 1 and prevmac ne compress(macname) then
    put "ERR" "OR: Can only process one macro";

  if rectype eq 1 then
    call symputx('NUMLINES',end);

  if rectype ne 1 and start ne end then
    put "ERR" "OR: Not capable of handling START <> END";

%put NUMLINES=&numlines;

/* Count the number of log records for each line of code. */
proc summary data=LogFile nway;
  where rectype ne 1;
  class start rectype;
  var start; /* Irrelevant choice because we only want N statistic */
  output out=LogFile_summed n=Executions;

/* Go get macro source and add a line number value that */
/* starts at the %macro statement (because this is how  */
/* MCOVERAGE refers to lines.                           */
/* Restrict number of lines stored to the number we got */
/* from the coverage log file.                          */
/* Numlines does not include %mend, so we implicitly    */
/* increment the number of lines by one and thereby     */
/* retain the line containing %mend, purely for         */
/* aesthetic reasons for the final report.              */
data MacroSource;
  length Record $132;
  retain FoundStart 0 
    Recordnum 0;
  keep record recordnum;
  infile MacSrc pad;
  input record $char132.; /* Keep leading blanks */

  if not FoundStart and upcase(left(record)) eq: '%MACRO' then
      FoundStart = 1;
      LastLine = _n_ + &NumLines - 1;

  if FoundStart then
    recordnum + 1;

  if FoundStart and _n_ le LastLine then

/* Bring it all together by marking each line of code */
/* with the ecord type from the coverage log.         */
proc sql;
  create table dilly as
    select  code.recordnum
    case log.rectype
      when 2 then "Used"
      when 3 then "Not compiled"
      when . then "NOT used!" 
      else "UNEXPECTED record type!!"
    as Analysis
      from MacroSource code left join LogFile_summed log
        on code.recordnum eq log.start;

proc report data=dilly nowd;
  define record /display style(column)={fontfamily="courier" asis=on};

filename MacSrc clear;
filename MClog clear;

*** end ***;


NOTE: Macros Newness in 9.4 and 9.3 (MCOVERAGE), 6-Jan-2014
NOTE: Macro Coverage in Testing (MCOVERAGE), 7-Jan-2014
NOTE: Making Sense of MCOVERAGE for Coverage Testing of Your Macros, 8-Jan-2014
NOTE: Expanding Our Use of MCOVERAGE for Coverage Analysis of our Macro Testing, 9-Jan-2014 (this article!)
NOTE: Wrap-Up on Test Coverage and MCOVERAGE, 10-Jan-2014