Wednesday 24 February 2010

NOTE: PROC MEANS Gives You All It's Got (and More!)

An oft overlooked parameter for PROC MEANS (and PROC SUMMARY) is COMPLETETYPES. It tells MEANS to create all possible combinations of the values of the classification variables, even if some of those combinations don't exist in the data. And PRELOADFMT will create combinations from values that don't even exist in your input data. This can be very useful in presenting what appears to be a more complete picture of the input data and can be equally useful in presenting a consistent layout amongst a group of reports (or regularly produced reports).

Here's a simple code example:

data sales;
  region = 'North'; product = 'Widget'; sales = 500; OUTPUT;
  region = 'North'; product = 'Foobar'; sales = 300; OUTPUT;
  region = 'South'; product = 'Widget'; sales = 100; OUTPUT;
run;

proc means data=sales /*completetypes*/ sum;
  class region product;
run;


If we run it without COMPLETETYPES we get:

       Analysis Variable : sales

region    product    Obs             Sum
----------------------------------------
North     Foobar       1     300.0000000
          Widget       1     500.0000000
South     Widget       1     100.0000000
----------------------------------------


And if we run it with COMPLETETYPES, we get (with the new information highlighted in red):

       Analysis Variable : sales

region    product    Obs             Sum
----------------------------------------
North     Foobar       1     300.0000000
          Widget       1     500.0000000
South     Foobar       0               .
          Widget       1     100.0000000
----------------------------------------


If there's a chance that your data won't contain all possible values for any given classification variable (this is a possibility when you regularly run reports on differing data), you might want to consider PRELOADFMT - it's a key word for the CLASS statement. The formats associated with the CLASS variables are used to tell MEANS the full list of values. See this example (key code highlighted in red):

proc format;
  value $regions 'North' = 'North'
                 'South' = 'South'
                 'West'  = 'West'
                 ;
run;

data sales;
  format region $regions.;
  region = 'North'; product = 'Widget'; sales = 500; OUTPUT;
  region = 'North'; product = 'Foobar'; sales = 300; OUTPUT;
  region = 'South'; product = 'Widget'; sales = 100; OUTPUT;

run;

proc means data=sales completetypes sum ;
  class region product / preloadfmt;
run;


We define a format for REGION which contains "West", but West is not in our data at all. We add PRELOADFMT as an option to the CLASS statement, and here's the output that we get (new bits in red):

       Analysis Variable : sales

region    product    Obs             Sum
----------------------------------------
North     Foobar       1     300.0000000
          Widget       1     500.0000000
South     Foobar       0               .
          Widget       1     100.0000000
West      Foobar       0               .
          Widget       0               .

----------------------------------------


So, we've seen how we can zero-fill a data set using two different techniques, the first using all combinations of data within the input data set, and the second using a pattern of values from format(s).

See also:

  • PROC MEANS - CLASSDATA, MISSING
  • PROC FREQ - SPARSE
Sadly, the Summary Statistics task in Enterprise Guide (which uses PROC MEANS) does not offer COMPLETETYPES nor PRELOADFMT; but you can add them yourself within the Code Preview window.