Wednesday 31 March 2010

NOTE: Roots and Powers in SAS MDX

I had a problem today whilst trying to calculate RSS (square root of the sum of the squares) in an OLAP cube. I'm the first to admit that I'm no MDX expert (Multi-Dimensional Expressions - the language for querying OLAP cubes). Creating the sum of the squares was okay, but getting the square root was more difficult than expected.

MDX is the de facto standard language for querying cubes, analogous to SQL's ubiquity for querying "two-dimensional" relational data. It emerged from Microsoft's OLD DB for OLAP and was first supported by SAS with the V8 version of OLAP Server. MDX provides powerful functions for handling multi-dimensional collections of values. SAS's support for MDX means that you can connect a wide range of OLAP clients to a SAS OLAP server. OLAP clients that support MDX (and can hence be used with SAS OLAP server) include Excel, Cognos, ProClarity and Business Objects.

RSS is a measure of accuracy (estimated uncertainty) and is calculated by taking the square root of the total sum of each of the observed values squared. More often it is the residual values after comparing a model with observed values that are squared, summed and then rooted. Either way, the underlying calculations consist of a) the sum of the squares of the individual values, and then b) the square root thereof.

Summing-up values is the kind of thing that SAS OLAP server eats for breakfast, and MDX (whilst not directly supporting a square root function) provides an exponentiation operator (^, caret), sometimes also implemented as a function (power). So, I was surprised when I used ^0.5 to request the calculation of a square root and I got an error message.

Investigation showed that SAS V9.1.3 SP4 did not originally provide a "power" operator or function (see the SAS 9.1 OLAP Server MDX Guide). As far as I can see, it still doesn't, but problem note 17440 describes how hot fix E9BA09 introduced a square root function (SQRT).

Having scanned the V9.2 MDX documentation quickly this evening, I see no reference to caret (^), power, or exponentiation, nor sqrt (problem note 17440 says it was fixed in V9.2 TS1M0).

It seems most strange to me that SAS should continue to deviate from a full implementation of MDX. Whilst MDX is not a formally ratified standard, it *is* a stable, de facto standard. It is disappointing to see gaps in SAS's support of MDX. I tried to find some information on further gaps in the SAS implementation of MDX, but none seems to be available.

What's your mileage with SAS MDX? Have you found any further gaps or differences? How do you do exponentiation?

Wednesday 24 March 2010

NOTE: Clean-Up, But Don't Lose Anything (omabakup)

My talk of spring-cleaning reminds of days spent searching for stuff in my home-office only to find that my dear wife had spring-cleaned them to somewhere "more suitable" but less obvious to me. In these cases I hadn't actually lost them, but for a period of time I believed I had.

Losing some of your SAS data or metadata could be far more costly to your business than me losing my favourite pen from my desk, so you'd better to be doing good, reliable, regular and frequent backups. In most cases, sites are pretty good at backing-up disks using standard back-up routines, but the SAS metadata server's data needs special attention. Read on...

Before backing-up your metadata repositories, you need to pause the metadata server so that there are no attempts to change the data whilst it's being backed-up. Without pausing the metadata server you may find that the back-up fails or (worse) you may find that your back-ups are useless and cannot be restored.

The easiest and most effective means of pausing, backing-up, and then unpausing is to use SAS's supplied omabakup facilities. The SAS install/config process creates backupServer.sas and restoreServer.sas in your SASMeta\MetadataServer directory(s). These are pre-configured to suit your environment. backupServer.sas writes the backup files to a subdirectory called SASBackup in the MetadataServer directory. After backupServer.sas has run, your standard back-up routine can backup the contents of this directory at its leisure.

Alternatively, you can use the Backup Wizard from the Metadata Manager node in Management Console. This allows you to create jobs for immediate execution, or for later; and it automatically creates an equivalent restore job too. Tidy!

SAS offers advice about best practices for back-ups and restores in the SAS 9.2 Intelligence Platform: System Administration Guide. And, not surprisingly, Angela Hall's SAS-BI blog offers more than one valuable insight and tip for the use of omabakup.

Back-ups of your business data are crucial to the safe operation of your business, but they must be effective. Back-ups of your SAS metadata will not effective unless you are aware of the need to pause the metadata server beforehand. Using one of the facilities provided by SAS, you can be assured of reliable back-ups of your valuable metadata.

NOTE: Spring Cleaning

Spring has sprung, although it's difficult to detect here in South East London. With spring comes thoughts of spring cleaning. In the SAS context, the following two tips spring to mind:

1) Clean-Out Your Dead WORK Libraries

SAS uses the WORK library throughout the duration of your SAS session (local and remote). When your SAS session ends, SAS will automatically delete the WORK library. Specifically, your configuration options specify a "master WORK directory" where each new SAS session creates a sub-directory to use as its individual WORK library. Each individual SAS session deletes its individual WORK directory when it terminates. However, if SAS is cancelled or terminated abnormally then the individual WORK directory won't be deleted (nor its contents). Over time, the master WORK directory will get full of orphan WORK (sub)directories and their contents.

This applies not just to remote sessions belonging to SAS/CONNECT and workspaces pertaining to Enterprise guide; don't forget that your metadata server is a SAS session, as are your stored process server sessions, etc.

You need to regularly clear-out the master directory in order to avoid running-out of space, but you need to avoid accidentally deleting sub-directories that belong to active (non-orphan) SAS tasks. The SAS-supplied cleanwork utility is the answer. Yes, that's right, SAS supply a utility to keep your master directory clean, and it will check the provenance of each directory before deleting it, so it won't try to delete directories that belong to active SAS sessions. Whilst it can be run by individual users to tidy their own directories, it is best run as a scheduled admin task ans given appropriate privileges.

Check-out the operating system-specific SAS documentation for details. Read the documentation for Windows carefully - the last time I used cleanwork on Windows it was just a SAS program that deleted all directories older than a given date.

2) Tidy-Up As You Go

Help to reduce the overall size of WORK space usage by deleting temporary data sets as you go along. If I'm writing a modular program composed of a number of macros, I use the macro name as a prefix for all temporary tales created by the macro. By doing this I can easily delete all of a macro's temporary data sets at the end of the execution of the individual macro by using the colon (:) wild card in PROC DATASETS DELETE. The SYSMACRONAME automatic macro variable contains the name of the active macro, so it's easy to use the macro name as a prefix for all temporary data sets, and then delete them at the end of the macro. See the example below.

%macro leyland(...);

  data work.&sysmacroname._demotemp;
    set sashelp.class;
  run;

  ...

  proc datasets lib=work nolist;
    delete &sysmacroname.: ;
  quit;
%mend leyland;


Of course, this is not good for debugging! Deleting all of the temporary data sets automatically makes development and debugging very tricky. So, I add a parameter to each macro (passed from one embedded macro to another) that indicates whether the macro should tidy-up at its end. See below.

%macro leyland(tidy=y, ...);

  data work.&sysmacroname._demotemp;
    set sashelp.class;
  run;

  ...

  %if &tidy eq y %then
  %do;
    proc datasets lib=work nolist;
      delete &sysmacroname.: ;
    quit;
  %end;
%mend leyland;


Keeping your usage of WORK to a minimum will help your operational jobs run more reliably, and will save on the cost of disk space too. The two tips above will help you with the challenge.

Tuesday 16 March 2010

NOTE: "SAS Talks" Has a Home

The excellent SAS Talks series of free webinars (previously featured in NOTE:) has now got a home page where you can find previously recorded talks, plus information about upcoming talks.

So, you can now easily catch-up with the previous talks on SAS Enterprise Guide, SAS/IML, ODS statistical graphics, and multi-sheet output for Microsoft Excel from SAS.

The page shows two upcoming talks: Modifying ODS Statistical Graphics Templates in V9.2 later in March, and Formats and the Format Procedure in April. Registration for the former opens today. If you watch it live you can question the speaker.

Project Plans in Excel - As a Chart

The series of posts on project planning in Excel with Gantt charts has been very popular, and one of the most popular questions has been "Why don't you use an Excel chart?" Well, the answer's simple, they don't work very well for large list of tasks. For small lists they look very nice, but they don't scale well, hence I prefer to keep my Gantt in the cells of the worksheet. However, for completeness I thought I'd offer this bonus post to show how it's done. You can see the end result alongside this paragraph (right). I'm using Excel 2003.

We'll start with the result from the last post, including the progress bars in the worksheet. You can see it alongside this paragraph (left). Since we used the cells to indicate progress, we were limited to showing progress in chunks of whole days. In the chart we will be able to show a more accurate picture of progress.

I'm going to start by removing the groups that we had in the last result - I've never explored how they can successfully be charted. So, let's select the input data area (A2 to F9), go to the Subtotals window, and click the Remove All button (then confirm that you understand that entire rows will be removed). Our chart collapses and looks as shown below.

Tuesday 9 March 2010

Project Plans in Excel - Tracking to Completion


The series on maintaining a project plan and Gantt chart in Excel has been popular, and I've had a lot of queries about tracking progress. So, in this bonus post I’ll describe how to display tasks’ progress on the Gantt chart that was featured in the previous posts in this series. In addition, I’ll show how to highlight “today”. Alongside this paragraph (right) you can see what the result of this post looks like.

In the three previous posts in this series I described how to create a neat and simple Gantt chart, how to add dates to the day numbers, and how to group tasks. These three simple sets of steps have given the developer sufficient knowledge to quickly create a simple but effective Gantt chart that demonstrates the developer is in control of the project (without spending more time on planning than on delivery). Alongside this paragraph (left) you can see what the results of our previous efforts looks like.

Let’s start by inserting a column after E and heading it “%Done”. This is where you'll need to type values to indicate your progress. Then, after column N let’s add “Done” and “DoneEnd”. I’ve made the text colour of the latter two columns a semi-visible grey because they’re our working values and not of interest to the reader of the Gantt chart. If you see the picture below, you’ll see that I’ve also populated the %Done column with some values.


Now let’s populate the calculated columns. The working columns are not strictly necessary, but they’ll help illustrate the calculations that we’re doing. Firstly, let’s understand what we’re trying to achieve. Cell H3 represents progress on activity #1 on day 1. We’ll display a block in the cell if progress on the activity is equal to (or greater than) half a day’s effort. So, for activity #1 we can see that effort is complete up to the end of day 3; for activity #2 the effort is complete up to 2/3 of the way through day number 2. Since day number 2 for activity #2 is more than half complete we’ll put a block in that cell (but not day number 3).

Tuesday 2 March 2010

NOTE: Data Set Reader and Log Analyser from Oceanview

UK-based Oceanview Consultancy have a couple of neat SAS utilities that just might meet your SAS team's needs. DSREAD reads a SAS data set and provides information about it without requiring SA software; and Elvis(!) is a log analyser par excellence.

Back in May 2006 when NOTE: was an email newsletter and boasted 3,700 subscribers worldwide, issue 17 featured the Elvis log analyser (and some awful usage of some of The King's record titles). Well, Chris Long wrote to me recently to tell me of his latest creation - DSREAD.

NOTE: Don Henderson Has a Blog

Somewhat humbly titled "Jurassic SAS in the BI/EBI World", long-time guru of SAS Don Henderson has joined the blogosphere. Don's CV is long and impressive and includes over 30+ plus years of using SAS;
Don is author of the SAS Press book "Building Web Applications with SAS/IntrNet: A Guide to the Application Dispatcher", and one of the principal architects for sasCommunity.org.

I've already learned useful stuff from Don's postings, including his January 26 post "An autoexec facility for the Stored Process Server" wherein he mentions that stored processes can have a program automatically run after each invocation (like a retrospective autoexec). I'd overlooked this useful feature.

Project Plans in Excel - Grouping Tasks

In the two previous posts in this series I described how to create a neat and simple Gantt chart and how to add dates to the day numbers. In this post I’ll describe how to groups your tasks in the chart that was featured in the previous post. The picture alongside (right) shows the end result from today's post. Grouping tasks is a generally useful thing to do, but I also find that my list of tasks increases as time goes by, so I might not need groups to begin with, but they become a useful way of keeping my plan tidy after it has grown.

As with the previous cases, I’m going to describe a quick and simple method. The objective is to have a useful and communicative chart without spending too long on creating it and without making it difficult to maintain. We start with the chart that was created in the last posting (shown to the left). Remember my comments in the first post in this series: I expect SAS developers to run their own (small to medium sized) projects from time-to-time, and I expect them to know how to work to a plan.