Wednesday 30 September 2009

NOTE: SAS Editor Abbreviations - Create Your Own

One of the links recommended on our blog (if you scroll down far enough!) is SAScommunity.org. The site describes itself as a collaborative online community for SAS® users worldwide. It forms an ever-growing repository of good SAS information.

The home page features a tip of the day from a variety of respected SAS practitioners. Yesterday's tip from Art Carpenter caught my eye:
In the Enhanced Editor it is possible to generate a standard block of code or text using a simple abbreviation. Similar to the complete-the-text feature found in MSWord and other word processors, these abbreviations are easy to create. While in the Editor go to [tools] => [add abbreviation].
Art's tip applies to the classic SAS display manager. For details on how to do the same in Enterprise Guide, and how to delete abbreviations(!), see Chris Hemedinger's SAScommunity posting.

This tip is useful for blocks of text you use repeatedly. A standard comment header block immediately comes to mind. I'm sure you can think of many others.
[This post was corrected and augmented 30-Sept-2009]

NOTE: Sparklines Improve Your Communication Abilities

In issue 18 of NOTE:, published in 2006, we talked about Edward Tufte's concept of sparklines. Since then there's been more published material on SAS and sparklines, most notably Paul OldenKamp's SGF 2008 paper entitled "An Interpretation of Sparkline Graphs and Their Creation with SAS® Software". Paul very kindly refers to issue 18 of NOTE:.

Sparklines are a tremendously rich means of communicating information in a small space. BI tools such as QlikView now include sparklines as a standard graphical object.

Here's our original article:

Edward Tufte is a world renowned expert on information graphics, i.e. the science of presenting information in a graphical format. In his recent publication named Beautiful Evidence, Mr Tufte formally introduced the concept of sparklines - small, high resolution graphics embedded in a context of words, numbers, or images. You can read large parts of a draft of the sparklines section of Mr Tufte's book in the discussion thread he started on his site in 2004.

As illustrated in Mr Tufte's book, sparklines are an extremely powerful means of communicating information. I think they're at their most powerful when used within a paragraph of text, almost as if they were a word. For example, we had some very hot weather earlier this month, but it's now reduced to a more comfortable level, as you can see: Example sparkline. The sparkline neatly conveys all of the information without interrupting the flow or layout of the text. There are many variations on the sparklines theme, all of which are discussed in Beautiful Evidence.

If you want to experiment with using sparklines, you might like to try BitWorking's sparkline generator. It's a neat and simple web-based means of getting a sparkline for your data. Alternatively, if you visit Bissantz's page on sparklines, you'll see that they produce SparkMaker (an add-in for Microsoft Office that lets you create your own sparklines in Word, Excel, PowerPoint, or HTML documents) and SparkFonts (TrueType Fonts for the character-oriented generation of sparklines). And finally, there are plenty of macros and add-ins for producing sparklines within Excel - just use your favourite search engine.

However, as a SAS practitioner, I'm sure you're thinking to yourself "I'll bet SAS/GRAPH can do sparklines neatly", and you'd be right of course! The following basic macro, and example invocation, produces a very effective sparkline:

NOTE: Data Integration in V9.2

Many people are not convinced by SAS Data Integration Studio (DI Studio). To the seasoned SAS programmer, DI Studio is a glossy interface that cannot produce the range of functionality that their hand-written code can achieve, and the code that it does produce is less efficient in many cases. But look at the other side of the coin: creating a job with DI Studio requires knowledge of the data but little knowledge of SAS syntax, so the skills required are more readily available. And with each new version of SAS come more transforms and other job nodes, i.e. steps in the job.

So, DI Studio may not yet be a panacea that allows data modellers to build extract, transform and load (ETL) code without SAS programming skills, but it's moving closer. SAS's new approach to releasing products independent of each other perhaps means that DI Studio can be evolved and released to customers more quickly.

Think of this: if you go far enough back in history you will come across a time when machine code programmers scoffed at the possibility of producing compilers for 3rd generation languages like Fortran and Cobol. They argued that the compiler could never produce code that was optimised as well as their hand-crafted machine code. But the increase in machine speed combined with better compilers meant that the inefficiencies in the compiler's code were reduced, and the impact of those remaining inefficiencies was decreased by the faster machines. Sounds familiar? How long will it be before the majority of SAS jobs are produced with DI Studio, as SAS produce new and better transforms, and machines get faster?

I was impressed by the V9.2 release of DI Studio that I saw at SAS Global Forum (SGF) earlier this year. Apart from basic interface enhancements and a number of new transforms, I noted:

  • Ability to prevent DI Studio from "tidying" your layout. Thus you can position transforms and tables in places that make best sense to you
  • You can put the same table in more than one place on your layout, e.g. where it's used as input and output in different points
  • The addition of textual notes that can be placed on the layout in order to provide a form of documentation (or temporary development notes)
  • A usable undo capability!
  • Performance monitoring that shows real-time statistics while your job is running

Visual coding, such as is offered by SAS DI Studio, is the future. When will you get on board?

What are your thoughts? Post a comment!

Monday 28 September 2009

Problem Solving: Five Whys: Getting To The Root

Being a software developer isn't just about releasing great new software. Sometimes we need to fix problems that occur in the live system (or in testing or development). When investigating production failures we want to make sure they don't recur, so we must get to the root cause and fix that.

Six Sigma is a disciplined, data-driven approach and methodology for eliminating defects. It can be seen as being heavily statistics based (driving towards six standard deviations between the mean and the nearest specification limit), but many of its tools and techniques involve no statistics and are easily adopted.

Six Sigma's basic approach to root cause analysis is called DMAIC; it stands for define, measure, analyse, improve and control. In other words, define your problem, measure it (so you can understand it and subsequently see that a change has been made), analyse the problem (and find a solution to the root cause), improve the process by implementing the solution, and control and monitor the ongoing production process to be sure it is now functioning well and continues to do so.

One of the Six Sigma techniques offered in the analysis phase is "5 whys". Somewhat reminiscent of conversations with my kids when they were younger, 5 Whys teaches us that by repeatedly asking "why" we can peel away layers of the problem until we get to the root cause. Asking "why" 5 times is usually sufficient, but the general rule is to keep asking until the root cause is identified.

Note that the technique is intended to offer a structured route to help teams establish root cause. It doesn't work well when wrongly used to emphasise the person or blame, turning the 5 Whys into the Five Whos!

Here's an example I recently encountered. Our daily production review revealed that one of our input files had failed to load last night. Why? Because it didn't match the data structure expected by our data loader (an extra column had been added to the right-hand side of the file). Why? Because the group that regularly supplies the data file had changed the structure. At this point the knee-jerk reaction was to assume we had some unexpected emergency coding to do in order to get our data loader to accept the new structure, but we continued with 5 Whys. Why was the data structure changed, and why were we not told? Because (the supplying group told us) the change had been tested with all systems that used the file and they weren't aware that we used the file. Why weren't they aware we used the file? We'd informed them, but there had been staff changes and they didn't keep formal records.

The negotiated resolution was to a) temporarily supply two files (the old structure and the new structure) until we had time to plan and schedule a change to our data loader, and b) create a more formal process for recording consumers of the data file.

The purpose of 5 Whys is to find the root cause and to avoid assuming that a symptom is the cause. The objective is to find THE problem rather than a problem. Used thoughtfully, 5 Whys can be tremendously powerful in helping you identify and resolve production problems (and problems during testing and development phases too).

If you're not sure what to do, just ask your kids!

Thursday 24 September 2009

NOTE: Migrating Code Without Changing It (Where Am I?)

Developing code that needs no change when being moved from development through test and into production is important. In such circumstances it can be useful to know what environment the code is running in. Environment-specific information can be held in a control file; to look-up the correct items of information in that control file the code needs to know what environment it's running in.

If your prescribed directory structure takes this into account, finding-out the environment can be easy. For instance, you might design a directory structure whereby the second-level directory specifies the environment: F:\PensionApp\\code\macros. If the default execution directory for the stored process server or workspace server is F:\PensionApp\live\code then we can get the environment as follows:

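data _null_;
  rc = libname('HERE','.');   /* Assign a temporary libref to the current directory */
  put rc=;
  dir = pathname('HERE');     /* Resolve the libref to its full path */
  put dir=;
  env = scan(dir,3,'\');      /* Words are F:, PensionApp, live, code; the third is the environment */
  put env=;
  rc = libname('HERE',' ');   /* Clear the temporary libref */
run;

rc=0
dir=F:\PensionApp\live\code
env=live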

Note the use of the SCAN function. If you can't be sure of the top-level directories (the number of them may differ in different environments) but you know that the penultimate directory specifies the environment then you can use a negative value in SCAN. To explain:

DEVELOPMENT:  /home/helen/PensionApp/dev/code
TEST:  /PensionApp/test/code
PRODUCTION:  /PensionApp/live/code

Use scan(dir,-2,'/') to find the penultimate directory name (use '\' as the delimiter for Windows paths, or '/\' to accept either). The negative word number (-2) tells SCAN to count from the right-hand end of the string instead of the left-hand.
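As a minimal sketch of that idea, the step below applies SCAN with a negative word number to the three example paths plus a Windows-style path. Listing both '/' and '\' as delimiters is my own assumption about how you might make one line of code serve both kinds of host; the variable names are purely illustrative.

```sas
data _null_;
  length dir $200 env $32;
  /* Loop over some sample paths; the last one is a Windows-style example */
  do dir = '/home/helen/PensionApp/dev/code',
           '/PensionApp/test/code',
           '/PensionApp/live/code',
           'F:\PensionApp\live\code';
    /* -2 counts words from the right; both slash types act as delimiters */
    env = scan(dir, -2, '/\');
    put dir= env=;
  end;
run;
```

In each case the penultimate directory (dev, test or live) should be picked out regardless of how many directories sit above it.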

Wednesday 23 September 2009

Pharmaceutical SAS Users Gathering

PhUSE 2009 (Pharmaceutical Users Software Exchange) is just a few short weeks away. The 2009 conference will be held in Basel, Switzerland from 19th to 21st October, and there are pre-conference training courses being held from 2pm on Sunday 18th October.

The thirteen streams cover everything from applications development and coding standards to data handling and CDISC standards, regulatory and management to tutorials and posters. Registration prices range from £322 to £832.60 depending on whether you attend one day or the whole conference and whether you're a PhUSE member or not.

PhUSE is an independent non-profit organisation run by volunteers, now in its 5th year. Its purpose is to create a European forum for programmers in the pharmaceutical industry by way of a society that holds several events per year as well as a library of learning tools. Given the support of the major European pharmaceutical companies, PhUSE is the premier event for Pharmaceutical Programmers in Europe and beyond.

Tuesday 22 September 2009

NOTE: OLAP Security and the SAS-BI Blog

Whilst advising a client on the infamous "The cube has too few dimensions" OLAP error message, I was minded to re-research what information was available. I'm glad I did because not only did I find the (expected) SAS usage note 14626, I also found a hit on the SAS-BI blog run by Angela Hall. This blog is a real gem and I was glad to be reminded of it.

The "too few dimensions" message is one of those misleading messages that can have you scratching your head for a long time. You can read the details in the two sources of information that I mentioned, but I'll summarise by saying that it's a security-related issue and not a problem with the cube - despite the implications of the message.

The SAS-BI hit provides a link to one of Angela's earlier postings that gives valuable advice on the many ways to refresh a cube. Deleting and rebuilding your cubes every evening isn't always the best approach!

The SAS-BI blog has been running since August 2005 (the very first post was talking about OLAP) and has to-date accumulated 134 posts. It represents an excellent collection of SAS Business Intelligence hints, tips and experience. Many of the posts relate to V9.2, so it's contemporary too. A visit to sas-bi.blogspot.com is highly recommended. For news of the latest blog updates (and Angela's progress with her 30-day challenge - #30dchallenge), follow SASBI on Twitter.

Round the World Yacht Race Continues

This morning the 10 yachts competing in the ten month Clipper Round the World 09-10 race will depart La Rochelle, France headed for Rio de Janeiro, Brazil. On board Team Finland again will be UK SAS consultant Andy Phillips. The three day first race (from England to France) was a mere warm-up sprint compared with the four week marathon that lies ahead, finishing in sunny Rio on 18th October.

BBC coverage from yacht Edinburgh Inspiring Capital this morning shows the weather as warm and calm, but this race sees the yachts making their first ocean crossing as they set out through the notorious Bay of Biscay and into the Atlantic Ocean. During the race to Rio de Janeiro the amateur crews will achieve their first Equatorial crossing.

There are two key obstacles in the way of the fleet during the race to Rio:
  1. The Canary Islands – do you take the shortest route through the middle and risk being becalmed in the wind shadow of the archipelago, or go round the outside?
  2. The Inter-Tropical Convergence Zone, better known as The Doldrums. This area of light and shifting winds provokes frustration at the slow progress of the yachts and, combined with the searing heat which causes tempers to fray, is often seen as a greater challenge for crews than sailing in strong winds.
We'll keep you posted on Andy's progress, or you can follow it yourself - links down the left-side of the Follow page give access to the crews' diaries, pictures, videos, and the map tracker.

Friday 18 September 2009

NOTE: Global Statements

I occasionally see confusion caused by SAS's global statements so here's a clarification. A typical example of the confusion is a programmer struggling to get libname statements to work properly within conditional logic in a DATA step:

%let env = DEV;
data _null_;
  if &env eq DEV then libname data "c:\mydata\dev";
  else libname data "h:\prod\finance";
run;

When the example code is run it always allocates the "data" libname to the production data library, never to the development library.

The reason is that the libname statement is a global statement. The SAS Language Reference: Concepts manual (SAS Language Elements section, see v9doc.sas.com) explains that a SAS statement is a series of items that may include keywords, SAS names, special characters, and operators. All SAS statements end with a semicolon. A SAS statement either requests SAS to perform an operation or gives information to the system. There are two kinds of SAS statements:
  • those that are used in DATA step programming ("executable statements")
  • those that are global in scope and can be used anywhere in a SAS program ("global statements")
Global statements generally provide information to SAS, request information or data, move between different modes of execution, or set values for system options. You can use global statements anywhere in a SAS program. Most importantly, from the point of view of this post's topic, global statements are not executable; they take effect as soon as SAS compiles program statements.
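Two common ways around this can be sketched as follows. Neither is claimed to be the one true fix; the macro name assign_lib is hypothetical, and the paths are simply those from the example above. The first uses macro logic so that only one LIBNAME statement is ever generated; the second uses the LIBNAME function, which (unlike the statement) is executable and so obeys DATA step IF/ELSE logic.

```sas
%let env = DEV;

/* Option 1: macro logic decides which LIBNAME statement SAS gets to see */
%macro assign_lib;   /* hypothetical macro name, for illustration only */
  %if &env eq DEV %then %do;
    libname data "c:\mydata\dev";
  %end;
  %else %do;
    libname data "h:\prod\finance";
  %end;
%mend assign_lib;
%assign_lib

/* Option 2: the LIBNAME function runs at execution time, so it can be conditional */
data _null_;
  if "&env" eq "DEV" then rc = libname('data', 'c:\mydata\dev');
  else rc = libname('data', 'h:\prod\finance');
  put rc=;   /* zero indicates the libref was assigned successfully */
run;
```

Note that the second option also quotes both sides of the comparison, so that DEV is compared as a character value and not treated as a variable name.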

Thursday 17 September 2009

NOTE: Be Of Good Type (Revisited)

I love it when one discovery leads to another. In my previous blog entry I highlighted SAS V9.2's new NESTING argument for the DATA statement. Given that it's a new argument, I wouldn't expect it to work in previous versions of SAS, but it's always worth trying these things because often they were available in older versions of the software, albeit undocumented and unsupported. What did I discover when I tried NESTING in V9.1.3? It didn't work, but the error message told me of other DATA statement arguments I'd never come across before!

ERROR 22-322: Syntax error, expecting one of the following: BUFFERED, MISSOPT, NOMISSOPT, NONOTE2ERR, NOPASSTHRU, NOPMML, NOTE2ERR, PASSTHRU, PGM, PMML, UNBUFFERED, VIEW.

Whilst I recognised VIEW and a couple of others, I had to look up the rest in the SAS documentation. I didn't find most of them, so I used Google. The most intriguing results were for (NO)NOTE2ERR.

Wednesday 16 September 2009

SAS Consultant's Yacht Wins First Leg of Round the World Race

There's much celebration at RTSL.eu tonight with the news that the boat of our friend and SAS consultant Andy Phillips crossed the line in first place at the end of Race 1 of Clipper 09-10 - the round the world yacht race. The crew were ecstatic, cheering and giving an impromptu Mexican Wave with their feet as they sat on the rail.

Progress hasn't been without troubles - two spectra strops (no, I don't know what they are either) have been broken in the few days that they've been at sea. Andy was at the helm on both occasions!

For the ten ocean racing yachts that started from Hull and Humber on Britain's east coast on Sunday it's been a breakneck charge down the west coast of France to the end of Race 1 near La Rochelle.

Congratulations go to Andy and his fellow crew members on Team Finland.

NOTE: Debugging Nested Loops in V9.2

Today I discovered a DATA statement argument that is new in V9.2. The NESTING argument instructs SAS to print a note to the log at the beginning and end of every DO/END and SELECT/END pairing.

This is jolly handy for debugging recalcitrant DATA steps with unbalanced nesting.

Usage: data newdata / nesting;
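Here is a small sketch to try it with; the variable names and logic are invented purely to give the option some nesting to report on. I haven't reproduced the log text here because the exact wording of the notes is best seen by running it yourself.

```sas
data newdata / nesting;
  length label $5;
  do i = 1 to 3;                 /* outer DO/END pair */
    select (i);                  /* nested SELECT/END pair */
      when (1) label = 'one';
      otherwise label = 'other';
    end;
    output;
  end;
run;
```

When a step will not compile because an END is missing, the notes for the pairs that do match up should help you narrow down where the stray block begins.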

Tuesday 15 September 2009

NOTE: New SAS Version Numbers

I'm not sure if I missed an earlier announcement on this. With the introduction of SAS 9.2, SAS have decided to use different version numbers for each product. The example given in the support note that I just saw goes as follows:

For example, your SAS 9.2 (TS2M0) installation might include Base SAS® 9.21, SAS/STAT® 9.21, SAS® Management Console 9.2, and SAS® Risk Dimensions 5.2. The release numbers "9.21", "9.2", and "5.2" are unique to the individual SAS products. If maintenance becomes available for SAS/STAT software, the product release number will increment to 9.21_M1. Or if a new release becomes available, the SAS/STAT product release number will increment to 9.22.
This change is significant because it frees SAS from the shackles of releasing all products at the same time, a constraint which caused releases to move at the pace of the slowest development team.

A new procedure (PROC PRODUCT_STATUS) will provide a listing of the version numbers of the products you have installed. Of course, for client applications you can simply use the About window to find the version number. For systems with a Metadata server, the new ViewRegistry utility can be used. Frustratingly, I can't find any documentation on PROC PRODUCT_STATUS.
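In the absence of documentation, the sketch below is the simplest invocation I would try. It assumes the procedure is available in your installation, that it needs no options, and that it writes its listing to the log. The SYSVLONG automatic macro variable is a long-standing way to see the overall release and maintenance level.

```sas
/* List the release number of each installed product (assumed to go to the log) */
proc product_status;
run;

/* The overall SAS release and maintenance level */
%put NOTE: Running SAS &sysvlong;
```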

I am optimistic that this change will allow faster release cycles for individual products. Whilst faster release cycles aren't always a good thing, this gives SAS the opportunity to enhance individual products when there is a clear need to do so, without masking the new/enhanced functionality as a maintenance release. So, it gives SAS the opportunity to communicate more clearly with its customers about that new/enhanced functionality. That's got to be a good thing.

Sunday 13 September 2009

Good Luck Andy Phillips - Sailing around the world

Today marks the start of the 2009/10 Clipper Round the World yacht race. SAS consultant and friend of RTSL.eu Andy Phillips is a member of the Team Finland crew. Having won the preliminary race in Grimsby they've put down a marker to the rest of the fleet!

Organised by Sir Robin Knox-Johnston, the first man to sail solo, non-stop around the world, the 2009/10 race is a competition between ten 68 foot yachts across 40,000 miles, covering some (all?) of the most gruelling sea passages in the world. Whilst some crew members will be participating in a limited number of the legs of the journey, Andy (silly man) will be doing all of them.

We wish all of the competitors good luck, but our eyes will be on Andy over the next ten months.

Saturday 12 September 2009

VIEWS News 47 - Newsletter ready for download

Phil Holland emailed this week to say that issue 47 of VIEWS News has gone up on the VIEWS web site. Phil does a great job with the newsletter and it always contains a plethora of handy hints and tips.
Whilst I started VIEWS News in 1998 and cheerfully produced the first 15 quarterly editions, Phil has been producing the newsletter since issue 16 in the fourth quarter of 2001. Well done Phil!
Download your copy of Issue 47 of the quarterly VIEWS News from: http://www.sascommunity.org/wiki/Image:VIEWS_News_Issue47.pdf
Other back-issues of the newsletter can be downloaded from the newly updated page: http://www.sascommunity.org/wiki/VIEWS_News_backissues
If you would like to contribute an article, or discuss the possibility of doing so, send an email to Phil at newsletter@views-uk.org. A list of possible subjects for your articles can be found on the VIEWS web site, and anyone is very welcome to add to that list by sending emails with their own questions to newsletter@views-uk.org.

Wednesday 9 September 2009

Developing with the V-Model (my SAS Global Forum 2010 paper)

I've just submitted the beginnings of my paper for SAS Global Forum (SGF) 2010 in April in Seattle. Titled "Developing with the V-Model", the short abstract is as follows:
Software development is about building useful systems, not generating reams of documents. The V-Model helps the development team apply focus to what documents are useful and why.

The V-model offers a framework that clarifies the relationships between requirements, specifications, and testing. This paper discusses the benefits of the V-Model in a variety of differing development processes including waterfall and agile.

Now I need to finish the 1,000 word draft in order to fully submit it...

Tuesday 8 September 2009

NOTE: More on precision

Further to the earlier article on precision, the SAS papers listed below provide a lot of insight. However, unless I'm reading them wrongly, they don't tell me how many digits I can rely on when performing calculations.

This code, run on Windows, suggests the maximum is 15 digits:

data x;
  format num num2 best32.;
  /*    0----+----1----+----2----+----3 */
  char='0.123456789012345678901234567890';
  num = input(char,best32.);
  num2 = num + num;
  put _all_;
run;
num=0.12345678901234 num2=0.24691357802469 char=0.123456789012345678901234567890 _ERROR_=0 _N_=1
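Another way to probe the limit, offered as a sketch and not a definitive test, is to ask SAS for the largest integer it can store exactly and then compare a 15-digit literal with a 17-digit one. The variable names are illustrative, and the values printed will depend on your host.

```sas
data _null_;
  /* Largest integer for which every smaller integer is stored exactly */
  exactint = constant('EXACTINT');
  put exactint= best32.;

  /* 15 significant digits: expected to be held exactly */
  d15 = 123456789012345;
  /* 17 significant digits: may not be held exactly on this host */
  d17 = 12345678901234567;
  put d15= best32. d17= best32.;
run;
```

If the 17-digit value comes back altered in its final digits while the 15-digit value is unchanged, that supports treating 15 digits as the safe limit.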


SAS(R) 9.2 Language Reference: Concepts: Numeric precision in SAS software
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a000695157.htm

TS-DOC: TS-654 - Numeric Precision 101
http://support.sas.com/techsup/technote/ts654.pdf

Friday 4 September 2009

Newsletters About SAS Software

Whilst we retired our own NOTE: newsletter this year, there are still a number of other useful newsworthy publications available. The VIEWS user group publishes a quarterly newsletter (VIEWS News), and issue 2 of NOTE: featured another two...

If you want to receive more information about SAS software in your inbox, you might like to take a look at Systems Seminar Consultants' web site. They offer The Missing Semicolon at http://www.sys-seminar.com/. It is a quarterly publication, emailed in PDF format. Subscription is free.

I'm sure you're already aware of SAS Institute's technical and business newsletters at www.sas.com/subscriptions. If not, sign-up quickly! They're (bi)weekly HTML emails with links to articles on the SAS web site.

If you find any others, please let us know so that we can pass-on the word.

What is/was NOTE: ?

From the number of queries we've received it's clear that we're reaching a new audience with this blog and many readers don't know the history of NOTE:.

In a nutshell, NOTE: was our free, email newsletter that was sent irregularly and contained a wide variety of SAS-related news, hints, tips and experience. We officially retired NOTE: earlier this year and replaced it with this blog. There's an archive of the previous issues of NOTE: at www.NoteColon.info. Most of the articles are still relevant and interesting.

The first issue of NOTE: was sent to just 70 lucky recipients. Subscription was free, interest grew quickly, and the last issue went to 2,855 SAS practitioners. It peaked at 4,270.

Along the way we featured two crossword competitions, with prizes. You can still see these (and try them for yourself), alongside a couple of "coffee-time" crosswords at the RTSL web site. Have fun.

We will be re-publishing the most popular NOTE: articles into this blog from time-to-time, so you will be able to catch-up over time.

NOTE: The CONSTANT Function

I've been using SAS since 1983 and I'm always somewhat frustrated when I discover a function I've never heard of (even though it may have been introduced to SAS recently). Have you heard of the CONSTANT function? Well, I hadn't until yesterday.

In the context of understanding the precision of numeric data in SAS, we discovered the CONSTANT function and its "MACEPS" argument. MACEPS returns the "machine precision constant"; the SAS 9.2 Language Reference: Dictionary tells us that the following other arguments are available:
  • The natural base
  • Euler constant
  • Pi
  • Exact integer
  • The largest double-precision number
  • The logarithm of BIG
  • The square root of BIG
  • The smallest double-precision number
  • The logarithm of SMALL
  • The square root of SMALL
  • Machine precision
  • The logarithm of MACEPS
  • The square root of MACEPS
So, if you need the value of Pi (for instance), you no longer need to hard-code it.
A bit of research shows it has been available since at least V8, so my frustration at being unaware of it is compounded!
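By way of illustration, here's a minimal sketch that retrieves a few of the constants and uses Pi in a calculation. The variable names and the radius are made up for the example.

```sas
data _null_;
  pi     = constant('PI');       /* Pi */
  e      = constant('E');        /* the natural base */
  maceps = constant('MACEPS');   /* machine precision */
  big    = constant('BIG');      /* the largest double-precision number */

  radius = 2;
  area   = pi * radius**2;       /* no hard-coded 3.14159... required */

  put pi= e= maceps= big= area=;
run;
```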