Monday 30 November 2009

NOTE: Star Schemas and Slowly Changing Dimensions in Data Warehouses

Most data warehouses include some kind of star schema in their data model. If the dimensional data in the warehouse is likely to change over time, i.e. you have a slowly changing dimension, then you should consider this whilst designing your data model. This article provides a simple introduction to star schemas and slowly changing dimensions (SCDs). These terms are important if you are to understand some of the more powerful transforms in DI Studio.

The star schema, at its simplest, consists of a fact table surrounded by a number of associated dimension tables. The visual representation of this derives the name.

The fact table contains the facts, i.e. the central data of the warehouse (typically numeric), along with keys to the various dimension tables. A simple data model with one fact table and three dimension tables is shown below.

The keys in the dimension tables are called “Primary keys”; they uniquely identify each row. The same keys in the fact table are known as “Foreign keys”; they do not necessarily uniquely identify each row. Often the fact table will have an extra column called the “Compound Key”. This will contain a combination of the dimension keys, making it unique to each row of the fact table.

VIEWS News 48 - Newsletter ready for download

Phil Holland emailed this week to say that issue 48 of VIEWS News has gone up on the VIEWS web site, including a tip from RTSL.eu.

Phil does a great job with the newsletter and it always contains a plethora of handy hints and tips. The diary of upcoming SAS-related events is most useful too. Back-issues of the newsletter can be viewed from the SAScommunity web site.

Wednesday 25 November 2009

NOTE: EGP is a ZIP

I just discovered that Enterprise Guide's EGP files are actually zip files. You can open them with appropriate unzip utilities and extract your code, log and listing files.

POLL RESULT: What is your preferred method for accessing the NOTE: blog?

We recently ran a poll on the blog to understand your preferred method for accessing the information contained hereabouts. The ability to vote was open to all. A mere 27 votes were cast, not a sufficient number to draw significant conclusions from (and it was open to mis-use by voting more than once). However, the results are shown below and demonstrate a large apparent preference for access by RSS feed.

Visiting the web site:   1 ( 3%)
RSS reader:             16 (59%)
Email subscription:      4 (14%)
Twitter:                 4 (14%)
Other:                   2 ( 7%)

I'm intrigued by the 2 votes for "other". What other access channels are there? Please do post a comment with your suggestions.

NOTE: used to be published as an email and was sent (free) to over 4,000 subscribers. In converting to a blog we had assuemd that there would be a preference to receive new posts via email. It seems we may have been wrong. We will use the information to reconsider our posting pattern. We currently post just twice a week in order to keep down the volume of emails. If you have any posting preferences, please let us know in a comment.

Monday 23 November 2009

NOTE: More on CAT Functions

Further to last week's post on using the CAT functions, Jack Hamilton emailed me with some further advice:

There are also problems with the CAT functions in PROC SQL. I don't remember the exact details, but I had something like this:

select catx(', ', a, b, c, d, e, f, g, h, i)
into :macrovar

If the result string exceeded a certain size, it was truncated.

In this particular case I went back to the concatenation operator.

In general, functions that return a string value can look at their destination to decide how long the result can be. In this case, there is no predefined destination length, and the function arbitrarily chose a length that was too small.
Good advice. There's no reason to avoid the CAT functions, but they do need to be used with the same degree of consideration as other string functions and you need to pay attention to lengths.

Thanks Jack

Round the World: Race 3 Ends - SAS Consultant's Yacht Leads

Team Finland, with UK-based SAS consultant Andy Philips onboard, finished second in the South America to Africa leg of the round-the-world race. Having won the first two races, their second place in the third race means they sit at the top of the leaderboard as they fettle Team Finland for the fourth race - Africa to Australia.

Race three saw some interesting variations in tactics, some very heavy weather, plus a scary man-overboard rescue on Hull & Humber.

Race four sees the crews face yet tougher seas - "should be interesting!" said Andy in his email to me last week, having arrived into port the previous night. "Loving this hot water, beer, and a floor that isn't heeled over at 30 degrees!"

Good luck, Andy!

NOTE: Rivals Take Aim

The New York Times published a balanced and thoughtful feature on SAS last Sunday. The feature gently illustrated the counterpoint between SAS's Cary employees' apparently comfortable working life and the impending onslaught on SAS's markets resulting from recent business intelligence aquisitions from heavy-hitting IT giants such as IBM, SAP and Oracle. Recommended reading... Nice slide show too, including a picture of SAS's solar farm, operational since December 2008 and with an estimated 1.7 million kilowatt-hours (kWh) annual output. There's a second solar farm planned for March 2010, with an estimated 1.9 million kilowatt-hours (kWh) annual output.

Wednesday 18 November 2009

NOTE: CATS, MISSING and CSV (Thanks For Your Help - Not!)

I warmly welcomed the CAT functions (CAT, CATS, CATT and CATX) when they were introduced in V9. They introduced a much neater style to SAS programs by eliminating the need to LEFT and TRIM our strings prior to concatenation, and the CATX function saved repeated concatenation of separator characters. I've encouraged my clients to adopt these functions (they also work faster than combinations of LEFT and TRIM), but last week I got a call from a disappointed client who was trying to use CATX and getting unpredictable results.

In addition to tracing the source of their problems, I was able to advise them on better ways of doing what they were trying to do (creating a CSV file).

NOTE: SAS Take World Programming to Court

I noticed a recent press release from SAS stating that they have "filed a lawsuit in the High Court in London against World Programming Ltd for breach of license and copyright infringement relating to the use of SAS® Learning Edition to develop and test its World Programming System software."

World Programming System is a SAS work-alike that offers a cheaper alternative to the licensing of SAS/BASE and other specific modules. It's interesting to note that SAS's case is centred around the mis-use of a SAS/LE license rather than the simple fact that WPS duplicates SAS's language. However, languages cannot be protected in law, so SAS cannot obtain copyright on the SAS language. Hence, it is vulnerable to (cheaper) providers of similar functionality, such as WPS.

World Programming Ltd says the claim has no merit and that they will defend their position. Based in Hampshire in the UK, in close vicinity to one of IBM's offices, the World Programming Ltd web site offers direct sales but no marketing partners. However, they are shown in the IBM PartnerWorld catalog as Advanced Partners. And Minequest offers indicative pricing for the product.

It'll be interesting to see how this case develops and to see if it has any subsequent impact on other tools that use or interpret the SAS language in one way or another, such as Carolina (converts SAS to Java) and ASAP (automatically produces data flow and process flow diagrams of SAS code).

Monday 16 November 2009

NOTE: SAS-L is Still Extremely Popular!

Despite competition from many sides, it's good to see that SAS-L (now synonomous with usenet newsgroup comp.soft-sys.sas) is still going strong and still providing a great service for anybody with questions or opinions about SAS.

Whilst I know that the newsgroup started in July 1993, I confess I've been unable to find when SAS-L started.

Figures from Google* show 28,000 posts per year over the last three years, possibly down to 25,000 this year. Web sites, newsletters and blogs have come and gone but the fact that SAS-L is still capable of pulling sufficient users to generate 70 posts per day shows that it remains *the* site for SAS knowledge. Google's list of top posters this month includes the names of some familiar SAS-L stalwarts including Ron Fehd, Arthur Tabachneck and Peter Crawford.

Catalist shows there are over 3,000 people subscribed through SAS-L, plus countless others accessing the information through comp.soft-sys.sas.

If you have some information to share, or you seek an answer to a problem that you've been unable to resolve yourself, try posting and be prepared to be surprised at how quickly you get a conversation and valuable help.

* http://groups.google.co.uk/group/comp.soft-sys.sas/about

Radical Action for Radical Times

Back in October I mentioned some SAS publications, including Radical Action for Radical Times by my friend Jonathan Hornby. Jonathan and I go back to 1996 when he first joined SAS and joined the SAS project that I was leading. Following that he gave some cracking papers at VIEWS conferences (the UK's independant SAS user group at that time).

I most recently caught-up with Jonathan at this year's SAS Global Forum (SGF) where he was clearly energised by the process of producing his book (and the fact that all proceeds will go to childrens' cancer charities). I'm very pleased to see that Jonathan recently had the opportunity to write a post for the SAS Publications blog. In it he provides some of the process and thinking behind the book. And inspired by the writing process, Jonathan now has his own blog - Beyond Business - in which he will offer his interpretations of today's news and endeavour to extrapolate its impact upon our business futures.

Jonathan's book is about finding opportunities in times of adversity. Its message applies to all of us. We are not all in a position to directly influence executive strategies, but resonances of Jonathan's message encourage us to think clearly when defining the scope and objectives of our latest development projects. Reduced budgets mean we need to place greater focus on the value of each new feature that we deliver. And it's not simply a case of stripping-out functionality, it's also a case of looking for potentially overlooked features that provide the greater value in the current climate. Realigning development effort to focus more clearly on value-focused projects, changes and features rather than open-ended streams of continual change is crucial. Food for thought.

Wednesday 11 November 2009

NOTE: Quick Comments in Enhanced Editor and EG

I was rather late to the party in terms of realising that you can quickly comment and uncomment lines or blocks of text in the Enhanced Editor (or Enterprise Guide editor). Simply clicking ctrl-/ will insert /* at the beginning and */ at the end of each selected line. And it took me even longer to figure out that shift-ctrl-/ removes comments.

I'm sure there are many of these keyboard commands but I've never found them listed. Does anybody know where these are all listed? [See the subsequent comments for responses to my question]

NOTE: New Formats in SAS V9.2 Phase 2

Just a small thing, but worth noting nonetheless, SAS V9.2 Phase 2 introduced three numeric formats:

SIZEK - writes a numeric value in the form nK for kilobytes

SIZEKB - writes a numeric value in the form nKB for kilobytes

SIZEKMG - writes a numeric value in the form nKB for kilobytes, nMB for megabytes, or nGB for gigabytes

Useful from time-to-time I'm sure

Dashboards & Scorecards: What To Measure

My recent post on PROC GKPI and dashboards was very popular (judging by the number of hits), but I got a number of email comments suggesting I should talk more about the content rather the presentation, i.e. what to measure.

It's certainly a subject I'd like to return to, but in the meantime you could do a lot worse than to visit the blog of Wayne Eckerson (director of research and services for The Data Warehouse Institute - TDWI). In particular, check-out his recent series named Crossing The Chasm.

Tuesday 10 November 2009

NOTE: Find the Blog of Blogs at sasCommunity.org

sasCommunity.org has started a page named SAS Community Planet. Visitors to this page will find a collection of weblog entries by members of the SAS community... a blog of blogs. NOTE: is included in the collection (of course!), and the collection is growing steadily.

You can subscribe to the Planet through an RSS or OPML feed, so you need never miss a post from any of your favourite sources of SAS knowledge. The Planet itself is created from the RSS feeds of the individual blogs.

The web page seems to show about one month's worth of posts, but it loads very snappily.

Great idea!

NOTE: Registration for SAS Global Forum 2010 is Open!

Registration for SAS Global Forum (SGF) in Seattle, April 11 - 14, 2010 is now open. Registration has been launched with early bird discounts and special low-priced packages.

SGF is one of the very best ways to learn about SAS software and how SAS customers are getting the best out of it. The program and agenda are still being developed, but if past years are anything to go by, the event should be top of your list of training opportunities for 2010. The conference schedule and abstracts will be available on the web on January 15th. Early bird registration discounts end March 8th.

I've submitted a paper entitled "Developing with the V-Model" that I hope to be presenting in the Applications Development stream.

Proceedings from past conferences, i.e. the papers that were presented, are some of the best SAS material that you can find. They're all available online within the SAS suport web site. They're also available alongside the proceedings of many local user groups and industry user groups at Lex Jansen's excellent library of SAS reference material - very handy for searching.

Monday 9 November 2009

POLL: What is your preferred method for accessing the NOTE: blog?

We're conducting a quick poll to better understand how you prefer to get your information from the NOTE: blog. Please take a quick moment to visit the blog home page and offer your vote.

Thanks in anticipation...

Statistical Graphics in London

"A picture can speak a thousand words" - graphics can be the best way to gain insight into a data set. The PSI (Statisticians in the Pharmaceutical Industry, no I haven’t mis-spelt the abbreviation!) have organised a one day presentation and forum to discuss and share the thinking behind creating statistical graphics as well as the methodology, tools and processes involved.

Experts in the field of statistical graphics will share their experience and ideas with the audience which will then be presented with the results of a challenge: 5 software solutions 'gurus' were tasked with presenting real-life, but perhaps unusual data sets using the best graphics their software and personal experience can produce. Take a look at the flyer to see the detailed aganda for the day. You will see that the solutions incorporate a variety of software including SAS, Stata and R.

Register before November 20th and get a reduced rate.

The meeting will take place on Tuesday 8th December at the prestigious offices of The Design Council, 34 Bow Street, London, WC2E 7DL. To register, visit the PSI's Events page, find the Graphics event (the events are listed in chronological order), and follow the Registration link. After completing the quick but somewhat clunky PSI Non-Members Registration process (it's free), make sure you mention NOTE: in the "How did you hear about this Event?" field on the form

This promises to be an interesting and thought-provoking day.

NOTE: Handy Technical Tip Sheets

sasCommunity.org's daily tip last Tuesday was very good. If you've ever attended SAS Global Forum (or some other SAS events) you will have seen some really useful tip sheets being handed-out. In Tuesday's tip of the day, Little Birdie pointed-out that these are available for download from the SAS web site. Topics include ODS, the Data Step Hash Object, Regular Expressions, ODS and PROC TEMPLATE. Check them out for yourself. Highly recommended

Wednesday 4 November 2009

NOTE: SAS With Style: More thoughts on comments!

Taken from issue 2 of our (new retired) email newsletter:

Be Predictable

Using a consistent coding style is "a good thing", but there are two instances where you should not necessarily impose your chosen style. Firstly, when you are modifying some existing code that uses a different style; and secondly, there will always be exceptional cases where you can justify a deviation.

In respect of the first instance, it is not desirable to leave a module with a mixture of styles in it, so you have the choice of either changing everything to your own chosen style, or of making your changes and matching the existing style. You need to make a judgement about the amount of work involved in changing the existing code - are you just making a small change (in which case, do it in the existing style), or are you making wholesale changes (in which case, convert the lot to your new style).

In the eventuality of an exceptional case where you believe it is necessary to deviate from the chosen coding standards (the second instance), make sure you understand the reason for the particular coding standard that you would wish to break. And if you feel you are truly justified in what you are doing, make sure you document what you have done and the reasons why you did it.

Comments

Comments are a crucial part of coding, in my opinion. There are some schools of thought that eschew comments (most notably the eXtreme Programming folks) but I don't wholly subscribe to any of those.

The subject of comments is a big one. In this issue I'll focus on comments related to DO blocks since I talked about DO blocks in the last issue.

In general, comments should inform. They serve very little purpose if they merely state things about the code that are already obvious to the casual observer. In addition to informing the reader about your intent and the reasons why you chose a particular coding solution, comments can be useful to aid with the layout and structure of coding.

The layout and structure of your code helps you and others to comprehend it. One common aspect of layout and structure is indentation. I'll discuss indentation in greater detail in a future issue, but suffice it to say that most programmers indent lines of their code to indicate blocks and sub-blocks of code. The indentation is a great help in displaying the structure and flow of the code. I use paired comments with DO blocks to add to this. See the example below:

if 1 eq 0 then
do; /* Never true */
  moon = 'cheese';
  clouds = 'marshmallow';
end; /* Never true */

The DO and the END statements share the same comment. In the simple example above they don't add much to the understanding, but in complicated code with lots of sub-setted IFs and plenty of ELSEs to go with them, being able to match an END with a DO easily has great value.

On a SELECT statement, I use them thus:

select (switch);
  when (true)
  do; /* true */
    ...
  end; /* true */
  ...
end; /* select switch */

And with a DO loop, thus:

do ptr_digits = 1 to listlen(lDigits);
  ...
end; /* do over digits */

How about you? Do you do anything similar? Please leave a comment and let us know...

Listen: Be a Better Manager/Colleague

We all need a wide range of skills to be successful in the working environment (and outside of it too). Being great with SAS syntax should not be considered the sole measure of success at work. Good developers need to be skilled in (or have an appreciation of) design & architecture, testing, project management and (most importantly) communication. In this post we discuss a key communication skill that is so often lacking in our colleagues and managers: listening.

Whether we're a manager with staff who want to feel they're listened-to, or we're a developer with customers who want to feel they're listened-to, it's easy to think we're good at listening but how can we be sure. Here are a few tips for checking to see if you're a good listener:

1) Be Questioning. Do you ask as many questions as you are asked? When we're in a social setting and are asked "Hi, how are you" our natural response is to say "Fine thanks you, and how are you?". In the business setting you can do the same thing, e.g. "What's our top priority project?", "It's currently WonderWeb, do you see any other similarly high priorities?". A key tip here is to not interrupt while your question is being answered!

2) Be Seen To Act. Do you confirm that you'll act upon what you've been told? If you are questioning then you'll be given feedback; do you assure your listener that you'll act on what you've been told? "That's a great new feature suggestion, Jim. I'll put that into our backlog with a priority of 2. Okay?"

3) Focus. Does your listener feel that they are the most important person in the world to you at the time of your conversation? Maintaining eye contact (not looking over their shoulder) will assure them that you're focused on them.

4) Empathy. Do you put yourself into the shoes of the other person? Try opening yourself to the talker to the point where you can actually feel what they are feeling. Become, in small part, the person you are listening to. Standing in another's shoes and seeing the world through their eyes allows you to listen empathetically. Practice will give you the ability to see a situation simultaneously from multiple points of view.

5) Be Tough. Do you encourage tough questions, or do you try to avoid them being raised? If they're important they'll bubble to the top eventually anyway, so you might as well encourage people to raise their tough questions sooner rather than later. It gives you more time to resolve them.

You can practice your listening skills on a day-to-day basis. I've seen success with the adoption of Rosa Say's Daily 5 Minutes (D5M) working pattern. It's aimed at managers but is equally effective when used amongst peers. In a nutshell, Rosa suggests giving 5 minutes of no-agenda time to at least one member of staff or colleague per day. You can keep a simple log and rotate around the team day-by-day. It's 5 minutes for you to listen to whatever the talker wants to say. As with many successful techniques, it's simple and doesn't sound like it would be of any benefit, but I assure you it can work. Read Rosa's text and give it a try!

Rosa says in a recent blog post:
Managers who are humble are the ones other people will work hard for. A humble manager listens really well. She asks the people reporting to her what they think, and why, and what they would do about things.

You don't need to have all the answers; your job is to find them. And people who can't learn to be humble have a hard time learning where to look for those answers. Sometimes things are right there in front of them, and they don't even see.

Humble managers see with their ears, not with their eyes.
It's almost poetic!

Monday 2 November 2009

Peer Reviews / Code Reviews

We're big fans of peer reviews at RTSL.eu. We like the fact that peer reviews offer our teams the opportunity to a) apply additional quality control before we ship code out of our software factory, b) learn new techniques from each other, c) share knowledge about the applications we maintain.

In our context, a peer review is a review of an individual's work by a fellow team member.

We prefer the term Peer Review to Code Review because we need to review more than code. For starters, if we're using DI Studio, we need to review the DI jobs, not the code that it generates; and other SAS objects have no discerable code at all, e.g. information maps, but they still need to be reviewed. However "code" is a good shorthand for "all of the stuff that we make in our software factory".

If Peer Review is one of the last steps in our development process before the code ships out of the factory gates (to be passed to the testing team), we need to provide a quality stamp on all elements of what we're shipping. For us, peer review includes:
  • Be sure the code matches the Programming Guidelines. We're not overly prescriptive with our guidelines, but we have to be happy that we can all support each others code, and that means there has to be a degree of consistency in how we code
  • Be sure the code matches the Specifications. Peer review is static testing, so we don't execute the code, but we will check to see that the code appears to follow the Specs
  • Be sure the changes made (if this is an update to existing code) match the Specs for the change. The developer will provide code comparisons that show what lines have changed between the current and the new versions. You'd be surprised how often this shows us that a developer has either i) omitted a part of the change, or ii) inadvertantly introduced an unintended change
  • Be sure the developer has tested the code. The reviewer will expect to see test scripts and evidence of their successful execution
  • Be sure the code and documentation are properly stored in our Configuration Management system. We want to be confident we know what we're shipping and that we can subsequently reference it for bug fixing, etc
Doing these checks need not be too time-consuming - there's no need to re-review anything that hasn't changed - but problems found in peer review are far cheaper to fix at this stage than if they were not spotted until later (in testing, or in production).

Crucial to the success of peer reviews is the fact that the reviewer is there to learn as well as offer constructive criticism. AND let me emphasise that the criticism must be constructive!

Roy Osherove's team leadership blog recently offered some pointed advice on peer reviews. It's a good read and offers sound advice on how to introduce peer reviews into your development team.