Monday, 30 November 2009

NOTE: Star Schemas and Slowly Changing Dimensions in Data Warehouses

Most data warehouses include some kind of star schema in their data model. If the dimensional data in the warehouse is likely to change over time, i.e. you have a slowly changing dimension, then you should consider this whilst designing your data model. This article provides a simple introduction to star schemas and slowly changing dimensions (SCDs). These terms are important if you are to understand some of the more powerful transforms in DI Studio.

The star schema, at its simplest, consists of a fact table surrounded by a number of associated dimension tables. The visual representation of this derives the name.

The fact table contains the facts, i.e. the central data of the warehouse (typically numeric), along with keys to the various dimension tables. A simple data model with one fact table and three dimension tables is shown below.

The keys in the dimension tables are called “Primary keys”; they uniquely identify each row. The same keys in the fact table are known as “Foreign keys”; they do not necessarily uniquely identify each row. Often the fact table will have an extra column called the “Compound Key”. This will contain a combination of the dimension keys, making it unique to each row of the fact table.

VIEWS News 48 - Newsletter ready for download

Phil Holland emailed this week to say that issue 48 of VIEWS News has gone up on the VIEWS web site, including a tip from RTSL.eu.

Phil does a great job with the newsletter and it always contains a plethora of handy hints and tips. The diary of upcoming SAS-related events is most useful too. Back-issues of the newsletter can be viewed from the SAScommunity web site.

Wednesday, 25 November 2009

NOTE: EGP is a ZIP

I just discovered that Enterprise Guide's EGP files are actually zip files. You can open them with appropriate unzip utilities and extract your code, log and listing files.

POLL RESULT: What is your preferred method for accessing the NOTE: blog?

We recently ran a poll on the blog to understand your preferred method for accessing the information contained hereabouts. The ability to vote was open to all. A mere 27 votes were cast, not a sufficient number to draw significant conclusions from (and it was open to mis-use by voting more than once). However, the results are shown below and demonstrate a large apparent preference for access by RSS feed.

Visiting the web site:   1 ( 3%)
RSS reader:             16 (59%)
Email subscription:      4 (14%)
Twitter:                 4 (14%)
Other:                   2 ( 7%)

I'm intrigued by the 2 votes for "other". What other access channels are there? Please do post a comment with your suggestions.

NOTE: used to be published as an email and was sent (free) to over 4,000 subscribers. In converting to a blog we had assuemd that there would be a preference to receive new posts via email. It seems we may have been wrong. We will use the information to reconsider our posting pattern. We currently post just twice a week in order to keep down the volume of emails. If you have any posting preferences, please let us know in a comment.

Monday, 23 November 2009

NOTE: More on CAT Functions

Further to last week's post on using the CAT functions, Jack Hamilton emailed me with some further advice:

There are also problems with the CAT functions in PROC SQL. I don't remember the exact details, but I had something like this:

select catx(', ', a, b, c, d, e, f, g, h, i)
into :macrovar

If the result string exceeded a certain size, it was truncated.

In this particular case I went back to the concatenation operator.

In general, functions that return a string value can look at their destination to decide how long the result can be. In this case, there is no predefined destination length, and the function arbitrarily chose a length that was too small.
Good advice. There's no reason to avoid the CAT functions, but they do need to be used with the same degree of consideration as other string functions and you need to pay attention to lengths.

Thanks Jack

Round the World: Race 3 Ends - SAS Consultant's Yacht Leads

Team Finland, with UK-based SAS consultant Andy Philips onboard, finished second in the South America to Africa leg of the round-the-world race. Having won the first two races, their second place in the third race means they sit at the top of the leaderboard as they fettle Team Finland for the fourth race - Africa to Australia.

Race three saw some interesting variations in tactics, some very heavy weather, plus a scary man-overboard rescue on Hull & Humber.

Race four sees the crews face yet tougher seas - "should be interesting!" said Andy in his email to me last week, having arrived into port the previous night. "Loving this hot water, beer, and a floor that isn't heeled over at 30 degrees!"

Good luck, Andy!