Wednesday, 8 September 2010

NOTE: Sort Your Sorted Data

In my previous post I spoke of using views to give you access in EG visual coding to tables with dynamic names. This works well, but there's one final observation to make if you want to avoid poor performance in some circumstances.

If you're used to using DATA step merges then you'll also be used to sorting your input data sets prior to the merge. If you're using EG visual coding then you'll have realised that you don't need to sort the data sets that you use as input to your joins. Why? Because SQL will automatically sort your input data sets for you.

Neat huh? Well, yes it is, except in cases where your input data is already properly sorted and re-sorting is a waste of resource and causes unnecessary delays in the execution of your job.

So, can we tell SQL that one or more input data sets are already sorted? Yes, we can, by using the SORTEDBY data set option.

Monday, 6 September 2010

NOTE: Marking Time with Enterprise Guide

In my recent review of a client's SAS/Base jobs' suitability for EG visual coding, I wasn't surprised to come across some jobs that created new tables each month where the table names included the date. The table names were of the form MASTER.SALES_2010_08, indicating sales for August 2010. However, the fact that the table was referenced with the use of a macro variable, i.e. MASTER.SALES_&month., meant that EG was less than happy.

This style of table naming doesn't sit well with EG's visual coding (nor DI Studio) because the process flow needs a static table name for inputs and outputs. You can use the Query Builder to create an output table whose names includes a macro variable. The code will execute and the table will be created, but it won't appear in the process flow. And you can't use a macro variable in the name of an input table for visual coding.

There are at least two approaches to take to this problem...

Project Management, Churchill-Style

Having recently marked the 70th anniversary of the start of World War II's Battle of Britain, my interest was piqued by a book sub-titled "Project Lessons from Winston Churchill and Battle of Britain". The idea of viewing World War II as a project surprised me. But the more you think about it, the more you see that, in detail, the requirements of executing a war are no different to any other project - you need leadership and governance, and you have to manage resources and tasks whilst keeping your eye on delivering the project's objectives.

Clearly, Winston Churchill's project had a higher cost of failure than any project I'll ever manage, and the risks and tasks are of a different nature to those I'm familiar with, but I can see the similarities too. So, the book's on order and I'm looking forward to reading it.

"Agile Leadership and Management of Change - Project Lessons from Winston Churchill and Battle of Britain" was written by Mark Kozak-Holland as part of the Lessons from History series.

Saturday, 4 September 2010

NOTE: Celebrating Our 1st Birthday

Though I did a couple of test posts in July 2009, NOTE: really got started with its first post on Friday September 4th 2009. So, today we celebrate our first birthday.

In that year we've built a loyal band of followers. Thank you to all of you.

NOTE: is accessible in many different ways:

Just under 200 people use an RSS reader to get their SAS fix.
100 people get an automatic email copy of posts on any day in which an article is published.
73 people follow us on Twitter and get a near-instant tweet every time a new posts appears on NOTE:.
We get an average of 12 visitors to the web site each day, and they've created over 8,000 page views in the last year

I'm glad you find the content interesting. Please help us to grow further by telling your fellow SAS practitioners how informative NOTE: is, and by telling us what additional topics you'd like to read about.

Cheers!

Wednesday, 1 September 2010

NOTE: Reflecting on EG, Here's 4.3

I knew that EG V4.3 was due soon, but I thought I'd have time to post some EG V4.2 observations before V4.3 saw the light of day. Wrong!

Chris Hemedinger highlighted yesterday that EG V4.3 is available to those who want it.

It has much to recommend it. Most notable to me was the emphasis on new features for users of the SAS program editor. It is heartening to see that SAS recognises that visual-coding does not (yet) satisfy the needs of all SAS programmers. Thanks SAS!

See the What's New section of the SAS V9.2 documentation for the low-down on the new features.

NOTE: Reflections on Enterprise Guide (EG)

I've recently had the benefit of spending some "quality time" with Enterprise Guide V4.2 (and SAS V9.2). I've enjoyed migrating a selected sample of my client's SAS/Base programs into EG projects. The foibles of my client's code gave me a great opportunity to re-evaluate EG's visual-coding, and to inform my client of the benefits and challenges of adopting visual-coding wholesale.

Visual-coding is the term I use to describe the creation of programs by the visual connection of process nodes with data nodes using a drag-and-drop graphical user interface. A visually-coded program usually results in a flow chart-like diagram of the program's activity.

The first point I should make is that EG does not force you to adopt visual-coding. You can use EG's program node with its code editor and ignore visual-coding altogether. Compared with SAS's classic display manager interface, you'll lose the ability to use the interactive DATA step debugger and to view the content of SAS catalogs, you'll gain an integrated FTP client, seamless remote submission of your code, and you can use visual-coding for wizard-like code generation for thoser PROCs you're less familiar with.

My two primary observations about EG are:

1) It's not a visual ETL tool. If you want to generate ETL in a visual manner, get DI Studio. EG's in-built task nodes do not provide any means of updating your data. Your only output options are a) create a new data set, or b) completely over-write an existing data set. In some cases option (b) is not available within a single task (most notably in the APPEND task). In many cases EG's options will be sufficient, but the absence of SQL's INSERT and UPDATE statements, and DATA step's MODIFY and UPDATE statements, means that it may not be the best tool for developing visual ETL.

2) You need to become competent with SQL if you want to use EG's visual-coding in an effective manner. Most of EG's data manipulation tasks are based upon SQL. Particularly the Query Builder task. If you want to go beyond the basics you need to understand how to use SQL syntax such as DISTINCT, GROUP and CASE. And you need to understand what each of the different types of joins do. If you're an experienced SAS programmer you won't struggle to get to terms with the new syntax, but you do need to know it.

I'll offer some more specific findings in the next few posts. In the meantime, if you have any observations of your own, please provide a comment to this post and start a conversation. You can also make use of the EG discussion forum on the SAS Support web site; and learn lots from Chris Hemedinger's blog.