Thursday 28 October 2010

NOTE: DATA Step's _N_ is PROC SQL's Monotonic

I've made a few posts over the last few months about SQL (particularly its use within Enterprise Guide). One of my unstated issues with SQL is the inability to do the equivalent of DATA step's _N_ to count unique rows. The ability to add a sequence number to a table, or as a contribution to a calculated column (perhaps to create a unique key), is something that I occasionally need.

I recently found PROC SQL's monotonic function in the Tips and Techniques section of the Amadeus web site. It's not documented or supported, so it should be used with caution.
proc sql;
  create table notecolon as
    select * , monotonic() as counter
    from mylib.mytable;
run;
SAS Usage Note 15138 says that the monotonic function is not supported in PROC SQL and that using the monotonic function in PROC SQL can cause missing or non-sequential values to be returned.

In the right circumstances, the monotonic function can be useful. And Amadeus have plenty more useful tips and techniques to peruse and use!

Wednesday 27 October 2010

NOTE: SAS TALKS Celebrates One Year

At the same time as we mark the passing of Benoît Mandelbrot, SAS TALKS is celebrating its first birthday.

SAS TALKS is a series of free webinars offered by SAS and featuring SAS technical experts. You can watch any of the last year's ten webinars from the SAS TALKS archive. The eleventh and most recent ("Feeding Output from PROC SUMMARY into a SAS9 OLAP Cube" by Michelle Buchecker from SAS) will be available soon.

These webinars are usually presented very well and contain useful, insightful information. If you attend the live versions you can ask questions too.

The next live webinar to watch will be "That's Not What I Meant: Adventures in Defining New Variables" by A. John Bailer from Miami University. This will be broadcast on Thursday November 11th and you can sign-up on SAS TALKS' Featured page. John will shows tips and tricks for correctly constructing and recoding variables, plus pits and traps to avoid.

In December, SAS's Angela Hall (authoress of the SAS-BI blog) will talk about "SAS Web Report Studio Performance Improvement".

Benoît Mandelbrot (1924 – 2010)

Mandelbrot zoom
I was saddened to see reports of the recent death of Benoît Mandelbrot. I'm not clever enough to understand much of his work, but I do understand that his pioneering work on fractals (fractal geometry) led not only to some beautiful images but also to highly significant computer science uses in image compression and in image generation within video games.

Friday 22 October 2010

NOTE: Just 3 Days to Submit Your SGF 2011 Paper!

I just finished fettling my SAS Global Forum 2011 paper submission and I have submitted it. My fingers are crossed.

The deadline for papers is Monday October 25th. Have you submitted yours yet?

Tuesday 19 October 2010

UN World Statistics Day

We're told that SAS no longer stands for Statistical Analysis System, but it's still used for an awful lot of statistical analysis. So, it's worth noting that tomorrow (Wednesday October 20th) has been named World Statistics Day (WSD) by the United Nations. The UN's intent is to raise awareness of the importance of official statistics and the contribution of statisticians to our society.

The day will be celebrated worldwide. In the UK, the Office for National Statistics will be working hard to use national and regional media to publicise the work of statisticians, and the Royal Statistical Society (RSS) is planning to launch a ten-year campaign to improve statistical literacy.

Jim Goodnight, SAS CEO, has published a YouTube video to commemorate the day. And you can follow WSD on Twitter: @WorldStatDay

NOTE: More on sorting and SQL

Further to my recent post on telling SAS that your data is pre-sorted, it's worth mentioning the SORTVALIDATE system option. This yes/no option tells SAS whether the SORT procedure should verify if a data set is sorted according to the variables in the BY statement when SORTEDBY has been used. There's an entertaining SAS Global Forum paper on this general subject entitled "Dear Miss SAS/Answers: A Guide to Sorting Your Data".

It's a subject worth being on top of because it can save a great deal of time and resource for your jobs.

Finally, if you're using the SORTEDBY option to avoid sorting being done within SQL joins, remember that you can you use PROC SQL's _method option to show what sorting activity is or isn't being done. Kirk Lafler presented a good paper on this undocumented feature at SAS Global Forum 2008.

NOTE: What is PROC SORTT?

My friend and ex-colleague CB drew my attention to an intriguing thread on SAS-L recently. It asked "What is PROC SORTT" and drew a response from Paul Kent

At times we've issued "the same proc with a new name" as a way of allowing us to get feedback on extensions to that procedure, without completely putting everyones production workload at risk. In the past we've seen TREPORT - new features in proc report; TFREQ - new statistics in proc freq, and SORTT - the threaded implementation... Sometimes we can prototype new features simply with new options in the existing PROC. It's usually a judgement call related to how invasive the new feature is to the structure of the production code.

We want power users to kick the tyres (for my UK friends) on some new features or implementation before they become permanent... So they remain undocumented (mostly) because the features are "transient" in the sense that they migrate into the "official" versions of the procedures, or they were not such good ideas to start with. And they remain implemented (mostly) so as not to overly in-convenience any early adopters who have been so bold as to use them in production jobs... Hope that helps y'all understand our motivation (at least a little bit)

I've not found a use for this information yet, but I thought I'd share it with you nonetheless!