NOTE: Development With SAS: SQL

Showing posts with label SQL. Show all posts

Wednesday, 28 November 2012

NOTE: ODBC Performance

SAS is great at getting data from a huge variety of sources. With its SAS/ACCESS connectors, SAS can get data from these sources in an optimised fashion, i.e. using the native capabilities of the source data provider. In these days of increasingly large set of data, optimisation and efficiency are crucial factors. Steve Overton wrote an article in the Business Intelligence Notes blog earleir this month in which he offered some experience and advice with regard to gatrhering data into SAS via ODBC.

ODBC is a generic means of accessing a large variety of different data sources. As such, it's less easy to optimise the process, but nonetheless we want the data to flow at the best possible rate. In SAS Administration: Fetch Big Data Faster Across ODBC, Steve describes how judicious use of the FETCH and USEDECLAREFETCH parameters in the ODBC.INI file can make big differences to the speed of your data access. Valuable.

Wednesday, 31 August 2011

NOTE: CALCULATED in SQL

If you use SQL, either hand-cranked from the Editor, or generated via Enterprise Guide or DI Studio, you're probably familiar with creating new columns ("computed columns"). However, have you ever struggled to create a computed column from a computed column?

You can do this easily in SQL, but the syntax isn't immediately obvious. For example, calculating body mass index could be attempted with a two-stage calculation:

18 /* Assume height measured in metres, weight measured in kilos, */
19 /* then BMI = weight / height^2 */
20 proc sql;
21   create table alpha as
22     select class.*
23            ,height * height as heightSquared
24            ,weight / heightSquared as bmi
25     from sashelp.class;
ERROR: The following columns were not found in the contributing tables: heightSquared.

We can create a "heightSquared" computed column, but trying to use heightSquared to create bmi fails.

The solution is simple: just add the keyword "calculated" in front of the computed column when you want to refer to it. Here's the corrected select expression:

     select class.*
            ,height * height as heightSquared
            ,weight / CALCULATED heightSquared as bmi

You can use the CALCULATED keyword in all places where you can use SQL in SAS.

Monday, 10 January 2011

NOTE: Using the ANY and ALL Operators in PROC SQL

Let me put my cards on the table before we start: I had never heard of the ANY and ALL operators in PROC SQL before I spotted David Shannon's tweet this evening. The ANY and ALL operators are PROC SQL operators which can be used to compare the result of a query expression against either ANY or ALL of the results returned from another query. Read the Amadeus tips entry to discover the details! It was an eye-opener for me.

Amadeus are one of the UK's leading SAS consultancies (alongside RTSL!), and David is Amadeus's Technical Director. The UK SAS consultancy business is a small world, so I've known David for many years. However, despite knowing David quite well, I was surprised and amused (in equal measure) to see that he is contemplating the infamous Land's End to John O'Groats journey (LEJOG - a 1,000+ mile journey from the most extreme southwestward tip of Britain in Cornwall in England to the most extreme northeastern tip in Scotland) in a 1970's British sports car, without using major roads. The man is off his trolley! You can follow David's planning and trip on his blog.

My wife completed LEJOG on a pedal bike in 10 days last year. Let's see if the best of the British motor industry of the 70's can get there faster... Both journeys were / will be in aid of good charities...

Thursday, 28 October 2010

NOTE: DATA Step's _N_ is PROC SQL's Monotonic

I've made a few posts over the last few months about SQL (particularly its use within Enterprise Guide). One of my unstated issues with SQL is the inability to do the equivalent of DATA step's _N_ to count unique rows. The ability to add a sequence number to a table, or as a contribution to a calculated column (perhaps to create a unique key), is something that I occasionally need.

I recently found PROC SQL's monotonic function in the Tips and Techniques section of the Amadeus web site. It's not documented or supported, so it should be used with caution.

proc sql;
  create table notecolon as
   select * , monotonic() as counter
   from mylib.mytable;
run;

SAS Usage Note 15138 says that the monotonic function is not supported in PROC SQL and that using the monotonic function in PROC SQL can cause missing or non-sequential values to be returned.

In the right circumstances, the monotonic function can be useful. And Amadeus have plenty more useful tips and techniques to peruse and use!

Tuesday, 19 October 2010

NOTE: More on sorting and SQL

Further to my recent post on telling SAS that your data is pre-sorted, it's worth mentioning the SORTVALIDATE system option. This yes/no option tells SAS whether the SORT procedure should verify if a data set is sorted according to the variables in the BY statement when SORTEDBY has been used. There's an entertaining SAS Global Forum paper on this general subject entitled "Dear Miss SAS/Answers: A Guide to Sorting Your Data".

It's a subject worth being on top of because it can save a great deal of time and resource for your jobs.

Finally, if you're using the SORTEDBY option to avoid sorting being done within SQL joins, remember that you can you use PROC SQL's _method option to show what sorting activity is or isn't being done. Kirk Lafler presented a good paper on this undocumented feature at SAS Global Forum 2008.

Wednesday, 8 September 2010

NOTE: Sort Your Sorted Data

In my previous post I spoke of using views to give you access in EG visual coding to tables with dynamic names. This works well, but there's one final observation to make if you want to avoid poor performance in some circumstances.

If you're used to using DATA step merges then you'll also be used to sorting your input data sets prior to the merge. If you're using EG visual coding then you'll have realised that you don't need to sort the data sets that you use as input to your joins. Why? Because SQL will automatically sort your input data sets for you.

Neat huh? Well, yes it is, except in cases where your input data is already properly sorted and re-sorting is a waste of resource and causes unnecessary delays in the execution of your job.

So, can we tell SQL that one or more input data sets are already sorted? Yes, we can, by using the SORTEDBY data set option.

Continued (read more)...

Monday, 6 September 2010

NOTE: Marking Time with Enterprise Guide

In my recent review of a client's SAS/Base jobs' suitability for EG visual coding, I wasn't surprised to come across some jobs that created new tables each month where the table names included the date. The table names were of the form MASTER.SALES_2010_08, indicating sales for August 2010. However, the fact that the table was referenced with the use of a macro variable, i.e. MASTER.SALES_&month., meant that EG was less than happy.

This style of table naming doesn't sit well with EG's visual coding (nor DI Studio) because the process flow needs a static table name for inputs and outputs. You can use the Query Builder to create an output table whose names includes a macro variable. The code will execute and the table will be created, but it won't appear in the process flow. And you can't use a macro variable in the name of an input table for visual coding.

There are at least two approaches to take to this problem...

Continued (read more)...

Wednesday, 1 September 2010

NOTE: Reflections on Enterprise Guide (EG)

I've recently had the benefit of spending some "quality time" with Enterprise Guide V4.2 (and SAS V9.2). I've enjoyed migrating a selected sample of my client's SAS/Base programs into EG projects. The foibles of my client's code gave me a great opportunity to re-evaluate EG's visual-coding, and to inform my client of the benefits and challenges of adopting visual-coding wholesale.

Visual-coding is the term I use to describe the creation of programs by the visual connection of process nodes with data nodes using a drag-and-drop graphical user interface. A visually-coded program usually results in a flow chart-like diagram of the program's activity.

The first point I should make is that EG does not force you to adopt visual-coding. You can use EG's program node with its code editor and ignore visual-coding altogether. Compared with SAS's classic display manager interface, you'll lose the ability to use the interactive DATA step debugger and to view the content of SAS catalogs, you'll gain an integrated FTP client, seamless remote submission of your code, and you can use visual-coding for wizard-like code generation for thoser PROCs you're less familiar with.

My two primary observations about EG are:

1) It's not a visual ETL tool. If you want to generate ETL in a visual manner, get DI Studio. EG's in-built task nodes do not provide any means of updating your data. Your only output options are a) create a new data set, or b) completely over-write an existing data set. In some cases option (b) is not available within a single task (most notably in the APPEND task). In many cases EG's options will be sufficient, but the absence of SQL's INSERT and UPDATE statements, and DATA step's MODIFY and UPDATE statements, means that it may not be the best tool for developing visual ETL.

2) You need to become competent with SQL if you want to use EG's visual-coding in an effective manner. Most of EG's data manipulation tasks are based upon SQL. Particularly the Query Builder task. If you want to go beyond the basics you need to understand how to use SQL syntax such as DISTINCT, GROUP and CASE. And you need to understand what each of the different types of joins do. If you're an experienced SAS programmer you won't struggle to get to terms with the new syntax, but you do need to know it.

I'll offer some more specific findings in the next few posts. In the meantime, if you have any observations of your own, please provide a comment to this post and start a conversation. You can also make use of the EG discussion forum on the SAS Support web site; and learn lots from Chris Hemedinger's blog.

The small print:

Please send general comments to NoteEditor@RTSL.eu

The RTSL blog: © 2001-2015 Ratcliffe Technical Services Limited. All rights reserved.

5 Willow Close, Bexley, Kent, GB-DA5 1QY, England

Re-publication by permission only. You may forward copies only if no fee is involved. Please encourage the recipients of the forwarded copy to subscribe and get their own free copy

SAS® is a trademark of SAS Institute, Inc. All other trademarks, company names, products names and logos mentioned are the property of their respective owners.

NOTE: Development With SAS

Pages

Wednesday, 28 November 2012

NOTE: ODBC Performance

Wednesday, 31 August 2011

NOTE: CALCULATED in SQL

Monday, 10 January 2011

NOTE: Using the ANY and ALL Operators in PROC SQL

Thursday, 28 October 2010

NOTE: DATA Step's _N_ is PROC SQL's Monotonic

Tuesday, 19 October 2010

NOTE: More on sorting and SQL

Wednesday, 8 September 2010

NOTE: Sort Your Sorted Data

Monday, 6 September 2010

NOTE: Marking Time with Enterprise Guide

Wednesday, 1 September 2010

NOTE: Reflections on Enterprise Guide (EG)

You might also like these web sites...

You might also like these blogs...