Tuesday 30 October 2012

NOTE: Seasonal Cycles and Date Increments (once more unto the breach)

I've posted a couple of articles recently about the INTNX and INTCK functions for dealing with date/time/datetime manipulation. Whilst researching these I stumbled across some related functions that I'd not heard of before. These functions manipulate date/time/datetime values around seasonal cycles (i.e. days within weeks, months within years, etc.) and provide other useful facilities.

Seasonal Cycles

INTSEAS: Returns the length of the seasonal cycle when a date, time, or datetime interval is specified.
INTCINDEX: Returns the cycle index when a date, time, or datetime interval and value are specified.
INTCYCLE: Returns the date, time, or datetime interval at the next higher seasonal cycle when a date, time, or datetime interval is specified.
INTINDEX: Returns the seasonal index when a date, time, or datetime interval and value are specified.
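
As a rough illustration of the four seasonal cycle functions, the following sketch calls each of them for the MONTH interval. The variable names and the example date are my own choices for illustration; it's worth checking the results against the documentation for your SAS release.

```sas
data _null_;
  length cycle $ 16;
  d = '01dec2012'd;                    /* example date, chosen arbitrarily */
  season_len = intseas('MONTH');       /* length of the seasonal cycle: 12 months */
  cycle      = intcycle('MONTH');      /* next higher seasonal cycle: YEAR */
  season_idx = intindex('MONTH', d);   /* position of d within its cycle: 12 for December */
  cycle_idx  = intcindex('MONTH', d);  /* index of the cycle that contains d */
  put season_len= cycle= season_idx= cycle_idx=;
run;
```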

Other Useful Date Functions

INTFIT: Returns a time interval that is aligned between two dates.
INTFMT: Returns a recommended SAS format when a date, time, or datetime interval is specified.
INTGET: Returns a time interval based on three date or datetime values.
INTSHIFT: Returns the shift interval that corresponds to the base interval.
INTTEST: Returns 1 if a time interval is valid, and returns 0 if a time interval is invalid.
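
Here's a minimal sketch of three of these functions in use. The interval names passed in are just examples (FORTNIGHTLY is a deliberately invalid, made-up name), and I've not shown expected output for INTFMT and INTSHIFT; run it and see what your release returns.

```sas
data _null_;
  length fmt $ 16 shift $ 16;
  valid   = inttest('MONTH2.2');     /* 1 if the interval name is valid */
  invalid = inttest('FORTNIGHTLY');  /* made-up name, so 0 is expected */
  fmt     = intfmt('MONTH', 'l');    /* recommended format; 'l' asks for the long (four-digit year) form */
  shift   = intshift('YEAR');        /* interval in which shifts of YEAR are expressed */
  put valid= invalid= fmt= shift=;
run;
```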

They're the kind of functions that are life-savers to some, and of little or no interest to others. If you're in the former category and you've not heard of them before, then I'm glad to have been of service!...

Monday 29 October 2012

NOTE: More on Date Increments (INTCK and INTNX)

It's always encouraging to get feedback about my blog articles and/or see an article spark some conversation. Last week's Date Increments (INTNX and INTCK) featured the INTNX function for incrementing (or decrementing) date, time, and datetime values by specific intervals such as HOUR and MONTH. I highlighted the optional fourth parameter that can be used to specify where in the interval the result should lie. The article created a small flurry of tweets from @DavidKBooth and @LaurieFleming, including:

@LaurieFleming @aratcliffeuk just discovered intck('year', birthday, date, 'C') which correctly calculates age in years!

@LaurieFleming @aratcliffeuk it assumes people born on 29feb celebrate birthdays on 28feb in non leap years.

@DavidKBooth @aratcliffeuk Excellent. That's much so better than floor((intck('month', &birth, &date) - (day(&date) < day(&birth))) / 12)

@LaurieFleming @aratcliffeuk 4th parameter added in 9.2 - would have been cross if I'd been overlooking it for ages.

To be perfectly honest, I've used INTNX a tremendous amount, but strangely I've never used INTCK half as much. I hadn't even realised that INTCK's fourth parameter took different values to INTNX. The valid values are DISCRETE (default) and CONTINUOUS. DISCRETE counts the number of interval boundaries between the two dates; CONTINUOUS counts the number of intervals, starting from the start date, thus it is well suited to calculating (as Dave says) ages. So, today was another learning day for me!
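
To make the difference between the two methods concrete, here's a small sketch that calculates an age both ways. The dates and variable names are hypothetical, chosen so that the birthday has not yet been reached in the final year.

```sas
data _null_;
  birth = '15dec1980'd;   /* hypothetical date of birth */
  asof  = '30oct2012'd;   /* fixed date so the result is reproducible */
  boundaries = intck('YEAR', birth, asof);       /* DISCRETE: counts the 1 January boundaries crossed */
  age        = intck('YEAR', birth, asof, 'C');  /* CONTINUOUS: whole years elapsed since birth */
  put boundaries= age=;   /* boundaries=32 age=31 */
run;
```

The discrete count is one too high as an age because the 2012 birthday has not yet happened; the continuous count measures from the birth date itself.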

Saturday 27 October 2012

Supporting the 2012 Poppy Appeal

You may have noticed that our twitter avatar has started to sport a poppy this week. This is in support of the Royal British Legion's 2012 poppy appeal - commemorating those who have lost their lives in defence of their country, and providing financial, social and emotional support to those who have served in the Armed Forces, and their dependants. 

We added the poppy (temporarily) using Twibbon. It's a great way to show your support, and optionally make a donation.

You can even donate via your mobile. Just text the word POPPY to 70800 to make a donation to The Royal British Legion. Texts cost £10 plus standard network charges (£9.92 goes to the Poppy Appeal).

Twitter, Fixed

I'd noticed for a few weeks that the Twitter widget in the right-hand margin on the blog wasn't working - it was showing a link to my Twitter page, but no tweets.

Having just done some research, it seems this was down to some change at the Twitter end and many people were suffering the same issues.

I've now replaced the Twitter widget that I'd got from Blogger with a widget that I've just got from Twitter and all now seems to be fine. Please let me know if you continue to see problems.


Wednesday 24 October 2012

NOTE: Date Increments (INTNX and INTCK)

If you're an experienced SAS practitioner you'll be familiar with the INTNX (and INTCK) functions. INTNX takes date, time, and datetime values and increments (or decrements) them by a specified interval, e.g. hours, weeks or quarters. If you're not familiar with the function, I'll give a quick introduction; if you've been using INTNX for years, I'll highlight the SAMEDAY alignment option introduced in SAS V9. V9 was introduced some time ago, but if you were already familiar with INTNX then you may have overlooked the new SAMEDAY option.

By way of a basic introduction, let's assume we have a variable named START that contains a date value, and we want to add three months to it. If we knew the number of days in three months we could do a simple mathematical addition, but the number of days in any three month period can vary. However, help is at hand with INTNX. The following illustrates the solution.

data demo;
  format start end date11.;
  start = '1jun2012'd;
  /* Move forward three MONTH intervals from START */
  end = INTNX('MONTH',start,3);
  /* Write both values to the log */
  put start= end=;
run;

START=01-JUN-2012 END=01-SEP-2012

As you can see, the date has been incremented by three months. The first parameter of the INTNX function specifies the type of interval, and the third specifies how many intervals (negative values are permissible and result in the value being decremented). The SAS 9.3 Functions and CALL Routines: Reference lists the valid values for the interval; there are many.

There is one feature you need to be aware of. The value will be incremented (or decremented) to align with the beginning of the interval, so INTNX('MONTH', '10jun2012'd, 3) would also result in '1sep2012'd, not the 10th.

There's a fourth parameter of the INTNX function that allows you to specify the alignment as BEGINNING (the default), MIDDLE, and END.

So far, so good, but (perhaps unknown to experienced SAS programmers) V9 introduced a fourth alignment value: SAME.

Armed with this knowledge, we can increment 10th June and get a result of 10th September: INTNX('MONTH', '10jun2012'd, 3, 'SAME').
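
To see the alignment values side by side, a sketch along the following lines can be run. The variable names are mine; I've not shown an exact date for MIDDLE because I haven't verified which day SAS treats as the middle of a 30-day month.

```sas
data _null_;
  start = '10jun2012'd;
  format start b m e s date11.;
  b = intnx('MONTH', start, 3, 'BEGINNING');  /* 01-SEP-2012 (the default) */
  m = intnx('MONTH', start, 3, 'MIDDLE');     /* mid-September */
  e = intnx('MONTH', start, 3, 'END');        /* 30-SEP-2012 */
  s = intnx('MONTH', start, 3, 'SAME');       /* 10-SEP-2012 */
  put start= / b= / m= / e= / s=;
run;
```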

And finally, INTNX has two optional adjuncts to the interval. Firstly, the interval can be suffixed with a number to indicate larger intervals, e.g. MONTH2 indicates that intervals that are two months in length should be used. Secondly, another numeric value can follow a dot and indicates an offset starting point, e.g. YEAR.3 specifies yearly periods shifted to start on the first of March of each year, and YEAR2.6 indicates that the intervals are each two years in length and that they start on the sixth month. There's more detail on these optional parameters in the aforementioned SAS 9.3 Functions and CALL Routines: Reference manual.
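
As a brief sketch of both adjuncts (the variable names and date are just for illustration), an increment of zero is a handy way to find the start of the interval that contains a given date:

```sas
data _null_;
  d = '10jun2012'd;
  format d two_month fiscal_start date11.;
  two_month    = intnx('MONTH2', d, 1);  /* start of the next two-month interval */
  fiscal_start = intnx('YEAR.3', d, 0);  /* start of the current March-based year: 01-MAR-2012 */
  put d= two_month= fiscal_start=;
run;
```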

And finally, finally, I mentioned INTCK earlier. The INTCK function calculates the number of intervals between two specified date, time or datetime values. It uses the same intervals and general style of syntax as INTNX. If you use dates, times, or datetimes, then you need to be good friends with INTNX and INTCK!

Tuesday 23 October 2012

Technical Debt

Last week I mentioned a term that was new to me (Mutation Testing) and so I thought I'd mention another recently acquired term - Technical Debt. In this case I was familiar with the concept, but I hadn't heard the term before. I think the term very succinctly describes the concept.

We're all familiar with the fact that the software that we build isn't perfect. I don't mean it's full of bugs, I mean that there are things we could have done in a more robust or long-lasting manner if we'd had the time or the money. It could be code or it could be architecture. This is our technical debt - things that are an effective and appropriate tactical and short-term choice but which we should put right in the longer-term in order to avoid specific risks or increasing costs (the interest on the debt).

Examples of technical debt include:
  • Incomplete error trapping, i.e. we know that the code will bomb in certain circumstances (e.g. the supplied data was of the wrong format) and will not offer the user any message to explain why it bombed and what they need to do to avoid it. As a tactic to get the code out of the door, this is sometimes necessary
  • Hard-coding a series of values rather than placing them in a control file and giving the appropriate people the ability to edit the control file. Again, as a tactic to get the code out of the door, this is sometimes necessary
  • Coding-up a routine that is known to be part of the base software in the next version of the base software. This may be necessary as a short-term measure because the upgrade to the next version of the base software is a significant project in itself
  • Attaching a barely sufficient amount of temporary storage 
  • Using a non-strategic means of getting source data into your ETL process
  • Delivering an early release of software that doesn't fully meet all requirements

Whatever form your own technical debt takes, it is important that you maintain a register of it and that you manage it.

As in our personal lives, debt is not necessarily a bad thing. It allows us to buy a house and/or a car that would otherwise be out of reach. The key thing is to recognise that one has the debt and to manage it - which is not necessarily the same thing as removing the debt.

Release cycles can make a considerable difference in the rate of acquisition and disposal of technical debt. Releasing early and often makes it much easier to take on technical debt but also makes it easier to resolve that debt. When well-managed, this can be a blessing - taking on debt earlier allows you to release more functionality earlier, allowing immediate feedback from customers, resulting in a product that is more responsive to user needs. If that debt is not paid off promptly, however, it also compounds more quickly, and the system can bog down at a truly frightening rate.

Shortcuts that save money or speed up progress today at the risk of potentially costing money or slowing down progress in the (usually unclear) future are technical debt. It is inevitable, and can even be a good thing as long as it is managed properly, but this can be tricky: technical debt comes from a multitude of causes, often has difficult-to-predict effects, and usually involves a gamble about what will happen in the future. Much of managing technical debt is the same as risk management, and similar techniques can be applied. If technical debt isn't managed, then it will tend to build up over time, possibly until a crisis results.

The term "technical debt" was coined by Ward Cunningham in his 1992 OOPSLA paper The WyCash Portfolio Management System.

Technical debt can be viewed in many ways and can be caused by all levels of an organization. It can be managed properly only with assistance and understanding at all levels. Of particular importance is helping non-technical parties understand the costs that can arise from mismanaging that debt.

Aside from reading Ward's 1992 paper, you can find plenty more valuable sources of information on this topic. Here are just a few that I recommend:

Take good care of your debt and it will take good care of you. The reverse also holds!

Monday 22 October 2012

NOTE: Always Striving to Learn More

Aren't SAS users groups and conferences great? We all strive to continue learning, and we can do that a piece at a time through subscription to blogs and newsletters, and we can get great gulps of new knowledge from attending SAS users groups and conferences. If you don't have a convenient local users group (or your employer refuses to let you attend) then you have my sympathy, but all is not lost. The SAS Users Groups blog is run by a great team from SAS and provides highlights from users groups meetings in the US.

This month was the turn of the South East SAS User Group (SESUG). Judging by the highlights presented in the SAS Users Groups blog it was clearly a good conference. Two blog articles (linking to conference papers) particularly caught my eye:

Why Use Hadoop?

Macro Headaches. Learn How to Prevent Them

And the Best Contributed papers at Mid West SAS Users Group (MWSUG) offers plenty of quality reading too.

So, even if you don't go to any of the conferences, you have plenty of opportunity to benefit from the presented material. What would you like to know more about?...

Wednesday 17 October 2012

Mutation Testing

I've published a number of articles on testing in the past, and I thought I had a decent awareness and knowledge of the subject. But, as they say, every day is a learning day and I came across a new testing technique recently: Mutation Testing.

I've not yet tried this technique myself, but it certainly seems to offer benefits, and it's a useful extra consideration when you're creating your testing strategy for your project. Mutation Testing forms a test of your tests and so it is not of value in all cases.

In a nutshell, mutation testing involves creating multiple copies of your code, introducing small changes into each copy - with the deliberate intention of breaking the code in some small way - and then running your tests. If your suite of tests is of sufficient quality then each mutant copy of your code will fail at least one of your tests. If not then you need to enhance your tests so that all mutants fail at least one test.

The types of mutations can vary but they typically include 1) negation of logic operators, 2) setting values to zero, 3) use of wrong variable names. The general nature of the mutations is to emulate common programming errors.
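
To make the idea concrete, here's a minimal sketch in SAS. Everything in it is hypothetical: the flag_adults macro stands in for the code under test, and the comparison operator is a macro parameter purely so that mutants can be generated without copying the code. A real mutation testing exercise would mutate the actual source.

```sas
/* Hypothetical code under test: flags people aged 18 or over */
%macro flag_adults(in=, out=, op=GE);
  data &out;
    set &in;
    adult = (age &op 18);
  run;
%mend flag_adults;

/* Test cases either side of, and on, the boundary */
data cases;
  input age expected;
  datalines;
17 0
18 1
19 1
;
run;

/* Run the tests against one version of the code and report failures */
%macro run_tests(op=);
  %flag_adults(in=cases, out=result, op=&op)
  data _null_;
    set result end=last;
    failures + (adult ne expected);
    if last then put "op=&op " failures=;
  run;
%mend run_tests;

%run_tests(op=GE)  /* original code: should report failures=0 */
%run_tests(op=GT)  /* mutant (boundary error): caught by the age 18 case */
%run_tests(op=LT)  /* mutant (negated logic): caught by every case */
```

If the age 18 case were removed from the test data, the GT mutant would survive, which tells you the tests are missing a boundary check.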

Wikipedia tells us that "Mutation testing was originally proposed by Richard Lipton as a student in 1971, and first developed and published by DeMillo, Lipton and Sayward. The first implementation of a mutation testing tool was by Timothy Budd as part of his PhD work (titled Mutation Analysis) in 1980 from Yale University."

As I said earlier, Mutation Testing is not of benefit in all cases. The exercise of testing is about engendering confidence, not offering cast iron guarantees. As a technique to engender greater confidence, Mutation Testing is certainly of value. However, not all projects will require the degree of confidence that Mutation Testing brings. For some projects, the cost versus confidence balance will be tipped before Mutation Testing becomes appropriate.

Nonetheless, for those projects where a high degree of confidence is required, Mutation Testing certainly has a role to play.

Have you used Mutation Testing? Please let me know (through a comment) if you have; I'm keen to hear some experiences (good or bad).

Tuesday 16 October 2012

NOTE: Custom Tasks, Beyond the Basics with Enterprise Guide

Stretching my Beyond the Basics of EG series by one more posting... The ability to create your own (custom) tasks in Enterprise Guide certainly takes EG beyond the basics. I've previously mentioned my disappointment (here and here) at the lack of 3rd-party custom tasks. One of the undoubted reasons is a perceived difficulty in coding with Microsoft .NET rather than any of SAS's own languages.

That may be about to change with the imminent release of Chris Hemedinger's Creating Custom Tasks for SAS Enterprise Guide using Microsoft .NET from SAS Press. In a recent blog post, Chris mentioned that his book will become available in early 2013. It will be a welcome addition to the SAS Press portfolio.

It's easy to see why custom tasks must be written in a non-SAS language: EG runs on Windows PCs, and those PCs will not have a copy of SAS on them if EG is working in client server mode and SAS is on a server. So the custom task author needs to use a language that is universally available and has a good understanding of Windows and its APIs. Thus, .NET is a sensible choice for SAS to have made.

However, it's not a programming skill that is available to many SAS teams. At best it tends to be a skill that one member of the team may have picked up. In these circumstances, support for custom tasks can easily become dependent upon one key person. That's not a good situation. I wrote about Bus Factors for software projects last year; the same applies to support teams.

If you're lucky enough to have two or more team members with .NET skills, or you're able to get support from a team alongside the SAS team, then custom tasks can add a great deal of value to your use of SAS and Enterprise Guide. Perhaps it's worth one or two of your team members investing some time into learning .NET?...

Tuesday 9 October 2012

NOTE: Log Searching, Beyond the Basics With Enterprise Guide

I originally intended my Enterprise Guide "Beyond the Basics" series to have just two parts, but I just had to include two recent tips from Angela Hall. In "Always learning something new, two awesome Enterprise Guide tricks", Angela relates two tricks she learned (or was reminded of) at this year's WUSS (Western Users of SAS Software) conference. Angela's first tip is a means of quickly skipping through your SAS log to find errors and warnings; the second describes how you can organise your EG projects with multiple Process Flows.

Angela's first tip is almost embarrassing. She merely points out that the Log window has its own toolbar, and there are Up and Down Arrow buttons on the toolbar which take one to the next or previous warning/error. Why didn't I know that? Why had I never noticed the toolbar or buttons?! My embarrassment is mollified by the fact that Angela admits that she didn't know either.

Angela's second tip is to point out that you can add extra Process Flows to your EG project; you can rename them to something meaningful; and you can then easily run selected parts of your project. Whilst I agree with Angela that this is a good way to organise your project, I have to point out a small disappointment that I've always harboured with regard to multiple Process Flows. Wouldn't it be nice if there was some support for links between Flows? I'd like to see two things:

1) A stepping stone icon that, when clicked, takes the user to another Process Flow. This would allow a user to follow the flow of data between Process Flows

2) When data nodes are shown on a Process Flow, EG should indicate which Process Flow created them. Again, this would help users understand the flow between Flows

Of course, even without my suggestions, Angela's tips are valuable and worth trying. Thanks Angela.

Tuesday 2 October 2012

NOTE: Conditional Processing, Beyond the Basics With Enterprise Guide

In my preceding Enterprise Guide (EG) "Beyond the Basics" article I described Prompts, their uses, and their benefits. Another valuable feature of EG is conditional processing. This gives you the ability to take different paths through your project based upon the value of macro variables, prompts, data set contents, and a number of other values.

EG conditions can be associated with executable tasks, queries and programs, and are evaluated prior to execution of the task, query or program. Conditions decide which tasks, queries and programs should or should not be executed. Conditions provide a kind of IF-THEN-ELSE capability in EG.

Once you've added a condition to your process flow you'll notice a few extra indicators appear on your nodes. A small flag indicates that the node has a condition associated with it; a small number indicates the specific condition associated with the specific node (one condition can be associated with more than one node; you'll see the same number on the nodes); a tick or a cross will show whether the condition ran at run-time.

You can make your conditions quite complex because you are allowed to create chains of ELSE-IF clauses within your conditions.

All of this functionality, introduced in EG v4.2, means you can make your EG project produce different reports, send output to different email recipients, and much more.

Curiously, there doesn't seem to be a great deal of talk about EG conditions in conference papers and blogs. SAS Knowledge Base article 39995 offers some useful insight. Chris Hemedinger's SAS Global Forum (SGF) paper from 2008, Find Out What You're Missing: Programming with SAS Enterprise Guide, offers an example. However, Chris Schacherer's excellent SGF paper this year, Take a Fresh Look at SAS Enterprise Guide: From point-and-click ad hocs to robust enterprise solutions, takes a detailed look at many EG features and provides a detailed step-by-step solution to a fictitious reporting scenario, including the use of a condition.

Having extolled the virtues of conditional processing (and, there are many virtues), I need to add a word of caution. If you export the code from the project, the conditional logic will not be included in the export. This is perhaps understandable because conditions are evaluated within EG (on the client) at run-time; they are not implemented in macro code as you might have otherwise supposed. However, the biggest crime here is that EG doesn't warn you that it won't include your conditional logic when you perform an export. You have been warned!
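
If you need the branching to survive an export, one option is to code the logic yourself in the macro language. The following is only a sketch of that idea, not what EG generates: the monthly_report macro and its detail parameter are hypothetical, and sashelp.class is used simply because it is available in most SAS installations.

```sas
%macro monthly_report(detail=NO);
  /* Branch on the parameter, much as an EG condition would branch on a prompt */
  %if %upcase(&detail) = YES %then %do;
    proc print data=sashelp.class;
    run;
  %end;
  %else %do;
    proc means data=sashelp.class;
      var height weight;
    run;
  %end;
%mend monthly_report;

%monthly_report(detail=YES)
```

Because the decision is made in SAS code rather than in the EG client, it travels with the program wherever it is run.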