Tuesday 1 November 2011

Avoid Magic Numbers

Mention of my university days last week reminded me of an old maxim that my professor drummed into me. Avoid magic numbers.

Have you ever had to amend a colleague's code and found a number in the middle of a calculation and wondered what the significance of the number was? Did it make you nervous about changing the code because you didn't fully understand it? Your colleague coded a "magic number".

The term "magic numbers" refers to the practice of including numbers in source code. Doing so is likely to leave very little explanation as to why the developer chose that number, and thus the program becomes difficult to confidently maintain. It is far preferable to declare numeric constants at the top of your program as macro variables. This has a number of advantages:

1) Any mis-typed reference to the numeric constant will be highlighted by the SAS macro processor or compiler as an uninitialised variable, whereas a mis-typed number can be very difficult to spot

2) The numeric constant can be defined once and then used many times throughout the program. Thus, if it needs to change, the change needs to be made just once, and there's no danger of missing one or more occurrences that also need to be changed

3) Placing the definition at the top of the code makes it very easy to add some comments around it that describe the rationale for the value, plus advice for making changes.

As you can see, there are distinct and compelling reasons to avoid magic numbers in your code. To illustrate the point, consider the following snippet of code:

data ada;
  set sashelp.buy;
  GrossPrice = amount * 1.2;

The code offers no indication/explanation that it is adding Value Added Tax (VAT) to the price of each item. We could re-code things this way:

%let VATrate = 0.20; /* VAT is currently 20% - 20OCT2011 */

data ada;
  set sashelp.buy;
  GrossPrice = amount * (1+&VATrate);

We have documented the meaning of the number, and used a descriptive name throughout our program. And, if we mis-type the name of the macro variable in our DATA step we'll be rewarded with an error message, whereas mis-typing 1.2 in our earlier code may have gone unnoticed.

Inevitably there are drawbacks to this approach (such as making the code more complex in some people's eyes, and slowing the compilation process by a fraction) but the benefits, in my opinion, far outweigh the drawbacks.

There are cases where it is acceptable to use numeric constants. The true and false values (1 and 0) are two such examples.

Making your code data-driven and thereby eliminating the use of numeric constants is better still. But that's a topic for another day!