Wednesday 2 September 2015

NOTE: SAS "Inside" of Hadoop

We previously looked at SAS Grid Manager for Hadoop, which brings workload management, accelerated processing, and scheduling to a Hadoop environment. This was introduced with the m3 maintenance release of SAS v9.4. M3 also introduced support for using an Oozie scheduling server.

If you're keen to get additional SAS services running on your Hadoop cluster, potentially reducing "data miles", you'll be pleased to know that SAS has an experimental feature in v2.7 of the LASR Analytic Server that allows us to manage resources with YARN. I need to stress "experimental" - this is not ready for our production systems quite yet, unless reliability & availability are not our top priorities.

If the experimental status doesn't put you off then you can find more details at the back of the LASR Analytic Server 2.7: Reference Guide.

YARN (Yet Another Resource Negotiator) is part of the base framework of version 2 of Apache Hadoop. It's a resource manager and it takes care of the Hadoop cluster's compute resources. YARN can manage and share resources between various applications. Configuring LASR to participate in YARN's resource sharing allows YARN to have a complete picture of activities on the cluster.

The use of YARN with LASR is part of the increasing integration between SAS and Hadoop. I look forward to seeing it move to "general availability" status.

Tuesday 1 September 2015

NOTE: Visual Analytics V7.3 Has Arrived

Maintenance release 15w33 of SAS v9.4 arrived in August and it included a new version of Visual Analytics - version 7.3. This new release doesn't offer significant new features over its previous releases but I did notice that Visual Analytics Viewer's default appearance is now MODERN instead of CLASSIC. This is related to VA's transition from Flash to HTML5 technology - something I mentioned back in July.

I did also notice that VA v7.3 comes with an option to allow us to print the entire content of tables, crosstabs, gauges, and containers with content that is only partially available in the layout of the report section. I'm sure that'll be handy on occasions - especially for those of us with small screens!

Tuesday 25 August 2015

Making Notes With Livescribe 3

A couple of weeks ago I wrote about active listening, and I mentioned that I find that note-taking helps me understand (and remember) what I am being told. The downside of this approach is the pile of age-old notebooks that I don't want to throw away "just in case I need them". I think I recently found a solution to my problem...

Back in June I bought a Livescribe 3 digital pen. This is a conventional ink-based pen that writes notes in books, but its party trick is that it has a miniature infra-red camera built-in that observes my jottings and stores an electronic copy. The copy is stored inside the pen, but a Bluetooth link with my Android phone (or Apple iPhone) sees the digital images transferred to my phone (either in real-time, or next time I connect the pen and phone). The Livescribe+ app on my phone allows me to view the pages and books of all of my writings since I bought the pen in June. I can throw away the books I've used since June!

The Livescribe 3 is the latest in a series of smart pens from Livescribe. Like its predecessors, the Livescribe 3 needs the company's proprietary micro-dotted paper for it to do its magic, but the books are reasonably priced and widely available - or you can print your own micro-dotted paper if you wish (if your printer can manage 600dpi+). In addition, the pen uses Livescribe's proprietary ink refills (67mm length and 2.35mm diameter).

Truth be told, the pen is a bit chunky, but I soon got used to its shape and I no longer notice it. This may, in no small part, be due to its light weight. All-in-all it feels very comfortable in my hand.

I've found the digitised images to be very accurate and readable facsimiles of my analogue scribbles. And the Livescribe+ app does a decent job of optical character recognition (OCR) of my words too - making the digital pages searchable.

If you're worried that your notes are a little vulnerable if they are only stored on your phone, fear not because the iOS version of the Livescribe+ app allows automatic synchronisation to Evernote and Microsoft OneNote. Apparently this feature will be added to the Android app soon.

All-in-all I'm smitten with my new pen. It accurately digitises and stores my scribble & writings, and it is nice and easy to use.

The Neo N2, which grew out of a Kickstarter campaign, is a direct competitor and gets good reviews. I honestly can't recall why I chose the Livescribe over the Neo. I think they're pretty similar in features and price.

Did I forget to mention the price of the Livescribe 3? Ah, this is its only major drawback. Expect to pay about £130.

Wednesday 12 August 2015

NOTE: SAS Global Forum 2016 - Call for Content is Open #SASGF

Have you thought about sharing some of your knowledge with fellow SAS practitioners at SAS Global Forum 2016 in Las Vegas, April 18 - 21? User participation is what makes SAS Global Forum (SGF). Attending the papers from SAS staff is always informative, but user contributed papers provide knowledge from real-world experiences and are equally valuable.

If you're interested, or unsure, take a look at the Call for Content section of the event web site. You don't have to be the world's most eloquent speaker, nor do you have to be a SAS guru. Forum attendees are always keen to pick up nuggets of information from fellow attendees.

The benefits to fellow attendees are obvious, but there are several benefits for speakers too. It's great for professional development; it will reflect well on you with your employers or clients; and you will be surprised how much you will learn as you go through the writing process. And if you're still not convinced, be assured that you can be supported by a mentor too.

Go on, give it a try!...

Note how the event appears to have shifted one day later in the week than normal, i.e. in 2016 it will start on a Monday (pre-conference tutorials?) and then run from Tuesday to Thursday. And the favoured twitter hashtag no longer contains the year. Goodness, it's all change at SGF!!

Tuesday 11 August 2015

Good Communicators are Good Listeners

A couple of weeks back I wrote how great developers have skills beyond syntax and design patterns. One of those key skills is communication.

Communication is a two-way operation, and the art of listening is oft overlooked, so I thought I'd offer some notes on the subject here. Specifically, what is commonly known as "active listening".

There is a difference between simply hearing someone’s words and engaging & understanding what someone is saying. If we want to be an active listener, it isn't enough to just hear what someone is saying. Active listening means i) we are focused on what is being said, and ii) we are open to the speaker's point of view.

For the first part, paying attention, we should be listening and trying to understand what this means. There are different means of doing this, which suit different people. For me, I do this better by taking notes (plus, writing down what people say makes them feel heard and important!) and by using follow-up statements like:

  • Can I try to repeat back what you said, in my own words?...
  • So, another way of saying that is ..... Is that correct?

Each of these makes sure we understood the person's point of view and gives them confidence that we are doing so.

The second part of active listening is being receptive to other people's thoughts and ideas. Sometimes this can be really difficult for developers, or other smart people, because many times they are 5 steps ahead of everyone else! However, interrupting or assuming what someone else is going to say can hinder us from hearing new ideas.

It is very important to make people feel heard (not just listening to their words, but understanding their points).  To be more receptive to others here are some tips:

Don't interrupt.  This one can be hard, when we already know the answer to something, or have thought through the solution or path already. However, allowing people to say their thoughts shows respect and consideration for others.

Don't assume what they are going to say. We might be right, but we could also be wrong, and either way it prevents the other person from voicing their ideas. Problem solving, brainstorming, and planning are all things that can be better done collaboratively, and that means each person should have a chance to contribute.

Do ask clarifying questions. Try to understand the "why" behind what they are suggesting; even if the "how" is wrong, their motivations may be sound. Getting to the root of things by asking questions makes the other people feel heard out, ensures the roots of issues get addressed, and can lead to more productive conversations.

Acknowledge their view as important. They probably wouldn't have said anything if it didn't mean something to them. Therefore acknowledging their point, or motivation behind their point, makes the other person feel recognised and their concerns addressed. A simple statement like "that is a good thing to consider" or "thanks for bringing that up, we should definitely note it" is all that is needed; even if you plan to refute that view or disagree with it.

Wait for their response. If we ask someone a question, we should show that we want to hear the answer by waiting for him or her to respond. Some people (especially technical folks) tend to like to think things through before they verbalise them. Other people are more comfortable when they are talking. If you fall into the latter camp and you are talking to someone who likes to think through their reply, make sure that you don’t keep talking; wait for their answer. Ask the question, pause and wait.

Notice their body language. Not all communication is said out loud, so take a few moments and try to observe how the person is reacting to what is transpiring. Are they guarded? Visibly upset or shaken? Are they excited and happy, nodding along to the statements? These are all clues that can help us understand what is really going on under the covers. Learning to pay attention to these can help us change our tone, or adjust our approach.

Listening (and contemplation) is such an important element of communication. We should always remember that we have two ears yet only one mouth; we should use them in proportion!

Tuesday 4 August 2015

Unix Command-Line Top 10

If you are introduced to command-line Unix (or Linux or any other Posix-compliant variant) after a lifetime of Windows, it can be a daunting experience. To those thrust into this position, I always offer my top "ten" most useful commands to begin getting to grips with (and getting on top of) the new environment.

  1. "pwd" and "cd" - when sat at the command-line, you must imagine that you are sat "in" a directory ("folder" in Windows-speak). All of your commands will be aimed at the directory in which you are currently sitting. To aim a command at a different directory you must either "move" to that directory, or specify the other directory's name in the appropriate position within the parameters of your chosen command
    pwd - print working directory - tells you what directory you are currently sat in
    cd - change directory - move to another directory
  2. "ls" and "ls -l" - to see what files and subdirectories live within a directory, use either "ls" to see a simple list or use "ls -l" to get a detailed list that includes modify date, owner, and permissions
  3. "more" - if you want to see the contents of a file, more will do that for you. When you "more a file" you will see the first page of it and will then have the option of issuing further keyboard instructions:
    <space> - scroll down one page
    <return> - scroll down one line
    "q" - exit from more back to the Unix command line
  4. "grep" and "find" - use "grep" to search the content of files; use "find" to locate files with specific patterns of names
  5. "groups" - the "groups" command will tell you which security groups you are a member of
  6. "chmod" and "chown" and "chgrp" - these three commands respectively change the security permissions on a file, the individual owner of a file, and the group that owns a file (yes, in Unix each file has two owners)
  7. "vi" - the much feared but ultimately powerful file editor
  8. "ps" and "top" - show what processes are running, and show the highest users of system resources
  9. ">" and "|" - Not strictly commands, but a most valuable adjunct to the use of commands, and one of the key features of Unix. The principle of Unix commands is to do one single task and to do it well. The principle of using Unix commands is to join them together by "piping" the output from one command to the next with "|", or to redirect a command's output into a file with ">"
  10. "man" - and if my brief explanations are not enough, view the pages of the manual with the "man" command
The foregoing is a vastly simplified description of the "ten" commands. Use the man command to find more information on each of them. The web is full of similar pages and documents with commands and how to use them, so there are plenty of sources for further information.
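
To make a few of these concrete, here is a minimal sketch of a first session. It assumes a Posix-style shell with mktemp available, and the file names (notes.txt and friends) are invented purely for illustration. Everything happens in a throwaway directory, so it should be safe to paste in and try.

```shell
# Work in a throwaway directory so nothing here touches real files
workdir=$(mktemp -d)
cd "$workdir"
pwd                       # show the directory we are now "sat in"

# Create two small sample files (hypothetical names, for illustration only)
printf 'alpha\nbeta\ngamma\n' > notes.txt
printf 'beta\ndelta\n' > more_notes.txt

ls -l                     # detailed list: permissions, owner, group, modify date

# ">" sends a command's output into a file instead of the screen;
# grep -l prints the names of the files whose content contains "beta"
grep -l beta notes.txt more_notes.txt > matches.list

# "|" pipes the output of one command into the next:
# grep finds the matching lines, wc -l counts them, tr strips any padding
match_count=$(grep -h beta notes.txt more_notes.txt | wc -l | tr -d ' ')
echo "lines containing beta: $match_count"

# find locates files by name pattern rather than by content
find . -type f -name '*notes*' | sort

# chmod changes permissions: here, remove write access for everyone
chmod a-w notes.txt
ls -l notes.txt
```

Note how each command does one small job, and how the pipe and the redirect do the joining up. When you have finished experimenting, cd back to your home directory and delete the throwaway directory.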

Are you an experienced Unix command-line warrior? What are your ten most frequently used commands?

Tuesday 28 July 2015

Great Developers Communicate!

There are so many skills that make a difference between a good developer (someone who knows their syntax and has a bagful of good design patterns) and a great developer. Many of those differentiating skills are related to communication.

I wrote about good communication skills nearly six years ago, so it's probably about time to revisit the subject!

Few if any of us work alone. The vast majority of us work as a team. Communication within the team is a necessity, not an option. As a developer, good communication will ensure you deliver the right thing and gain recognition for your abilities; as a manager, good communication is important if you and your team are to deliver what the business needs when the business needs it so that they can get maximum business benefit.

Many developers like working independently and are proud of the autonomy they may be given. But the paradox here is that the autonomy and independence can lead to a lack of recognition of their abilities. Many of us have worked damn hard only to see the boss give plaudits to somebody who may have contributed much less to the collective. Why does this happen, and how can we avoid it?

Surely there must be some hard numbers that will make it clear who's contributing the most to the team. Well, in my experience, traditional metrics such as lines of code, bugs closed and features added all have drawbacks in the reality of day-to-day software development, e.g. are two small features more valuable than one larger feature? And clearly, softer activities such as writing-for-maintainability and helping/coaching others have no reliable metrics for comparison. So, if there are no reliable metrics, how do you get noticed?

I think it comes down to trust. When a manager gives autonomy and independence to team members they are trusting them to complete the assigned task, make wise and strategic decisions along the way, and pro-actively communicate problems long before they become a problem. For someone to invest their trust in us, we have to show that we are in fact a good investment. We might ask ourselves:

Does my boss trust me?
Do my team-mates and peers trust me?
Have I done a good job to earn their trust?
How would my peers describe me to someone else?
How influential am I within the organisation?

As a team member, we want to be judged by our contributions, and we want autonomy and the ability to own substantial things. As a manager, we want to give recognition and praise to the people who deserve it, but we don’t want to micromanage and spend our days being Big Brother. So there's an implicit contract: I will give you autonomy and independence, but it is your responsibility to share status and information with me.

For example, a team member once told me he had worked hard and really gave it his best, but from my viewpoint his progress wasn't up to the same level as that of his team-mates. When he was leaving the company he told me all the things he had done – and I asked him "Why didn't you share this with me before?" You see, I would have advised him to spend his time elsewhere on priorities that were more important to the business. He responded with "I thought you would know." Don't make that same mistake.

And so my conclusion in all of this is: if you want autonomy, and the ability to own and control your own domain and projects – it is your job to push information and build trust with your team members.

In other words, you need to learn and do the following:

Follow through. Do what you say and consistently deliver on your commitments.

Pro-actively communicate when a task takes you longer than you thought, and why.

Improve your communication skills. In order for others to hear you, sometimes you have to hone the way you deliver your message.

Volunteer information and make an effort to explain vague or hard to understand ideas and concepts. Make an effort to share the details of your decisions and diversions. This is also important when you make mistakes – letting others know before they figure it out on their own will show ownership of the situation and can prevent misunderstandings later.

Be forthright and authentic with your feelings. Even when you may hold a contrary opinion, communicate your thoughts (respectfully and with tact).

Don’t talk behind the backs of others. It is very difficult to build trust if someone knows that you will say something negative about your boss, the company leadership, or another team-member.

Be objective and neutral in difficult situations. Learn how to be calm under pressure and act as a diplomat resolving conflicts instead of causing them.

Show consistency in your behaviour. Not just in follow-through but by eliminating any double standards that may exist.

Learn to trust them. This is one of the hardest ones, but trust is a two-way street. Giving others the benefit of the doubt and learning how to work with them is essential to a strong mutual working relationship.

In turn, hopefully, you have a good manager that will be able to ask you good questions and take the time to understand your contributions. And if that is not your situation, then make sure you are sharing information with those around you; such as your peers, your boss, and other stakeholders.

Good leadership is keeping everyone on the same page, and if you want independence it is your responsibility to make sure people know what you are contributing.

I don't claim to come close to following my own advice in all situations, but I do keep reminding myself of what I believe is the right route to trust and autonomy. What is your route?

Thursday 23 July 2015

NOTE: SAS Grid Manager for Hadoop

I've recently written about how much new functionality is getting released by SAS on an almost monthly basis without much fanfare, and I've also written about how Hadoop is becoming a new "operating system" and we should expect to see Grid and LASR running within Hadoop in due course. Well, the release of SAS v9.4 m3 earlier this month brought: SAS Grid Manager for Hadoop.

In fact, the m3 release of SAS Grid Manager brought a raft of changes that point towards a different future for grid computing with SAS.
  • SAS Grid Manager for Hadoop has been added. SAS Grid Manager for Hadoop brings workload management, accelerated processing, and scheduling, to a Hadoop environment
  • Support has been added for using an Oozie scheduling server. This server is used in a SAS Grid Manager for Hadoop environment
  • An agent plug-in and a management module have been added to SAS Environment Manager. In short, we can now monitor and manage our Platform grids using Environment Manager instead of RTM (although some features remain unique to RTM for the moment)
So, grid computing in SAS 9.4 m3 now offers a choice between Platform Suite for SAS and Grid Manager for Hadoop. And if you choose the Platform grid, you may no longer need to install and operate RTM.

Licensing issues aside, you may choose to run one or both of the types of grid technology. This article focuses on Grid Manager for Hadoop. From a user's perspective, there is little or no difference between the two choices because Grid Manager for Hadoop accepts all of the existing Grid syntax and submission modes; integration with other SAS products and solutions is supported by Grid Manager for Hadoop. However, from an architectural and administrative point of view, I believe there are two key advantages for Grid Manager for Hadoop:
  1. If your data is in Hadoop, you don't need to extract it out of the Hadoop cluster in order to process it on the grid. A key tenet of big data is to minimise "data miles" by sending the code to the data rather than transferring terabytes or petabytes of data to the compute server
  2. SAS Platform grids require a clustered file system ("shared data"); Grid Manager for Hadoop uses a shared-nothing approach and hence a bane of my life is eliminated! I've never shared a happy coexistence with a clustered file system. They have often been new/unknown technology for my client's IT infrastructure team, and they have often been unreliable (there may be a link between these two facts). When the clustered file system is the heart of the grid, unreliability is not a good quality
I must point out that the documentation does not state that the full syntax of SAS/BASE and associated products is available when run on a Hadoop-based grid. Certainly, up to this point in time, the SAS processes embedded into Hadoop have only been able to run a subset of SAS syntax, via DS2, plus high performance (HP) procedures. Furthermore, if we think of the no-shared-data model, it would seem inefficient in the extreme to run a SAS job on one grid node and expect the Hive/HDFS data to be streamed to that one node from all of the data nodes where it resides. So, efficient use of the in-Hadoop capability necessitates the use of DS2 or HP procedures.

The SAS Grid Computing in SAS 9.4, Fourth Edition manual gives you all the information you need to plan, install and utilise your Grid within Hadoop with your v9.4 m3 environment. You will see that YARN is used for resource management, Oozie for scheduling. Cloudera, Hortonworks and MapR distributions of Hadoop are supported.

The manual tells us that the install process involves six steps:
  1. Install Hadoop services
  2. Enable Kerberos on the Hadoop cluster
  3. Enable SSL
  4. Update YARN parameters
  5. Set up HDFS directories
  6. Run the SAS Deployment Wizard to install and configure a SAS Grid Manager for Hadoop control server
I'm sure this install won't be plain-sailing because there are a lot of new technologies and components involved. Equally, there are doubtless some features of the Platform grid that are not (yet) available in the Hadoop-hosted grid. But if you are planning a big data project and you need a grid, I suggest you give due consideration to this new option.

Wednesday 22 July 2015

NOTE: HTML 5 is in VA Hub Already!

Aside from comments about my SAS Enterprise Guide vs SAS Studio article, Metacoda's Michelle Homes (@HomesAtMetacoda) was quick to write a comment about my Flash & SAS Visual Analytics (VA) article and to point out that HTML5 is already an option for the VA Hub. Michelle said:
HTML 5 has been available as a configurable option in the hub in SAS VA 7.1 which was released in October 2014. Some information on this can be found at

SAS VA 7.2 has a nice HTML 5 hub by default.
As a Session Recap from a SAS Live Q&A session states (along with nice comparison screenshots):
in Visual Analytics 7.1, the Home Page can be displayed using Flash or HTML5. Someone who has the Visual Analytics: Administration role can change the vah.client.ui.mode property in SAS Management Console. On the Plug-ins tab, navigate to Application Management --> Configuration Manager --> SAS Application Infrastructure --> Visual Analytics. Right-click the node and select Properties to access Advanced properties. The vah.client.ui.mode property specifies which mode of the Home Page to use. The default value, classic, specifies to use Flash to display the Home Page. The alternate value, modern, specifies to use HTML5 to display the Home Page. Note that the vah.client.ui.mode property is a site-wide setting that affects all users.
To learn more about SAS VA Mobile BI and HTML5, the TechTalks video of Himesh Patel (Sr Director, Research and Development) from this year's SAS Global Forum is a good place to start.

Tuesday 21 July 2015

NOTE: Your Response: EG & Studio

As Mark Twain is oft (incorrectly) quoted as saying: "Reports of my death are much exaggerated". I didn't say that Enterprise Guide (EG) was anywhere close to death when I (contentiously) wrote NOTE: What is SAS Studio? RIP Enterprise Guide? but I did suggest that there were good reasons to think that the web-based SAS Studio is in the ascendancy and that there might be a point in the future where it has sufficient features to make most sites seriously question whether EG should still be offered to users.

I got a number of comments in response (online and offline). Of the online comments, I was pretty certain when writing the original article that it would elicit a thoughtful and balanced response from Chris Hemedinger (@cjdinger), and I wasn't disappointed! Chris pointed-out that EG is still receiving new features (a sure sign of life in a software product). As Chris said, the rate of change has slowed (as it should for a mature product), but many SAS users still see it as an essential part of the toolbox. And Chris provided a link to a neat TechTalks video from this year's SAS Global Forum of Christie Corcoran (Development Manager for SAS Studio) talking about SAS Studio and its place alongside SAS Enterprise Guide. It's a nicely informative video, hosted informally by Chris. As Christie says in the video, SAS Studio is "another great way to get to your SAS".

Of the features in EG but not Studio, Eric Winslow highlighted Stored Processes. Eric pointed-out that Stored Processes are a great means for groups to share code. Within EG there is more than one means of simply and easily accessing Stored Processes (and editing and updating them). Whilst it could be said that PROC STP allows Stored Processes to be accessed from any SAS program (and hence Stored Processes are accessible from Studio), I imagine that Eric appreciates the interactivity available for executing Stored Processes in EG, plus the ability to manage the software development lifecycle in a more integrated and coherent fashion. Doubtless, explicit support for Stored Processes will come to Studio in time.

And, my old friend Phil Holland reminded me that he presented a paper on EG and Studio at this year's SAS Global Forum: "SAS Enterprise Guide or SAS Studio: Which is Best for You?". Oops, sorry Phil!  You can find Phil's excellent paper at the bottom of his extremely long list of papers presented at conferences on his web site. Phil's 23-page paper takes the reader through features, techniques and tips before offering some sound recommendations that are based upon the experience of the user in question.

Sunday 19 July 2015

VR - Now I get it!

I'm a regular viewer of the BBC Click technology programme on the BBC News channel (@BBCClick). It covers a broad variety of technology subjects in a very accessible manner - ideal Sunday morning viewing.

Until this weekend's Click I'd always associated virtual reality (VR) headsets with not-quite-there software along with games that involve shooting aliens. No more! If you're able to view Click on iPlayer, jump to 21 minutes 19 seconds in this week's episode and see Spencer Kelly introduce a car being driven by a driver who is wearing a VR headset. Awesome. I want one!

If you can't view iPlayer in your region of the world, take a look at Engadget's more detailed backgrounder and its links to the associated YouTube videos: a) the (somewhat over-produced) final advert for Castrol oil, and b) a behind-the-scenes view.

If/when I get a VR headset I want one of those Mustang controllers with it!

Actually, if I'm honest, we have a couple of VR headsets in the house already. They're cheap and they're great fun. I'm talking about Google Cardboard. Upon delivery from Amazon, and after folding our £15 headsets into shape, we insert our Android phones, and we have a great VR headset. 

One of the first (free) apps I viewed on the headset was Paul McCartney. You don't have to be a fan of Sir Paul's music to get a buzz from standing on the edge of the stage whilst he and his band perform Live and Let Die, with fireworks aplenty. A great introduction to VR.

The Cardboard app itself provides a launcher for the various cardboard apps on your phone. YouTube has a growing number of 360-degree videos that work nicely with Cardboard. Have fun!

Wednesday 15 July 2015

NOTE: SAS v9.4 M3 is available

Further to my post on flavours of SAS v9.4 (indeed, flavours of SAS v9.4 m2), this week sees the release of SAS v9.4 m3 (otherwise known as 15w29). I've not had a chance to use it yet(!), but the documented features that caught my eye include:

  • The pre-production MSCHART procedure provides the ability to include "native" Excel charts in the Excel destination (see Chevell Parker's paper from this year's Global Forum)
  • Product upgrades: SAS Studio 3.4, SAS/STAT 14.1, SAS/ETS 14.1, Enterprise Miner 14.1, Data Integration Studio v4.901, and others
  • A new product: SAS Factory Miner. Amusingly, this brand new product ships as v14.1! (to tie-in with the associated release of the SAS Analytics products)
  • Increased support for secure configurations of SAS
  • The upgrades to DI Studio include:
    • Three new transformations (Fork, Fork End, and Wait For Completion) to manage parallel execution of branches of nodes
    • The ability to embed a loop within a loop
    • Support for Hadoop With Query (HAWQ) with the addition of a source designer that provides an SQL interface to store data natively in the Hadoop Distributed File System (HDFS)
  • Environment Manager 2.5's features include:
    • Administration functions that were previously only available (interactively) in SAS Management Console: Managing metadata definitions for SAS users, servers, and libraries. User definitions can be viewed, created, and edited. Server and library definitions can be viewed, and SAS LASR libraries and servers and Base SAS libraries can be created and edited
    • Grid monitoring functions that were previously only available (interactively) in RTM: Collecting metric data from a SAS grid. Metric data is collected and reported upon for the grid and for individual grid nodes
    • SAS Backup Manager for scheduling, configuring, monitoring, and performing integrated backups. SAS Backup Manager can be accessed from the Administration tab of SAS Environment Manager
  • Also for those who install & administer SAS systems:
    • Changes that are expected to result in a 40% to 50% decrease in start-up time for SAS Web Application Server
    • Greater re-startability in the SAS Deployment Wizard (SDW). If the SDW is interrupted during an install and then restarted during the installation phase, it will install only those SAS products that it has not already installed
    • The SDW enables you to reduce the number of password prompts for required SAS internal accounts, metadata-based server accounts, and SAS Web Infrastructure Data Server accounts
    • Support has also been added for compressing and validating SAS Software Depots. In addition, the SAS Migration Utility has been enhanced to protect passwords in the migration package from being exposed
    • The installation and configuration of the SAS Embedded Process for Hadoop has been improved and simplified: for Cloudera and Hortonworks, Cloudera Manager and Ambari are used to install the SAS Embedded Process and the SAS Hadoop MapReduce JAR files.
For further details, start your journey in the M3 section of the What's New document.
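
For readers who think in code, the Fork, Fork End, and Wait For Completion transformations listed above follow the familiar fork/join pattern. What follows is only a rough analogy in Python, not DI Studio's implementation or generated code; the two branch functions and `run_branches` are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def load_customers():
    # Hypothetical branch of nodes; stands in for any independent unit of work.
    return "customers loaded"


def load_orders():
    # A second hypothetical branch that does not depend on the first.
    return "orders loaded"


def run_branches(branches):
    """Start every branch in parallel, then block until all have finished."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        # "Fork": submit each branch so that they run side by side.
        futures = [pool.submit(branch) for branch in branches]
        # "Wait for completion": do not continue until every branch is done.
        wait(futures)
        # Results come back in the order the branches were submitted.
        return [future.result() for future in futures]


results = run_branches([load_customers, load_orders])
print(results)
```

The point of the pattern is that downstream steps only start once every parallel branch has finished, which is the role the Wait For Completion transformation plays in a job.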

Tuesday 14 July 2015

It's Not Too Late to Volunteer for the DataDive

I wrote in a post in June about the good that DataKind does by using teams of volunteers with data science knowledge. It's not too late to join the DataDive this coming weekend in London (July 17-19). If you'd like to contribute your time and knowledge, check out the sign-up page on Eventbrite. Will I see you there?

What is a DataDive?
DataDives are weekend events that bring the data science community together with the non-profit community to tackle tough data problems in a short period of time. DataKind UK has selected specific charity projects to work on over the weekend. You need to bring your own hardware and software, your data skills, and the belief that you can help change the world for the better!

Who are the social organisations bringing projects to the event?
DataKind UK will be working with My Help at Home, Ark, Centrepoint and The Key.

Tuesday 7 July 2015

Hadoop is the New Black

It feels like any SAS-related project in 2015 not using Hadoop is simply not ambitious enough. The key question seems to be "how big should our Hadoop cluster be" rather than "do we need a Hadoop cluster".

Of course, I'm exaggerating. Not every project needs to use Hadoop, but there is an element of new thinking required when you consider what data sources are available to your next project and what value they would add to your end goal. Internal and external data sources are easier to acquire, and volume is less and less of an issue (or, stated another way, you can realistically aim to acquire larger and larger data sources if they will add value to your enterprise).

Whilst SAS is busy moving clients from PC to web, there's a lot of work being done by SAS to move the capabilities of the SAS server inside of Hadoop. And that's to minimise "data miles" by moving the code to the data rather than vice-versa. It surely won't be long before we see SAS Grid and LASR running inside of Hadoop. It's almost like Hadoop has become a new operating system on which all of our server-side capabilities must be available.

We tend to think of Hadoop as being a central destination for data but it doesn't always start its presence in an organisation in that way. Hadoop may enter an organisation for a specific use case, but data attracts data, and so once in the door Hadoop tends to become a centre of gravity. This effect is caused in no small part by the appeal of big data being not just about the data size, but the agility it brings to an organisation.

SAS's Senior Director of the EMEA and AP Analytical Platform Centre of Excellence, Mark Torr (that's one heck of a title Mark!) recently wrote a well-founded article on the four levels of Hadoop adoption maturity based upon his experiences with many SAS customers. His experiences chime with my far more limited observations. Mark lists the four levels as:
  1. Monitoring - enterprises that don't yet see a use for Hadoop within their organisation, or are focused on other priorities
  2. Investigating - those at this level have no clear, focused use for Hadoop but they are open to the idea that it could bring value and hence they are experimenting to see where and how it can deliver benefit(s)
  3. Implementing - the first one or two Hadoop projects are the riskiest because there's little or no in-house experience, and maybe even some negative political undercurrents too. As Mark notes, the exit from Investigating into Implementing often marks the point where enterprises choose to move from the Apache distribution to a commercial distribution that offers more industrial-strength capabilities such as Hortonworks, Cloudera or MapR
  4. Established - At this level, Hadoop has become a strategic architectural tool for organisations and, given the relative immaturity of Hadoop, the organisations are working with their vendors to influence development towards full production-strength capabilities
Hadoop is (or will be) a journey for all of us. Many organisations are just starting to kick the tyres. Of those who are using Hadoop, most are in the early stages of this process in level 2, with a few front-runners living at level 3. Those organisations at level 3 are typically big enough to face and invest in solutions to the challenges that the vendors haven't yet stepped up to, such as managing provenance, data discovery and fine-grained security.

Does anybody live the dream fully yet? Arguably, yes, the internal infrastructures developed at Google and Facebook certainly provide their developers with the advantages and agility of the data lake dream. For most of us, we must be content to continue our journey...

Thursday 2 July 2015

Summer of Coding

I'm always keen to encourage an awareness and uptake of coding in my kids. I think that coding brings a lot more than the simple ability to write programs. Coding requires a set of disciplines and an approach that are of great benefit in all walks of life.

As the summer holidays are upon us, with weeks upon weeks for kids to idle away their time, now is a good moment to revisit some of the online opportunities to give kids an insight into the joys of coding.

I've previously mentioned Scratch and App Inventor 2 (AI2) as two very accessible means for getting kids (and adults!) started, and producing a useful app that they can share with their friends very quickly. Both sites are free and use a clever building blocks interface to allow budding programmers to quickly understand the requirements of syntax. Scratch builds web-based apps and AI2 builds apps for Android devices (phones and tablets) with surprisingly powerful blocks for accessing web-based resources.

Scratch has always encouraged its users to share their work. Earlier this year App Inventor added its own gallery for showing and sharing.

Whilst it's not free, I've heard good things about Tynker. Tynker also takes the building blocks approach to syntax, and offers structured courses to help guide its students to exciting results.

Another means of getting your kids inspired is Lightbot. This is a series of programming-related puzzles featuring a cute robot character in a games app - available for Apple iOS, Android and other platforms. Great fun, and challenging too when you get to some of the higher levels.

As technology becomes more pervasive, traditional trades disappear, and the world of work becomes more globalised, the skills that newer members of the workforce need are changing: problem solving, team working, and communication are but three "21st century skills". Digital literacy (the ability to find and use internet-based resources and information) and creativity (along with its close relative, entrepreneurship) are close behind. And the young have become more comfortable learning on their own, especially on topics of interest. They just need to be pointed in the right direction!

Tuesday 30 June 2015

More Flash in Chrome for Less Power ... and the HTML5 Migration

If you use one of SAS's web interfaces you'll be a great fan of the flexibility and usability of the user interface. And those capabilities are probably provided by Adobe Flash. Your browser is running the Flash plug-in.

But Flash has one or two downsides, principally its tendency to use lots of CPU cycles, which in turn uses lots of battery power. Not a problem maybe if you're hooked to the mains, but not good on a laptop or mobile phone/tablet.

If you use the Chrome browser you'll be pleased to hear that Google are improving Chrome's power consumption when Flash is running. When you're on a webpage that runs Flash, Chrome will intelligently pause content, e.g. Flash animations, that isn't central to the webpage, while keeping central content (like a video) playing without interruption. If Chrome accidentally pauses something you were interested in, you can just click it to resume playback. This update significantly reduces power consumption, allowing us to do analytics on-the-go for longer before having to hunt for a power outlet.

This feature was enabled by default on Chrome’s desktop Beta channel in June, and will be rolling out soon to everyone else on Chrome desktop.

Looking longer-term, SAS are replacing their use of Flash with HTML5. Whilst the use of Flash requires a plug-in from Adobe, HTML5 is supported by all modern browsers out-of-the-box, with no need for any plug-in. The majority of web sites and vendors are migrating to HTML5 due to its vendor neutrality and power-consumption benefits. SAS Studio already uses HTML5; Visual Analytics and Visual Statistics currently use Flash. We can expect a migration to HTML5, perhaps starting with the VA hub this summer, which will probably be complete next year.

Thursday 25 June 2015

The (Mostly True) History of Computing

I'm half way through a book that I simply must recommend to you before I even finish it. If you have a sense of humour, or if you have half an interest in the earliest evolution of computers, you will enjoy The Thrilling Adventures of Lovelace and Babbage by Sydney Padua as much as I am doing.

On the face of it, it is a graphic novel that accurately describes the Victorian 1830s relationship between the eccentric polymath Charles Babbage and his accomplice, Ada, Countess of Lovelace (the daughter of famed poet Lord Byron). When Ada translated a description of her friend Babbage's plans for the "Analytical Engine", her lengthy footnotes contained the first appearance of the general computing theory - one hundred years before an actual computer was built. Whilst Padua's cartoon telling of the story is thickly laced with humour, her copious footnotes provide an unexpected level of detail to the story.

So far so good, but the book really gets into its stride when it moves into a parallel universe. In the real world, Lovelace died of cancer soon after her publication, and Babbage never built any of his machines. In Padua's parallel universe, Lovelace survives, Babbage does build his Difference Engine, Lovelace smokes a pipe, and they both get into many madcap adventures where their analytical minds and the Difference Engine can save the day.

Yes, it is all a bit surreal! But I'm loving it, and I think it provides a good read for people of all ages. It may just even encourage one or two non-technologists to turn their imagination to computing.

Available from all good book stores including Amazon UK and Amazon USA. I bought my copy as a Kindle edition and I've found it very readable on my 7" Nexus tablet (be sure to read the pop-up instructions in Kindle which tell you how to scroll one panel at a time through the graphics).

Tuesday 23 June 2015

NOTE: What is SAS Studio? RIP Enterprise Guide?

In recent weeks I've mentioned SAS Studio in passing but it's a strategic product for SAS and it merits a proper description.

It's strategic because there is a clear trend for SAS to produce "thin" web-based user interfaces rather than "thick" applications that need to be installed on each and every user's PC. The broad adoption of HTML 5 in modern browsers means that SAS can provide rich functionality without being tied to proprietary technologies such as Shockwave and Flash that require plug-ins.

SAS Studio and Data Loader for Hadoop are HTML 5 web applications. Visual Analytics currently uses Flash but I fully expect it to be entirely HTML 5 within the 2016 timescale (I imagine that simpler parts of the VA interface, such as the hub, might even see HTML 5 later this year).

If I tell you that SAS Studio is a web-based interface that provides a SAS development & coding environment then you will probably be able to join the dots along with me and suppose that at some point in the future (after a few more iterations of enhancements for SAS Studio) SAS Studio will be SAS's preferred environment for general development of SAS code. But that's not to say that Enterprise Guide will no longer be available; SAS have a good track record of supporting older interfaces and functionality - don't forget that Enterprise Guide's predecessor (Display Manager) is still available and supported.

In fact if we compare the interfaces of Display Manager, Enterprise Guide and SAS Studio we can see a lot of similarity. The shots below illustrate this.

SAS continues to add valuable new features to Enterprise Guide (such as the versioning in v7.1), and SAS Studio doesn't yet have a visual coding interface, so EG ain't dead yet(!), but the convenience and ease of deployment for IT departments (which translates to lower cost) means that they will be encouraging SAS to continue to invest in SAS Studio to the point where users can be converted from EG to Studio.

So, if Studio is the future, what does it look like today?...

Well, we've established it's a web-based tool that doesn't need any software installed on your PC. It has the SAS code editor at the forefront. You can go right ahead and start typing your code. Equally, it provides code-generation wizards for many of the tasks that we find in Enterprise Guide. And, of course, you can copy/paste the code generated by the wizards into your own stream of code.

In version 3.2, the list of tasks is categorised as Data, Econometrics, Graph, High Performance Statistics and Statistics. The list of 40 tasks seems small when compared with the 80+ tasks that I see in EG v7.1, but doubtless the list will continue to grow through each release of Studio. Basic data manipulation tasks that are not included in Studio include the Query Builder (a rich tool for building SQL queries in EG). And when you get into the detail with some of the functional elements you see further "enhancement opportunities", e.g. try filtering the table that you are browsing and you will be presented with an empty box in which you are expected to simply type a where clause.

One of the unique features of Studio is "snippets". These are (as the name suggests) small bits of sample code that you can easily add to your code editor. And you can easily add your code pieces to the "My Snippets" area for re-use.

The "Go Interactive" toolbar button toggles whether each execution of your code should be a continuation of previous executions or whether each execution should be independent. The former mode is useful for submitting one SQL query at a time whilst the PROC SQL remains active.

One thing I've found is that Studio doesn't give me the ability to open (and save) SAS code on my PC - it only allows opening and saving on the server. You may see this as a drawback or as a useful feature, depending on your point of view of centralised code, etc.

I've gotten some very snappy responses from my own experience with Studio: submitted code seems to get executed and the results returned far quicker than EG has ever managed. Clearly Studio has some way to go in terms of replicating some or all of EG's functionality, but it's certainly a very capable tool that will already meet the needs of some SAS users. If it takes a while to deploy a copy of EG to your new users, you should certainly think about giving them Studio as a stop-gap whilst they await the installation of EG. They might like it so much that they stay with it and don't use EG!

Further information can be found in the SAS Studio User's Guide. You can see SAS Studio in action in SAS's YouTube contribution named Working in SAS Studio. The best way to check out Studio is to see it yourself: either download the University Edition, or sign up for the AWS version. Try the future, today!

Wednesday 17 June 2015

NOTE: SAS Increasingly Embracing Virtuality & Cloud (and Hadoop) #SASGF15

Last week's UK SAS Forum encouraged me to dig out my notes from this year's SAS Global Forum and type them up into some blog posts.

One of the notable trends at this year's SAS Global Forum was cloud and virtuality, both providing SAS sites with increased choice and flexibility.

The SAS University Edition has long been available free as a "vApp", i.e. an application packaged entirely into a virtual image that you can launch on your desktop (with the use of VMware Player or Oracle VirtualBox). This provides SAS Studio, Base SAS, SAS/STAT, SAS/IML and SAS/ACCESS (for PC file formats). The vApp approach means little or no installation effort, and the ability to run the same package on any operating system that supports VMware Player or Oracle VirtualBox, e.g. Windows and Mac OS.

At SGF, SAS announced the availability of University Edition via Amazon Web Services (AWS), meaning the power of SAS can now be accessed from any supported browser on any platform (including my ChromeBook!). SAS University Edition in AWS Marketplace is eligible for the AWS Free Tier program, which provides free access to first-time AWS subscribers for 12 months, up to 750 hours a month (subsequent AWS usage fees apply).
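
To put the 750 hours a month in context, here is a minimal arithmetic sketch in Python. It assumes (my assumption, not a statement of AWS's terms) that the allowance is counted in instance-hours summed across every running instance; the names used are purely illustrative.

```python
HOURS_PER_DAY = 24
FREE_TIER_HOURS_PER_MONTH = 750  # the figure quoted above


def hours_in_month(days: int) -> int:
    """Hours consumed by one instance left running for a whole month."""
    return days * HOURS_PER_DAY


# The longest months have 31 days.
longest_month = hours_in_month(31)

# One instance running non-stop fits inside the allowance...
fits_one = longest_month <= FREE_TIER_HOURS_PER_MONTH

# ...but two instances running non-stop would not (under the assumption above).
fits_two = 2 * longest_month <= FREE_TIER_HOURS_PER_MONTH

print(longest_month, fits_one, fits_two)
```

In other words, under that assumption the allowance is enough to leave a single instance switched on all month, but not two.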

In vApp form, the package includes an HTTP server, operating system, and SAS software, so it's a full 3-tier solution in a black box (meaning far fewer visits to the excellent SAS Community for Admin & Deployment!).

Another vApp, recently made available by SAS, is the Data Loader for Hadoop. Sadly, unlike the University Edition, this is free for a 90 day trial only, but if you are integrating your SAS platform with one or more Hadoop instances it's certainly worth a look. Data Loader for Hadoop provides self-service big data preparation, data quality and data integration for business analysts and data scientists. The point-and-click user interface enables users to prepare, integrate and cleanse big data faster and easier without writing code. Additionally, power users can run SAS code and data quality functions faster on Hadoop for improved productivity and reduced data movement. The most current release is SAS Data Loader for Hadoop 2.2.

If you're keen to learn more about Hadoop it's worth taking a look at the Hortonworks Sandbox. This too is a vApp (and it's free). If you have Data Loader for Hadoop alongside Hortonworks Sandbox then you'll find that Data Loader will configure itself against the cluster on first use, i.e. it will install the SAS embedded process within the cluster so that you can more efficiently execute your SAS work within the cluster.

There's plenty of information to help you get your SAS and Hortonworks technology working together. Look on the Hortonworks site (Explore the Possibilities of SAS Software and Hortonworks Sandbox) and the SAS site (Step-by-Step Installation & Configuration of SAS Data Loader for Hadoop (Trial Edition) With Hortonworks Sandbox on Windows, and SAS Data Loader for Hadoop Video Tutorials).

Have fun!

Monday 15 June 2015

Using Data Science for the Greater Good #SASGF15

One of the keynote sessions at this year's SAS Global Forum (SGF) was by Jake Porway on the subject of using data for social good. Jake is CEO of DataKind - a charitable organisation established in 2011 to harness the power of data science for the benefit of humanity. Jake spoke eloquently and passionately about the good things that can be done with a little help from data science.

DataKind has grown to be an international organisation, helping communities across the globe. You can read a lot more about DataKind in the blog article that Jake wrote for SAS Voices back in April. In his article Jake explains how teams of DataKind volunteers have helped in ways which include:
  • Designed new poverty measures for the World Bank using untapped data sources such as light levels visible from space and food prices scraped from dozens of websites to turn what had been a five-year process into a real-time evaluation that takes the guesswork out of tracking inflation rates and easily identifies isolated food crises. 
  • Culled refrigeration temperature patterns and shipping routes with Nexleaf to identify the best transport methods for keeping vaccines from spoiling during delivery throughout the developing world. 
  • Created an algorithm that uses satellite imagery to map the poorest villages in Uganda and Kenya, enabling GiveDirectly to more accurately transfer international donations to the most impoverished households trying to pursue their dreams.
There are many ways to get involved with DataKind's very worthwhile activities. Take a look at the list of upcoming events.

For those of us in the UK, the summer DataDive over the weekend of July 17-19 is a good way to put some of our data science knowledge and experience to good use. Will I see you there? If you can't make the DataDive, check out the UK page on the DataKind web site and keep an eye on the Meetup forum for other events across the UK.

Monday 8 June 2015

NOTE: 9.4, What's New - Becoming Newer Month-by-Month

If you're attending SAS Forum UK in London this week (or Manchester later this month), you might not be expecting to use it as a means to find out what's new in V9.4. After all, V9.4 was released in summer 2013, so we can't really call it "new" anymore. There have been a couple of maintenance releases, but the most recent of those was M2 at the end of 2014. Well, you might be in for a surprise if you take a look at the What's New in SAS V9.4? manual. Many SAS customers don't know that SAS releases new features on an almost monthly basis... maintenance releases don't just bring fixes.

To illustrate what I'm talking about, jump to Appendix 2. Documentation Enhancements in the What's New book. You'll see the following list of "maintenance releases" and you'll then see what was new in each release.

May 2015 (SAS 9.4, Rev. 940_15w20)
April 2015 (SAS 9.4, Rev. 940_15w16)
March 2015 (SAS 9.4, Rev. 940_15w12)
February 2015 (SAS 9.4, Rev. 940_15w08)
January 2015 (SAS 9.4, Rev. 940_15w04)
November 2014 (SAS 9.4, Rev. 940_14w47)
October 2014 (SAS 9.4, Rev. 940_14w41)
September 2014 (SAS 9.4, Rev. 940_14w36)
August 2014 (SAS 9.4, Rev. 940_14w32)
June 2014 (SAS 9.4, Rev. 940_14w23)
May 2014 (SAS 9.4, Rev. 940_14w19)
April 2014 (SAS 9.4, Rev. 940_14w14)
March 2014 (SAS 9.4, Rev. 940_14w11)
December 2013 (SAS 9.4, Rev. 940_13w51)
November 2013 (SAS 9.4, Rev. 940_13w45)
October 2013 (SAS 9.4, Rev. 940_13w40)
September 2013 (SAS 9.4, Rev. 940_13w36)
July 2013 (SAS 9.4, Rev. 940_13w30)
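
The revision codes in the list above look like a base version followed by a two-digit year and a week number (so 940_15w20 would be week 20 of 2015). That reading is my assumption rather than something the What's New book spells out, and I'm also assuming ISO week numbering, which may not be exactly how SAS counts its weeks. With those caveats, a small Python sketch (the function name `parse_revision` is hypothetical) can turn a code into an approximate calendar date:

```python
import re
from datetime import date

# Assumed layout: <base version>_<two-digit year>w<two-digit week>, e.g. 940_15w20
REVISION_PATTERN = re.compile(r"^(\d{3})_(\d{2})w(\d{2})$")


def parse_revision(code):
    """Split a revision code into base version, year, week and the week's Monday."""
    match = REVISION_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised revision code: {code!r}")
    base, yy, ww = match.groups()
    year = 2000 + int(yy)
    week = int(ww)
    # Assumption: ISO week numbering; day 1 is the Monday of that week.
    monday = date.fromisocalendar(year, week, 1)
    return base, year, week, monday


base, year, week, monday = parse_revision("940_15w20")
print(base, year, week, monday.isoformat())
```

For 940_15w20 the Monday falls in May 2015, which agrees with the month shown in the list. A week that straddles a month boundary may not line up with the listed month, so treat the derived date as approximate.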

There are far too many new features for me to list them all here, but below I have listed just a few highlights from the May 2015 release that might be of interest to the majority of readers.

  • Add-In for MS-Office 7.11
    • Increased integration with Visual Analytics
  • Visual Analytics 7.2
    • Ability to include parameters in URLs that link to individual reports
    • Functional integration with Visual Statistics (though Visual Statistics remains a separately-licensed product)
    • Import data from Google Analytics, Facebook, and MapR
    • Able to add page numbers when printing to PDF
  • Enterprise Guide 7.11
    • Support for Visual Analytics 7.2
    • Export Visual Analytics reports to PDF
    • Use of a where expression for filtering in the data grid viewer (hurrah!)

Of course, you may not want to be upgrading your production system on a monthly basis, nor perhaps your development environment. But if a new release contains features that have real business value for your site, be sure to check out the Upgrading manual before ordering the software and double-clicking the setup module.

The latest version of SAS is V9.4 maintenance 2 (M2). M3 is expected to be available next month, and I've heard it will contain some significant new features plus noteworthy enhancements to the installation & upgrade process. So, upon its release, take a good look at what it offers - it won't just be bug fixes.

Monday 1 June 2015


If you haven't already seen a notice, there will be SAS Forum events in the UK again this year (in previous years they've been known as SAS Professionals Convention, but I guess the "forum" tag provides consistent branding with events such as SAS Global Forum).

The conference will be held in two locations: Warwick Business School, The Shard, London on 10-11 June and Salford University, Media City, Manchester on 24-25 June 2015.

Alongside a range of good speakers there will be opportunities to sit SAS certification exams too (albeit on non-conference days!).

More details, including registration, can be found on the event web site.