Wednesday 2 September 2015

NOTE: SAS "Inside" of Hadoop

We previously looked at SAS Grid Manager for Hadoop, which brings workload management, accelerated processing, and scheduling to a Hadoop environment. It was introduced with the M3 maintenance release of SAS v9.4, which also introduced support for using an Oozie scheduling server.

If you're keen to get additional SAS services running on your Hadoop cluster, potentially reducing "data miles", you'll be pleased to know that v2.7 of the LASR Analytic Server includes an experimental feature that allows us to manage LASR's resources with YARN. I need to stress "experimental": this is not ready for our production systems quite yet, unless reliability and availability are not our top priorities.

If the experimental status doesn't put you off, you can find more details at the back of the LASR Analytic Server 2.7: Reference Guide.

YARN (Yet Another Resource Negotiator) is part of the base framework of version 2 of Apache Hadoop. It's a resource manager and it takes care of the Hadoop cluster's compute resources. YARN can manage and share resources between various applications. Configuring LASR to participate in YARN's resource sharing allows YARN to have a complete picture of activities on the cluster.
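If you want to see the picture that YARN holds of the cluster, one option is the ResourceManager's REST API, which reports cluster-wide metrics as JSON. The Python below is a minimal sketch of that idea and is generic YARN rather than anything from the SAS documentation. The host name in the usage note and the figures in SAMPLE are made up for illustration, and I'm assuming the ResourceManager web service is reachable on its usual port of 8088.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch cluster metrics from a YARN ResourceManager.

    rm_url is the ResourceManager's base URL (an assumption about your
    site, for example a host listening on port 8088).
    """
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/metrics") as resp:
        return json.load(resp)["clusterMetrics"]


def memory_summary(metrics):
    """Summarise how much of the cluster's memory YARN has allocated."""
    total = metrics["totalMB"]
    allocated = metrics["allocatedMB"]
    # Guard against an empty cluster reporting zero total memory
    pct = 100.0 * allocated / total if total else 0.0
    return {
        "total_mb": total,
        "allocated_mb": allocated,
        "available_mb": metrics["availableMB"],
        "allocated_pct": round(pct, 1),
    }


# Illustrative values only, not taken from a real cluster
SAMPLE = {"totalMB": 65536, "allocatedMB": 16384, "availableMB": 49152}

if __name__ == "__main__":
    print(memory_summary(SAMPLE))
```

Against a live cluster you would call memory_summary(fetch_cluster_metrics("http://rm-host.example.com:8088")), substituting your own ResourceManager address. Any memory claimed by an application that participates in YARN should then show up in the allocated figure.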

The use of YARN with LASR is part of the increasing integration between SAS and Hadoop. I look forward to seeing it move to "general availability" status.