Tuesday, February 16, 2016

Hadoop Multi Node Cluster On Linux Containers - Backed By ZFS RAID File System - Inside Azure VM

We've successfully built a multi-node Hadoop cluster using only LXD Linux containers.

That is, all Name/Data Nodes were built inside four LXD containers (one Name Node, three Data Nodes) hosted in a single Azure Ubuntu VM. Here are the major points of the implementation.

For scalability, I chose a standard Azure Ubuntu VM (8-core CPU, 14 GB RAM, 4x250GB data disks) with a ZFS RAID file system. To increase hardware capacity, I can scale up the Azure VM at any time. To scale up storage, I can add Azure data disks on demand and then add them to the ZFS pool, which makes the additional storage available to all running containers. Cool, huh? :)

Also, with the ZFS file system you can clone or snapshot your file system at any moment. You can also hot-add storage at any time, and it becomes visible immediately to the running OS. ZFS supports both RAID-0 striping (which I chose, accepting that striping alone provides no redundancy) and RAID-Z provisioning.
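As a rough sketch of the hot-add idea above, growing the striped pool with one more disk could look like the following. The pool name matches the one used in the commands later in this post; the device name `sdg` and the `run`/`grow_pool` helpers are assumptions for illustration. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: grow a striped ZFS pool by adding one more disk.
# NEW_DISK is a hypothetical device name for a freshly attached Azure data disk.
POOL="${POOL:-ZFS-LXD-Pool}"
NEW_DISK="${NEW_DISK:-sdg}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

grow_pool() {
    run sudo zpool add "$POOL" "$NEW_DISK"   # adds the disk as a new striped vdev
    run sudo zpool list "$POOL"              # SIZE should now include the new disk
}

grow_pool
```

Adding a disk to a striped pool is not easily undone, so it is worth checking the device name carefully before running this for real.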

1. Provisioned the Ubuntu VM in Azure. Installed the LXDE and RDP packages to get a remote desktop from my local machine

2. Installed the ZFS tools and created a new ZFS pool from the 4x250GB disks (1TB)

4. Mounted the ZFS pool so that the LXD directories reside on ZFS

5. Installed LXD and updated the networking settings (changed the LAN subnet IPs)

6. Pulled Ubuntu-Trusty-AMD64 image from LXD repository

7. Created the first LXD container (HDNameNode-1) and configured a Hadoop single-node cluster in it

8. Snapshotted the ZFS file system to keep my Hadoop single-node cluster for later retrieval if required

9. Updated the HDNameNode-1 container with the multi-node configuration that all nodes in a cluster should have. (I used tutorials 1 & 2. Tutorial 2 is for Hadoop 1.0+, so I had to rely on Tutorial 1 for the Hadoop 2.0+ multi-node configuration using YARN.)

10. Cloned the HDNameNode-1 container using the LXD utility to create three more containers that act as Data Nodes (HDDataNode-1, HDDataNode-2, HDDataNode-3)

11. Updated IP and Network settings for Name/Data nodes as desired

12. Updated the HDNameNode-1 container with the name-node-specific multi-node configuration

13. Updated all DataNode containers with the data-node-specific multi-node configuration

14. Restarted all containers and started Hadoop on the NameNode, which in turn powered up the Data Nodes.

15. Now I'm running a 4-Node-Hadoop-Cluster.

16. I can clone any Data Node at any time to get more data nodes if needed in the future

17. Snapshotted my ZFS pool for disaster recovery.
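To illustrate steps 12 to 14, here is a minimal sketch of how the NameNode can know which containers to power up. In Hadoop 2.x the `slaves` file in the configuration directory lists one worker hostname per line, and the start scripts connect to each listed host. The helper name `print_slaves` is hypothetical, and using the container names as hostnames is an assumption.

```shell
#!/bin/sh
# Sketch: print the contents of Hadoop's slaves file, one worker hostname
# per line. Assumes the Data Node hostnames match the container names.
print_slaves() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'HDDataNode-%s\n' "$i"
        i=$((i + 1))
    done
}

print_slaves 3

# On the NameNode this output would go into the slaves file in the Hadoop
# configuration directory. Running start-dfs.sh and start-yarn.sh then starts
# the daemons on every listed host, and `hdfs dfsadmin -report` lists the
# DataNodes that joined the cluster.
```

Adding a fourth Data Node later would then mean cloning a container and adding one more line to this file.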

A few commands are listed below for your reference:

---Add ZFS package

sudo apt-get update
sudo apt-get install ubuntu-zfs

---Create ZFS Pool and Mount for LXD directories
sudo zpool create -f -m none ZFS-LXD-Pool sdc sdd sde sdf
sudo zfs create -p -o mountpoint=/var/lib/lxd    ZFS-LXD-Pool/LXD/var/lib
sudo zfs create -p -o mountpoint=/var/log/lxd    ZFS-LXD-Pool/LXD/var/log
sudo zfs create -p -o mountpoint=/usr/lib/lxd    ZFS-LXD-Pool/LXD/usr/lib
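Before installing LXD, it may be worth confirming that the three datasets are mounted where LXD expects them. The following is only a sketch: the `run` and `check_mounts` helpers are assumptions, and the commands are printed rather than executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: check the mountpoint of each dataset created above, then the pool health.
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

check_mounts() {
    for ds in LXD/var/lib LXD/var/log LXD/usr/lib; do
        # -H drops the header, -o value prints only the property value
        run sudo zfs get -H -o value mountpoint "ZFS-LXD-Pool/$ds"
    done
    run sudo zpool status ZFS-LXD-Pool
}

check_mounts
```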
---Install LXD and update network settings for containers
sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable
sudo apt-get update
sudo apt-get install lxd
#To update the subnet to and DHCP Leases
sudo vi /etc/default/lxc-net
sudo service lxc-net restart
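For orientation, the bridge settings in /etc/default/lxc-net are plain shell variable assignments. The fragment below shows the variables involved; the addresses are the stock Ubuntu defaults, used here purely as placeholders and not the values chosen for this cluster.

```shell
# /etc/default/lxc-net (sketch; placeholder addresses, not the ones used here)
USE_LXC_BRIDGE="true"
LXC_BRIDGE="lxcbr0"                    # bridge the containers attach to
LXC_ADDR="10.0.3.1"                    # host address on the bridge
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"              # container subnet
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"   # DHCP lease range
LXC_DHCP_MAX="253"                     # maximum number of leases
```

Changing the subnet means keeping `LXC_ADDR`, `LXC_NETWORK` and `LXC_DHCP_RANGE` consistent with each other before restarting lxc-net.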
---Add Ubuntu Trusty AMD64 Container Image and create first container, to set up SingleNodeCluster
lxc remote add images images.linuxcontainers.org
lxc image copy images:/ubuntu/trusty/amd64 local: --alias=Trusty64
lxc launch Trusty64 HDNameNode-1
lxc stop HDNameNode-1
sudo zfs snapshot ZFS-LXD-Pool/LXD/var/lib@Lxd-Base-Install-With-Trusty64
sudo zfs list -t snapshot

***Configure Hadoop Single Node Cluster On first container and Snapshot it

**change ip to

lxc start HDNameNode-1
lxc exec HDNameNode-1 /bin/bash
lxc stop HDNameNode-1
sudo zfs snapshot ZFS-LXD-Pool/LXD/var/lib@Hadoop-Single-Node-Cluster
sudo zfs list -t snapshot
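If the single-node setup ever needs to be retrieved, a rollback to that snapshot might look like the sketch below. The `run` and `rollback_to_snapshot` helpers are assumptions, and commands are only printed unless `DRY_RUN=0` is set. Note that LXD keeps its own database under /var/lib/lxd, so a rollback of this dataset also reverts LXD's view of the containers; stopping the LXD daemon first may be safer.

```shell
#!/bin/sh
# Sketch: return the LXD dataset to the single-node snapshot taken above.
SNAPSHOT="${SNAPSHOT:-ZFS-LXD-Pool/LXD/var/lib@Hadoop-Single-Node-Cluster}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

rollback_to_snapshot() {
    run lxc stop HDNameNode-1
    # Without -r, zfs rollback only accepts the most recent snapshot;
    # -r would also destroy any snapshots taken after the target.
    run sudo zfs rollback "$SNAPSHOT"
    run lxc start HDNameNode-1
}

rollback_to_snapshot
```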

***Update Name Node configuration to have Multi-Node-Cluster-Settings

-----Clone Name Node to 3 Data Nodes

lxc copy HDNameNode-1 HDDataNode-1
lxc copy HDNameNode-1 HDDataNode-2
lxc copy HDNameNode-1 HDDataNode-3
#Change IPs to,4 and 5
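The three copy commands above generalize to a small loop, which is handy when more Data Nodes are wanted later. This is a sketch: `clone_data_nodes` and `run` are hypothetical helpers, and commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: clone the name node container into N data node containers.
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

clone_data_nodes() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        # The source container is assumed to be stopped at this point.
        run lxc copy HDNameNode-1 "HDDataNode-$i"
        i=$((i + 1))
    done
    run lxc list   # the new containers should show up here
}

clone_data_nodes 3
```

Each clone still needs its own IP and hostname settings afterwards, as in step 11.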

***Update all Data Node configurations to have Multi-Node-Cluster-Settings

---Snapshot the Multi Node Cluster
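The commands for this last step follow the same pattern as the earlier snapshots. A minimal sketch, assuming all four containers are stopped first; the snapshot name `Hadoop-Multi-Node-Cluster` and the helper names are my own placeholders, and commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: stop the cluster containers and snapshot the LXD dataset.
SNAP_NAME="${SNAP_NAME:-Hadoop-Multi-Node-Cluster}"   # hypothetical snapshot name
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

snapshot_cluster() {
    for c in HDNameNode-1 HDDataNode-1 HDDataNode-2 HDDataNode-3; do
        run lxc stop "$c"
    done
    run sudo zfs snapshot "ZFS-LXD-Pool/LXD/var/lib@$SNAP_NAME"
    run sudo zfs list -t snapshot
}

snapshot_cluster
```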



  1. Hi Abraham. I am trying to implement a Hadoop cluster on LXD containers using your blog. I just wanted to know how you set up static IPs for the containers so that they are accessible over the LAN.

    1. I used a static IP for each container by modifying /etc/network/interfaces and /etc/rc.local.

      Then I used a software bridge (br0) on the host machine as the parent network device for all containers, so that every container's virtual network interface is attached to this host bridge. Finally, I bridged the host's physical Ethernet interfaces to this bridge, so that packets arriving at the bridge are forwarded across the entire LAN.

      You can set the network parent device by modifying the default LXD profile. Let me know if you need any clarification.

  2. Hi Abraham,

    We're working on automating the ZFS/LXD/big data parts in Ubuntu with Juju. Have you checked it out? We've got steps 4-14 pretty much all automated now and would like to see what you think of our solution. Ping me if you're interested in working together!

    1. Great to know! I'm eager to learn about your implementation, though I'm new to Juju.

  3. Abraham,

    Per Jorge's comment check out:

    Be great to hear what you think, if it helps you get to the science faster. We hang out in #juju@freenode if you have any questions.
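For readers following the static IP discussion in the first comment thread, the container-side part can be sketched as a stanza for /etc/network/interfaces. The helper `static_iface_stanza` is hypothetical, and the addresses are documentation-range placeholders, not the ones used in this cluster.

```shell
#!/bin/sh
# Sketch: print a static eth0 stanza for a container's /etc/network/interfaces.
# Arguments: address, netmask, gateway (all placeholder values in this example).
static_iface_stanza() {
    cat <<EOF
auto eth0
iface eth0 inet static
    address $1
    netmask $2
    gateway $3
EOF
}

static_iface_stanza 192.0.2.11 255.255.255.0 192.0.2.1

# On the host side, the eth0 device in the default LXD profile would use
# "nictype: bridged" and "parent: br0" (editable with `lxc profile edit default`),
# assuming the br0 bridge described in the reply already exists.
```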