Migrating a Large Hadoop Cluster
Lessons learned from migrating a large Hadoop cluster over a single weekend: planning, data migration with distcp, code migration of 300+ Oozie jobs, and HBase migration.
“The textbook launch” — any Indian would have heard this phrase, often used by ISRO after a successful launch of satellites into space.
Last weekend we did a similar exercise: the textbook Hadoop cluster migration. This blog post shares thoughts and experiences around the same.
Just for storytelling, I am masking customer-specific details.
Prologue
This June (2014) it will be two years since I landed in Sydney, a city entirely new to me. I came to implement Hadoop for a large financial customer. This journey has been a great — in fact awesome — learning experience and an opportunity to work with some of the best minds in the world.
The Weekend Task
We had to migrate to a new cluster because we were almost out of space and also needed more computation power. Throughout this blog post I use the term “we”: this has been a combined effort of an awesome bunch of people.
- Old cluster: 11 nodes with about 250 TB space
- New cluster: 25 nodes
The two clusters were running different versions of Hadoop, so the move was from CDH 4.3 to CDH 4.6. Luckily, the two releases are binary compatible.
The cluster also runs HBase in production and has a replica mirror cluster (a third cluster).
The Plan
This activity had lots of planning in the background:
- Testing of sample production jobs on the new cluster
- Total automation and setup of the new cluster via Cloudera Manager API
- Code migration of over 300 production Oozie jobs which pump data into the cluster and produce scores
- Trickle feed — resuming the jobs in batches
- A roster of shifts in which people would work, so the activity could be completed within one weekend
Good, clear communication about what was happening was very important for everyone involved, given that our team was spread across Sydney and India.
Besides traditional email, we also used WhatsApp, so that everyone could stay aware of what was going on and read messages when they were available (rather than waking someone who had just finished a roster shift).
Cluster downtime started on Friday evening. The data copy via distcp had been started a few nights earlier, so that over the weekend we could run distcp in update mode and transfer only the newly added data.
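The two-phase copy can be sketched as below. The NameNode hostnames, data path, and mapper count are placeholders, not the real cluster's values; the commands are echoed here rather than executed.

```shell
# Hypothetical NameNode hosts and data path; real values are masked.
SRC=hdfs://old-nn:8020/data
DST=hdfs://new-nn:8020/data

# Phase 1: bulk copy, run over the nights before the migration weekend.
# -p preserves permissions/ownership; -m caps the number of copy mappers.
BULK_COPY="hadoop distcp -p -m 40 $SRC $DST"

# Phase 2: weekend catch-up. -update copies only files that are new or
# changed since the bulk copy, keeping the final sync inside the window.
CATCHUP_COPY="hadoop distcp -update -p -m 40 $SRC $DST"

echo "$BULK_COPY"
echo "$CATCHUP_COPY"
```

On the cluster these would typically be run from an edge node with network access to both NameNodes.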
The Action
New Cluster Setup
Given the experience of managing the old cluster, it was a priority for the team that any new cluster we brought up would be set up via 100% automation. The new cluster's configuration and setup are driven via Puppet and the Cloudera Manager API. The non-Hadoop components are installed via normal Puppet packages. After the automated setup of the new cluster, the machines were ready to be loaded.
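To give a flavour of the Cloudera Manager API layer the automation is built on, the sketch below shows the kind of REST calls involved. Host, credentials, cluster and service names are all placeholders, the /api/vN version segment varies by CM release, and the calls are echoed rather than executed.

```shell
# Placeholder CM host and credentials; adjust the /api/vN segment to
# match your Cloudera Manager release.
CM_API="http://cm-host:7180/api/v5"

# List the clusters that Cloudera Manager knows about.
echo "curl -u admin:admin $CM_API/clusters"

# Read the configuration of a named service (e.g. HDFS) on a cluster.
echo "curl -u admin:admin $CM_API/clusters/cluster1/services/hdfs1/config"
```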
Data Migration
During Friday night, distcp was started in update mode to complete the data copy, moving everything onto the new cluster. An overnight stay in the office was planned, and distcp with World Cup soccer made a good combination. Since HBase is also part of production, we had to move it as well. To move HBase we used a similar distcp copy process, after bringing down everything except HDFS and MapReduce on the source cluster. This approach can be used if and only if HBase is down on both sides.
See the steps mentioned here on the Apache wiki: HBase Migration to New Cluster
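In outline, the move looked like the sketch below; hosts and the HBase root path are placeholders, and the copy is only safe after a clean shutdown of HBase on both clusters, as noted above. The commands are echoed rather than executed.

```shell
# 1. Stop HBase on BOTH clusters (we drove this via Cloudera Manager),
#    leaving only HDFS and MapReduce running.
# 2. Copy the entire HBase root directory to the new cluster:
COPY_CMD="hadoop distcp -p hdfs://old-nn:8020/hbase hdfs://new-nn:8020/hbase"
echo "$COPY_CMD"
# 3. Start HBase on the new cluster, then check table/region consistency:
echo "hbase hbck"
```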
The distcp was complete by 7:00 AM on Saturday, and the full data migration was verified by taking a full folder-tree dump on the source and destination clusters and comparing the sizes.
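The size comparison can be sketched as follows. The dump files here are stand-ins created inline; on the real clusters each would come from a `hadoop fs -du` over the data folders.

```shell
# Stand-in dumps; on the clusters these would be produced by e.g.:
#   hadoop fs -du /data > old_cluster.du   (on the source)
#   hadoop fs -du /data > new_cluster.du   (on the destination)
printf '1024\t/data/a\n2048\t/data/b\n' > old_cluster.du
printf '1024\t/data/a\n2048\t/data/b\n' > new_cluster.du

# Sort both dumps by path and diff them; any difference means a missing
# path or a size mismatch between source and destination.
sort -k2 old_cluster.du > old.sorted
sort -k2 new_cluster.du > new.sorted
if diff old.sorted new.sorted > /dev/null; then
  echo "dumps match"
else
  echo "dumps differ"
fi
```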
The job of the overnight stay team was over and with the dawn of Saturday morning the job of the code migration team was about to begin.
Code Migration
The activity of sample-testing the existing production jobs had been carried out a few days in advance. This allowed us to find issues ranging from binary incompatibilities to network problems, such as Sqoop jobs that needed firewall rules opened before they could talk to the new cluster.
Given the large number of production jobs, testing and being confident that everything would work 100% in the new cluster was a challenging task.
One of the main focuses was to capture what was running in the cluster at that moment (thanks to our very strict Development Manager). We ran extensive checks to capture the current state of the code in the system and reconciled it back into the code repository.
Out of over 300 jobs, we found 3 Oozie workflows whose definitions in our code repo were out of date compared with what was running in production. With a large number of property files, Oozie workflow testing can be difficult.
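A drift check of this kind can be sketched as below. The definition files are stand-ins created inline; in practice the deployed copy would be fetched from the workflow's HDFS app path (or via `oozie job -definition` against a running job) and diffed against the repo copy. Names and paths are hypothetical.

```shell
# Stand-ins for the deployed workflow definition and the repo copy.
printf '<workflow-app name="scoring-wf"/>\n' > deployed_workflow.xml
printf '<workflow-app name="scoring-wf"/>\n' > repo_workflow.xml

# In practice deployed_workflow.xml would come from something like:
#   hadoop fs -cat /apps/oozie/scoring-wf/workflow.xml > deployed_workflow.xml
if diff deployed_workflow.xml repo_workflow.xml > /dev/null; then
  echo "repo in sync with production"
else
  echo "repo drift detected"
fi
```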
I will write a new blog post about best practices in handling large numbers of Oozie workflows, especially around regression testing and structuring the Oozie code.
Oozie is an awesome tool — we have never had issues with it, and many of the jobs had been running silently for over 12 months.
We created a new branch after dumping the current state of the production cluster's code and started adapting the old code base to the configuration of the new cluster.
By late Saturday afternoon the code migration was complete and ready to be run on the new cluster. We ran our first job on the new cluster with success and shared the news with everyone over WhatsApp.
WhatsApp group messaging worked very well, keeping all team members aware of the current happenings in base camp.
By late evening the code migration shift was over and a new team arrived to take over for resuming the production jobs.
Trickle Resuming the Production Jobs
The cluster is under SLA for downstream consumers, so we resumed the highest-priority production jobs first. The team managing platform operations now had control. They started the Oozie jobs incrementally on the new cluster and verified them as they went. We have our own custom job-tracking database, so a single simple MySQL query gave a clear view of which jobs were having problems and needed attention.
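The shape of that query was roughly as below; the table and column names are hypothetical, since the real tracking schema is customer-specific.

```shell
# Hypothetical schema: surface any job that has not yet succeeded on the
# new cluster, most recent first.
QUERY="SELECT job_name, status, last_run_time
FROM job_tracking
WHERE status <> 'SUCCEEDED'
ORDER BY last_run_time DESC;"

# On the ops box this would be run as something like:
#   mysql -u ops -p job_db -e "$QUERY"
echo "$QUERY"
```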
Hive and Metadata
We took the dump of the Hive MySQL metastore from the old cluster and created the corresponding database on the new cluster. Since there is a mirror cluster, we also configured MySQL replication for it.
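In outline the metastore move looked like this; database name and credentials are placeholders, and the commands are echoed rather than executed.

```shell
# Dump the Hive metastore database on the old cluster's MySQL host.
# --master-data=2 records the binlog position as a comment in the dump,
# which is what the mirror cluster's replica is later pointed at.
DUMP_CMD="mysqldump -u hive -p --databases metastore --master-data=2"
echo "$DUMP_CMD > metastore.sql"

# Load the dump on the new cluster's metastore MySQL host.
echo "mysql -u hive -p < metastore.sql"
```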
Closing Thoughts
We did everything without a single support call, ticket, or Apache mailing list email. This shows that our team is capable and experienced enough to deal with a wide range of issues in the Hadoop ecosystem.
However, there are a few lessons to learn to make this truly the textbook Hadoop migration.
Source control discipline. There is always something that is not in the code base. Instilling the discipline in a large team that everything must end up in the code base is the most difficult part. Last-minute fixes often leave changes running in production that never find their way back into source control, and so we missed a few minor things. The misses did not affect our planned timelines, since we had seen the issues before and knew how to fix them; feeding those fixes back into source control is the action point.
Automation. This is the key to managing multiple large clusters. The platform team did an awesome job of capturing the current state of the old cluster and tuning the required properties on the new one. All deployment of the new cluster was done via Puppet and the Cloudera Manager API, so any configuration changes were driven the same way. Properties that the code migration team found missing while resuming the production jobs were fed back into the loop to be added to Puppet.
Permissions. The jobs that failed on the new cluster failed due to permission issues. One action point we noted was to capture this information back into our Puppet configuration so that all future deployments take care of folders, owners, and permissions for us.
Oozie code structuring. Managing Oozie code, given the large number of XML files and configurations, can be difficult and at times annoying. Understanding concepts like Oozie globals, the share lib, and bundles is very important when you use Oozie in production for large deployments. I will write an additional follow-up post on Oozie code.
Goodbye for now — I will be back soon with another post on the next task: moving to YARN by the end of next quarter.