YARN vs Mesos
A comparison of YARN and Mesos as cluster resource managers, including their architectural differences, recent developments, and insights from the Google Omega paper.
A good discussion on this topic is available on Quora.
The key distinction: Mesos is a meta-framework scheduler, whereas YARN is an application scheduler.
Beyond that discussion, here is some additional information I’ve gathered that you might find useful. Note that the open source community moves fast, so some of this may be outdated by the time you read it.
Recent Developments
- With changes to the Capacity Scheduler, YARN can now schedule CPU as a resource. See YARN-2 for details.
- YARN now has support for cgroups in containers. Here is a related blog post on the topic.
- Storm on YARN can now be used directly.
- Starting with version 0.6, Spark on YARN is officially supported.
- A GSoC project is adding the security features that Mesos currently lacks, whereas YARN already has Kerberos-based security. See the Mesos security wiki for more.
Research Papers
Google Omega: Paper (PDF) — Based on research done at AMPLab and Google for next-generation schedulers on parallel infrastructures.
Mesos: Paper (PDF)
YARN: Paper (PDF)
Scheduler Classification
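The Omega paper classifies schedulers into three types:
Monolithic schedulers use a single, centralized scheduling algorithm for all jobs.
Two-level schedulers have a single active resource manager that offers compute resources to multiple parallel, independent “scheduler frameworks”, as in Mesos and Hadoop-on-Demand (HOD).
Shared-state schedulers, the approach Omega itself takes, give every scheduler access to the full cluster state and use optimistic concurrency control to resolve conflicting claims.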
The paper classifies YARN as a monolithic scheduler and Mesos as a two-level scheduler.
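To make the difference between the first two categories concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how YARN or Mesos is actually implemented: every class and method name (`Node`, `MonolithicScheduler`, `GreedyFramework`, `TwoLevelResourceManager`, `offer_round`) is made up for this example, and CPUs are the only resource modeled. In the monolithic case the central scheduler picks the node itself; in the two-level case the resource manager only makes offers, and each framework decides whether to accept.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int


class MonolithicScheduler:
    """One central algorithm decides placement for every job."""

    def __init__(self, nodes):
        self.nodes = nodes

    def submit(self, job, cpus):
        # The central scheduler itself picks the node (first fit here).
        for node in self.nodes:
            if node.free_cpus >= cpus:
                node.free_cpus -= cpus
                return (job, node.name)
        return None  # no node has enough free CPUs


class GreedyFramework:
    """Hypothetical framework scheduler: accepts any offer that is big enough."""

    def __init__(self, name, cpus_needed):
        self.name = name
        self.cpus_needed = cpus_needed

    def consider(self, offer):
        # The framework, not the resource manager, decides whether to accept.
        return offer.free_cpus >= self.cpus_needed


class TwoLevelResourceManager:
    """Offers free resources to frameworks; each framework schedules itself."""

    def __init__(self, nodes):
        self.nodes = nodes

    def offer_round(self, frameworks):
        placements = []
        for fw in frameworks:
            for node in self.nodes:
                # Offer each node that still has capacity until one is accepted.
                if node.free_cpus > 0 and fw.consider(node):
                    node.free_cpus -= fw.cpus_needed
                    placements.append((fw.name, node.name))
                    break
        return placements


mono = MonolithicScheduler([Node("n1", 4), Node("n2", 8)])
mono_result = mono.submit("job-a", 6)
print(mono_result)  # ('job-a', 'n2')

rm = TwoLevelResourceManager([Node("n1", 4), Node("n2", 8)])
two_level_result = rm.offer_round(
    [GreedyFramework("spark", 4), GreedyFramework("storm", 6)]
)
print(two_level_result)  # [('spark', 'n1'), ('storm', 'n2')]
```

In this toy model, the Omega authors' argument about YARN amounts to saying that its application masters behave like the callers of `submit`: they ask for resources, but the placement decision stays in the central scheduler.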
The paper is an interesting read and raises some questions about YARN. Quoting from it:
It might appear that YARN is a two-level scheduler, too. In YARN, resource requests from per-job application masters are sent to a single global scheduler in the resource master, which allocates resources on various machines, subject to application-specified constraints. But the application masters provide job-management services, not scheduling, so YARN is effectively a monolithic scheduler architecture.
At the time of writing, YARN only supports one resource type (fixed-sized memory chunks). Our experience suggests that it will eventually need a rich API to the resource master in order to cater for diverse application requirements, including multiple resource dimensions, constraints, and placement choices for failure-tolerance.
Although YARN application masters can request resources on particular machines, it is unclear how they acquire and maintain the state needed to make such placement decisions.
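As a rough illustration of what "multiple resource dimensions" and placement preferences mean in the passage above, here is a small Python sketch. It is not YARN's API: `Resources`, `Request`, `place`, and the `preferred_node` field are hypothetical names chosen for this example, and only memory and virtual cores are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Resources:
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resources") -> bool:
        # A request fits only if every dimension fits, not just memory.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass(frozen=True)
class Request:
    need: Resources
    preferred_node: Optional[str] = None  # hypothetical placement preference


def place(request: Request, free_by_node: Dict[str, Resources]) -> Optional[str]:
    """Return the name of a node that can host the request, or None."""
    candidates = list(free_by_node)
    if request.preferred_node in free_by_node:
        # Try the preferred node first, then fall back to the others.
        candidates.remove(request.preferred_node)
        candidates.insert(0, request.preferred_node)
    for name in candidates:
        if request.need.fits_in(free_by_node[name]):
            return name
    return None


free = {"n1": Resources(4096, 1), "n2": Resources(2048, 4)}

# n1 has enough memory but only one core, so the request lands on n2.
two_cores = place(Request(Resources(2048, 2)), free)
print(two_cores)  # n2

# A small request that would fit anywhere honors the stated preference.
preferred = place(Request(Resources(1024, 1), preferred_node="n2"), free)
print(preferred)  # n2

# Nothing in the cluster has 8 GB free.
too_big = place(Request(Resources(8192, 1)), free)
print(too_big)  # None
```

A scheduler that looked only at memory would have put the first request on `n1`, which is the kind of limitation the quote points at.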
Judging by the paper, Google appears to be taking its scheduler design in a different direction from YARN, unlike Yahoo, which remains invested in YARN.
Hortonworks Perspective
From Office Hours: Q&A on YARN in Hadoop 2:
Architecturally, how does YARN compare with Mesos?
Conceptually YARN and Mesos address similar requirements. They enable organizations to pool and share horizontal compute resources across a multitude of workloads. YARN was architected specifically as an evolution of Hadoop 1.x. YARN thus tightly integrates with HDFS, MapReduce, and Hadoop security.