Open Data Platform

Personal views on the Open Data Platform (ODP) announcement and what it means for Hadoop users, BI vendors, and the major distribution companies.

19 February 2015 · 5 min read

My personal views on the recent Open Data Platform announcement.

Disclaimer: I present these views as an industry user of the Hadoop platform and its related ecosystem tools. I have no association with any core Hadoop distribution company. Hadoop is my bread and butter, so this announcement excites me.

What ODP Means for Us

When operating in organisations with multi-flavour Hadoop deployments, we frequently face the question: “Will this tool work with this vendor’s distribution?” Do we need to test performance and compatibility for every combination? If ODP delivers on its promise, a product that is compatible with version X of ODP should work on every distribution that conforms to that version.

The same applies to BI tool vendors: their headache of continuously proving compatibility with every Hadoop flavour would be reduced.

So it is a win-win for BI tool vendors and implementors.

Now, some thoughts on the vendors.

Cloudera

I read Cloudera’s post [1] explaining their decision not to join ODP. At that point, Cloudera was far ahead of the traditional Hadoop flavours (leaving MapR aside for now). The Cloudera team had invested heavily to produce excellent products that put them well ahead of the competition, and their best product was not any single code artefact (read on). The news of $100M in revenue supported the point. Cloudera had earned its market position through sheer hard work and knew it, so you can read their post on ODP as “we don’t need anyone.” They were already doing well as a sponsor of the Apache Software Foundation.

Hortonworks

Hortonworks are the hardworking students in the class who code all the time and happily join any initiative that says “open source.” Collaboration with the community and open source is in Hortonworks’ DNA, and that is what sets them apart from other vendors. They missed the original wave of the Hadoop market (and the money) because of a lax attitude toward documentation when the ecosystem was getting started. I think this is what gave Cloudera the early lead: when everyone was starting out, nobody knew how to use Hadoop, and Cloudera documented it properly. Cloudera gradually built customer trust, and customers told them what to build and sell.

What is in ODP for Hortonworks? I cannot see much added value, since HDP is already an excellent distribution. Replace the “H” in HDP with an “O,” and that is essentially what this initiative amounts to. Hortonworks, I am a big fan.

Pivotal

I need to think carefully before commenting on what this company was trying to do. At the time, ODP seemed like a desperate attempt by Pivotal to create a footprint. Their closed-source distribution strategy did not work, perhaps because Pivotal was founded by a mix of open-source and closed-source people who were unable to decide what they really wanted to do. I was surprised to read Cloudera’s comments about Pivotal’s open-source contributions, given that Pivotal did solid work on Cloud Foundry, Redis, and RabbitMQ (Cloudera’s comment seemed driven by pride).

What will ODP do for Pivotal? I felt they should have dropped their own distribution and adopted the HDP flavour. Their selling point should be support and expertise in implementing HAWQ, Greenplum, and GemFire solutions, although they still need to prove these tools can compete in the current market. I was also surprised to see them drop Kafka and Storm from their announced distribution, which was a questionable decision.

One bright spot for Pivotal is the Cloud Foundry stack. Combining big data applications with a presentation layer could bring real expertise to the table. A special hint for them: IoT is the key. Do your homework properly, because ODP will not help if you plan poorly.

My suggestion: stick to what you are excellent at and bring that to customers. Let the code do the talking. After Apache incubation, take the code you have open-sourced to the Bigtop mailing lists and start discussing it there. Ask the community what should be moved and packaged into Bigtop. Learn from Hortonworks.

Closing Thoughts

I wonder why the ODP initiative was not proposed and discussed on the Bigtop mailing list, which seems like the natural umbrella for all Hadoop integration work. It would have made sense even as a sub-project focused on checking the compatibility of third-party tools against a given Bigtop release. No individual BI or Hadoop ecosystem vendor can manage the whole integration stack, and that is the reason ODP came about. Licensing constraints also limit which components Apache can integrate into Bigtop, which is another factor behind the spawning of ODP. Even so, could Bigtop have been a better venue for this conversation?

At the time, very little had been released about ODP’s plans beyond what was on [2] and the official announcement. We had to wait to see what would come next.

Every day brings a new challenge in this space, and I can easily sense the difficulties facing companies whose entire business is building these systems. Hadoop has changed the lives of many (including mine), and I am sure it will remain an exciting space.

ODP may bring some form of uniform platform in the future, so let us be open and welcoming to the effort. Even if it fails, it does not matter: they are trying to create a level playing field for everyone.

References

  1. Cloudera’s post on ODP
  2. Open Data Platform