The Data Mill

Hadoop 2.0's deep impact on big data and big data technologies

Businesses that ventured early into big data territory leaned on cloud computing for their Hadoop pilot projects, but that's changing, according to Merv Adrian, analyst with Stamford, Conn.-based Gartner Inc. These days, his clients are increasingly asking how to deploy Hadoop on-premises.

Observations like this highlight how quickly big data and its related technologies are evolving. That's certainly the case, Adrian and fellow analyst Nick Heudecker argued, with the release of Hadoop 2.0, which became generally available in October. The updated version of the Apache Software Foundation's popular distributed computing framework made headlines last year predominantly because of a new feature called YARN (Yet Another Resource Negotiator), which essentially breaks Hadoop out of batch processing and into the real-time world. The Gartner analysts said the more robust version will almost certainly lead to an uptick in Hadoop deployments and more use cases.

"As people gain experience, we expect them to build larger projects," Adrian said during a recent webinar he hosted with Heudecker. And not just larger projects, but completely new projects that can interact with each other in ways they've never been able to before. Hadoop 2.0 might even be hardy enough to move deeper into the organization and be integrated with the larger technology stack. And, while most businesses are using big data technology to tackle good, old-fashioned transactional data, Hadoop 2.0 can help businesses pursue unstructured and semi-structured data types, the experts said.

Promise aside, Hadoop 2.0 is not flawless. One significant weakness is security -- or the lack thereof. "It's important to note that these systems grew up largely in Web-centric, engineering-led companies that were primarily dealing with public data," Heudecker said. "As enterprises adopt the technology, this needs to be backfilled."

CIOs should expect security and big data to be a big topic in 2014, a space already targeted by vendors like Dataguise Inc., Gazzang Inc., Protegrity USA Inc. and Zettaset Inc., Adrian said. "So much of big data is about combining multiple data streams and getting a wider, broader view of customers, that people are becoming increasingly concerned about how to ensure the privacy of data," he said.

Hadoop distributors reflect changing market

During the webinar, Adrian covered a spectrum of Hadoop distributors, highlighting how the vendors are couching their big data message and providing insight into where the market is headed. Here's the breakdown:

  1. Amazon Web Services Inc. or cloud central: Its customers are doing work in the cloud or are moving processing to the cloud. Adrian said there was little talk from customers of a hybrid on-premises/cloud environment.
  2. Cloudera Inc. or the enterprise data hub: In this model, Hadoop moves to the architectural center of the data center with spokes connecting it to new and legacy systems. But don't expect to see this anytime soon. "Think of the offering as aspirational marketing," Adrian said. "It's the Hadoop community describing the vision of its future and the increasingly central role it expects to play."
  3. Hortonworks Inc. or the Apache pure play: The spinoff from Yahoo isn't adding other sources to its distribution, which means if and when your company decides to introduce something non-Apache to the stack, "you're going to do the integration yourself," Adrian said.
  4. International Business Machines Corp. or the integrated stack: Adrian reported IBM is "working aggressively" to connect Hadoop to the rest of its software stack.
  5. Intel Corp. or the relative newcomer: Look for it in 2014. The company is putting its core competencies to good use on everything from performance down to the chip. "That includes security," Adrian said.
  6. MapR Technologies or the mainstreamer: The company will focus on performance and enterprise-grade attributes. It will also continue to highlight its support for direct data access through the Network File System (NFS), which means moving data around less.
  7. Pivotal Software Inc. or the nonconformist: Adrian called Pivotal Software an "intriguing player with an intriguing point of view." The company is focused on how "sensor networks and in-memory, real-time operations are going to transform businesses," he said.

The not-so-distant future of work

Stowe Boyd, analyst and researcher for Gigaom Research, described today's workforce with three "Ds": distributed, discontinuous, decentralized. Today's professionals are more globally connected and more mobile than ever, which disrupts how work gets done. A whopping 35% of creative professionals, for example, now operate as freelancers.

More recently, a fourth "D" has started to emerge -- disengaged. "There is a shadow cast by all of this turbulent and revolutionary change," Boyd said. Part of the problem is technological: While the workforce adjusts to an always-on mentality where every restaurant, coffee shop, train car and hotel room doubles as an office, the technology to support that behavior isn't keeping pace.

It's not that the technology to establish a mobile-friendly workplace doesn't exist, Boyd said, but that most businesses haven't made investments or researched and implemented new applications.

That needs to change, said Larry Hawes, an analyst at Ipswich, Mass.-based Dow Brook Advisory Services, who joined Boyd during a recent Gigaom Research webinar. "We need to see atomization of communications," he said.

Rather than invest in a single suite of communications applications (think Microsoft for mobile communications), applications like email and instant messenger should be broken up (atomized) and integrated directly into the application where the work is being done. And rather than stick with what they have been using for years, businesses should investigate the new applications out there -- collaborative tools from Zendesk and GitHub are examples, Boyd said.

Sales and marketing departments are already clamoring for the stuff, but the big hiccup is the IT department, according to Robb Woods, head of sales engineering for the Mountain View, Calif.-based Blue Jeans Network, a video-conferencing service. Of course, introducing this kind of stuff to the enterprise is never easy, and Woods, for one, gets that IT must worry about how to manage the technology and how to ensure security.

But, he added, to remain relevant, IT organizations also have to take on "managing experience, managing productivity." And in the future workplace, that means mobile-friendly technology.

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.

This was first published in January 2014