Michael Berry is taking a stand against the big data hype. More data, said the analytics director for travel website TripAdvisor, doesn't always mean better business results. Case in point: big data and predictive analytics.
"Many predictive analytics applications turn out not to need all of the data," Berry said during his keynote talk at Predictive Analytics World. So the real task for data scientists et al. isn't figuring out how to analyze all the available data; instead, it's figuring out how much data it takes to see something worth noting. The bad news?
"There's not a simple answer to that question," Berry said.
However, testing a predictive model's performance while incrementally adding more data can shed light on when enough is enough. For example, when Berry wanted to know the standard bid by travel agency partners for a specific hotel and a specific customer, he computed running averages: the average of the first two bids, then of the first three, then of the first four, and so on, until the average hit a steady plateau at 100,000 bids. If he kept going to 200,000 bids, the average would change, sure, but not enough to matter.
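To make the "enough is enough" test concrete, here is a minimal sketch in Python. It is not Berry's actual method: it assumes the bids are a plain list of numbers and compares the running average at n records with the running average at 2n records (echoing the 100,000 vs. 200,000 comparison). The doubling checkpoints, the relative tolerance `tol` and the function names are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def running_means(values: Sequence[float]) -> List[float]:
    """Running average after 1, 2, 3, ... values (hypothetical helper)."""
    means: List[float] = []
    total = 0.0
    for i, x in enumerate(values, start=1):
        total += x
        means.append(total / i)
    return means


def enough_data(values: Sequence[float], tol: float = 0.01, start: int = 2) -> Optional[int]:
    """Return the smallest checkpoint n (start, 2*start, 4*start, ...) at which
    doubling the data moves the running average by at most `tol` relative to
    its value at n; return None if the data never settles that way.
    The doubling schedule and relative tolerance are illustrative choices."""
    means = running_means(values)
    n = start
    while 2 * n <= len(means):
        m_n, m_2n = means[n - 1], means[2 * n - 1]
        if abs(m_2n - m_n) <= tol * abs(m_n):
            return n  # more data no longer changes the average much
        n *= 2
    return None
```

Usage: `enough_data(bids, tol=0.01)` returns the sample size past which, under these assumptions, adding more bids "doesn't really make much difference". Doubling keeps the number of comparisons small (logarithmic in the data size), and a looser or tighter `tol` encodes how much change the business considers worth noting.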
"That's the way data tends to be: When you have enough of it, having more doesn't really make much difference," he said.
So if more data doesn't matter, what does? "So many things," Berry said. Among them: working with clean data, doing unbiased sampling, and hiring staff dedicated to data quality and creative thinking.
That's right, there's a big place in predictive analyses for those soft data science skills, such as figuring out what variables can make the model stronger or what new patterns might be discovered by combining different kinds of data together. Examples?
"Someone had to think of the idea of wind chill factor," Berry said. That someone combined actual temperature and wind speed to reveal a new data point: what the weather will actually feel like.
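As an illustration of that kind of derived variable, here is a short Python sketch of one published wind chill formula: the U.S. National Weather Service index adopted in 2001, which takes air temperature in °F and wind speed in mph. Berry did not cite a specific formula; this one is chosen only as a well-known example of combining two raw measurements into a new feature, and the function name is hypothetical.

```python
def wind_chill_f(temp_f: float, wind_mph: float) -> float:
    """Wind chill temperature (deg F) per the 2001 U.S. NWS formula.

    Combines two raw inputs, air temperature and wind speed, into one
    derived value. The NWS defines the index only for temperatures at or
    below 50 deg F and wind speeds above 3 mph, so other inputs are rejected.
    """
    if temp_f > 50 or wind_mph <= 3:
        raise ValueError("wind chill is defined only for T <= 50 F and V > 3 mph")
    v = wind_mph ** 0.16  # the formula uses wind speed raised to the 0.16 power
    return 35.74 + 0.6215 * temp_f - 35.75 * v + 0.4275 * temp_f * v
```

For instance, `round(wind_chill_f(0, 15))` gives -19: on a 0°F day with a 15 mph wind, it "feels like" about -19°F, a number neither raw measurement reveals on its own.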
More big data delusions
Berry wasn't the only presenter who badmouthed the state of big data and predictive analytics. Karl Rexer, founder of the consulting firm Rexer Analytics, went so far as to suggest that the current crop of data scientists suffers from a bit of delusional thinking.
In his 2013 Data Miner Survey, respondents indicated that the size of data sets is getting bigger. But when Rexer asked them how many records are in a typical data set they use for analyses, "We get the same answer we got in 2007," he said.
That's not to say big data is a farce or to give short shrift to the interesting work some are doing in this space, he said. "But for the typical analytic predictive modeling/data mining/whatever-you-want-to-call-it project, I would say the overall sample size used for those data mining projects is not increasing."
Name that acronym
Translating the language of analytics into something the business can understand is challenging. One way Paychex, a payroll, human resources and benefits service provider, deals with the language barrier is by, well, using language the business suggests.
"When we build a model, we'll run a naming contest for the users," Tom Kern, a risk-modeling analyst for Paychex, said at Predictive Analytics World. Kern's department will send users an email with a short description about the model and suggest a couple of words to get them started. The users have to come up with "an acronymic name," he said. So there's SAM, the sales anticipation model, and TIM, the territory identification and mapping model. "Still working on a TOM," he quipped.
If a business user's suggestion is chosen, that user gets a gift card, and the company gets its users, such as the sales staff, to think about what the predictive model really does.
The Tide turns
The Procter & Gamble Co., one of the biggest consumer goods companies in the world, announced plans to release a lower-priced version of the laundry detergent Tide in an effort to attract mid-tier customers. Bold move or bad decision?
"One of the big concerns? If you launch products like these, not only are you going to attract folks you currently don't have, but you're going to encourage consumers to trade down," said Shel Smith, partner and founder of marketing analytics firm Twenty-Ten Inc.
That's especially true after a recession that forced so many consumers to be price-conscious. But Smith, for one, has faith in P&G's strategy. He believes the company will use predictive modeling, lots of data and highly targeted marketing to make new customers but keep the old.
"There's something they know that we don't in terms of their ability to maintain the existing franchise, but go after new consumers by being far more surgical," he said.