Precision is what most people strive for in using analytics. Crunching all those numbers with computing power that could run a small province has to generate an exact measure, right?
No, says an expert, who believes it pays to be vague — particularly in big data projects.
“I’ve seen many end results in files and they have very impressive numbers,” says Alan Khara, director of information at the Vancouver-based First Nations Education Steering Committee (FNESC). But “the end result is the numbers were never achieved or were way off.”
It’s more accurate, he told a Toronto big data conference Wednesday, for a report to give a prediction as a range of percentages rather than a single figure. Ironically, in many cases the narrower the predicted result, the greater the margin of error.
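Khara didn’t describe a specific technique, but one common way to turn a single predicted figure into an honest range is bootstrap resampling. A minimal Python sketch, with entirely made-up conversion data:

```python
# A minimal sketch of the idea, not Khara's method: report a range
# of percentages instead of a single point estimate. All numbers
# here are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical outcomes: 1 = customer converted, 0 = did not.
outcomes = rng.binomial(1, 0.23, size=1_000)

# Bootstrap resampling to estimate the uncertainty of the rate.
boot_rates = [
    rng.choice(outcomes, size=outcomes.size, replace=True).mean()
    for _ in range(5_000)
]

low, high = np.percentile(boot_rates, [2.5, 97.5])
print(f"Point estimate: {outcomes.mean():.1%}")    # e.g. 23.3%
print(f"95% range:      {low:.1%} to {high:.1%}")  # e.g. ~21% to ~26%
```

The point estimate alone hides the several-percentage-point spread that the resampling reveals, which is exactly the vagueness Khara argues a report should show.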
The belief that huge datasets generate precise results is one of five myths about big data that contribute to the failure of projects, he said.
Khara’s experience working in analytics for a university, financial institutions and a British Columbia health institution led him to the myths. Among the others:
– The more data you have, the more you’ll get out of it
Believing this is why organizations spend so much time sweeping in as much data as they can get their hands on, said Khara. But volume doesn’t guarantee information. He recalled working for an institution that collected 8 billion images of space (each about 10 MB — roughly 80 petabytes in all) in a search for a planet whose gravity appeared to be pulling Neptune farther from the sun than its regular orbit. With the technology available at the time, the research team estimated it would take three years for their data model to search the images.
The hidden gravitational force was never found. But other teams using similar data found some small but useful objects. What did they do differently? They invested more time in the data model than in collecting data.
“From a business point of view it isn’t important that you’re collecting big data, it is how you’re analyzing it,” he told the conference. The more diverse skills your team has, the better, he added.
Investing time in a model doesn’t mean having the latest tools, he cautioned.
– Structured data is better than unstructured
Not true in many cases. One problem is that organizations often convert unstructured data (like metadata) into a structured form. But, said Khara, that conversion alters important data from the unstructured file. If the file included time-sensitive or time-related data, that can be fatal if it isn’t accounted for.
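Khara didn’t give a code-level example, but the hazard is easy to illustrate. In the hypothetical Python sketch below, a naive conversion keeps only the fields a rigid schema asks for and silently discards the timestamp; the log line and field names are invented for illustration:

```python
# A hypothetical illustration (not from the talk) of how a naive
# unstructured-to-structured conversion can silently drop
# time-related information.
import re

note = "2014-03-02 09:14 sensor 7 flagged vibration above threshold"

# Naive conversion: keep only the fields the target schema asks for.
naive = {"sensor": 7, "event": "vibration above threshold"}

# Time-aware conversion: preserve the timestamp so later analysis
# can weigh how old the observation is.
match = re.match(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) sensor (\d+) flagged (.+)", note
)
aware = {
    "timestamp": match.group(1),
    "sensor": int(match.group(2)),
    "event": match.group(3),
}

print(naive)  # no way to tell when this happened
print(aware)  # {'timestamp': '2014-03-02 09:14', 'sensor': 7, ...}
```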
He recalled a B.C. construction company complaining that, after investing in big data tools to analyze its decades of data on physical infrastructure, there were no useful results. But, Khara pointed out, data models can’t be static: data collected into a system years ago under one set of assumptions may not fit the assumptions of today, so today’s data models may not be able to analyze old data. “Everything changes with time.”
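Again, a hypothetical sketch rather than anything from the talk: suppose a measurement field switched units at some point in the company’s history. A model that reads old records under today’s assumptions gets the numbers badly wrong unless it normalizes them first (the cut-over year, field names and units here are invented):

```python
# A hypothetical sketch of why old data may not fit today's
# assumptions: the same field can mean different things in
# different eras, so records must be normalized before analysis.
RECORDED_IN_POUNDS_BEFORE = 2000  # assumed cut-over year, for illustration

def load_kg(record: dict) -> float:
    """Normalize a load measurement to kilograms."""
    value = record["load"]
    if record["year"] < RECORDED_IN_POUNDS_BEFORE:
        return value * 0.453592  # older records used pounds
    return value                 # newer records already in kilograms

old = {"year": 1987, "load": 2200.0}  # pounds under the old convention
new = {"year": 2015, "load": 1000.0}  # kilograms today

print(load_kg(old))  # ~997.9 kg, not 2200 kg
print(load_kg(new))  # 1000.0 kg
```

Feeding both records into an analysis without that normalization step is one concrete way decades of data can produce “no useful results.”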