The Greenplum division of EMC Corp. is building a single data analytics platform that can crunch both structured and unstructured data and give a broad range of users the tools to study an enterprise’s information.
Greenplum, which EMC acquired last year, plans to introduce its Unified Analytics Platform in the first quarter of next year. UAP will combine the EMC Greenplum database with EMC Greenplum HD, which uses the Hadoop open-source analysis framework for unstructured data, and EMC Greenplum Chorus 2.0. Chorus is the user interface for setting up queries and creating visualizations, and in the new version it lets users address both structured and unstructured data.
EMC is announcing the Greenplum UAP on Wednesday at an event in Mountain View, California. Pricing will be disclosed next year.
Organizations in many areas have mountains of data from their operations that are becoming too big to analyze with conventional tools, according to Enterprise Strategy Group analyst Julie Lockner. The volume of data, the complexity of a query and the need for quick answers often create a challenge, she said.
Some enterprises, especially in retail and the health sciences, are adopting new technologies like those coming from Greenplum to learn more from the data they already have, she said. For example, online stores can correlate visitor behavior with eventual purchases and pharmaceutical companies can more easily process results of clinical studies. Insurance, investment and other companies also are starting to embrace new analytics tools to make more accurate predictions.
One of Greenplum’s goals has been to make data analytics tools available to business executives and other employees, rather than just a team of dedicated data scientists. Chorus provides a less arcane interface for translating human questions into queries against sets of data, and it includes a social networking environment where people across an organization can collaborate on working with the data.
The UAP brings enterprises two main benefits, said Michael Maxey, senior director of product marketing at Greenplum.
“One is, the scope of data they can address, but also being able to address all of the existing processes and expertise in an organization and extend it over those new data sets,” he said.
In addition to gaining access to unstructured data through Greenplum HD, Chorus 2.0 features an enhanced ability to quickly create a virtual “sandbox” in which to develop new analytics processes, Maxey said. That addition draws on technology from EMC’s VMware subsidiary, he said.
Customers can deploy the UAP on their own standard computing hardware or order a prepackaged configuration, Maxey said. Enterprises that already have the Greenplum database or Greenplum HD can integrate those into the unified platform.
Gleaning insights from structured data in traditional databases requires different technology from analyzing unstructured data, such as Web pages, images and video. If business managers want answers to questions that require both kinds of information, typically they need two analytics platforms, and the enterprise may only be able to afford one, Lockner said. Greenplum’s UAP should be a more economical solution that lets a company answer all types of queries, she said.
An IT department that can produce such reports internally preserves its own role in the company and can help keep enterprise data inside the firewall, which can be important for compliance, Lockner said. The UAP should help to make that possible when business management wants a new type of report.
“If you don’t have to scramble around with multiple vendors trying to figure out what to do or who to go to, it prevents the business from going outside of IT and getting answers from a cloud vendor,” Lockner said. “IT is really competing with these cloud vendors, or software as a service.”
Data visualization startups such as Tableau and Alpine Miner already offer more accessible data analytics interfaces similar to what Chorus 2.0 provides, but the overall capabilities offered in UAP are fairly new, according to Lockner. In fact, the breadth and speed of such emerging tools change the game for data analytics to the point that data scientists and others need to relearn how to study an organization’s information, she said.
“There aren’t a lot of people who know how to leverage these platforms, or how to look at this data analytics problem, other than what they’ve learned in college,” Lockner said. Being able to analyze all the data in an organization or do data modeling in one day instead of months changes the picture, she said.
To that end, this week Greenplum also announced its Big Data & Analytics Training Program, which it said will be taught at more than 700 colleges and universities.