The open source Hadoop framework is ideal for distributed processing of large data sets across large numbers of servers.
What it isn’t good at is speed.
Actian Corp., which makes a number of specialized data management systems including the SQL-based Vectorwise analytic database, has been watching its customers try to bridge the gap with Hadoop by building their own connections.
The latest version of Vectorwise will save them a lot of effort: Version 3.0 comes with advanced Hadoop integration, allowing customers to do fast queries of unstructured data at what Actian says is a relatively modest price.
Hadoop has lots going for it, says Fred Gallagher, general manager for Vectorwise. Hadoop’s HDFS file system provides almost unlimited storage, and Hadoop itself is good at parallel processing. But, he added, it’s cumbersome for ad hoc queries or drill-down data discovery because it’s a batch processor.
“So by integrating the large dataset capabilities of Hadoop with Vectorwise, people can get that responsiveness they’d like.”
Actian sells through the channel. Its channel partner program is geared toward solution providers whose business applications are optimized for and integrated with Ingres technology. Actian offers around-the-clock technical support to solution providers along with a mature toolset.
Other changes in Version 3.0 include a more efficient storage engine, support for more data types and analytical SQL functions, and enhanced DDL (data definition language) features.
Gallagher says that with the Hadoop Connector, Vectorwise on a Dell server with 12 cores can outperform a half-rack of data appliances on 90 per cent of queries at a cost of under $100,000 (including server).
“We’re able to move terabytes in an hour on a modest set of servers.”
Customers who use Vectorwise and Hadoop include a number of social media companies that have large amounts of subscriber data to process, he said. One has a Hadoop cluster with over 250 TB of data and needs to analyze 20 TB of data at a time. Another stores Web logs and brings 100 billion records into Vectorwise for processing.
Vectorwise runs on Windows Server and Linux. Pricing varies, starting at around $60,000.