VoltDB Announces Enterprise-Grade Hadoop Integration

Real-time Streaming Export Leverages Cloudera's Distribution Including Apache Hadoop for High Performance Integration

Jun 22, 2011, 09:00 ET from VoltDB

BILLERICA, Mass., June 22, 2011 /PRNewswire/ -- VoltDB, a leading provider of high-velocity data management systems, today announced the release of VoltDB Integration for Hadoop.  The new product functionality, available in VoltDB Enterprise Edition, allows organizations to selectively stream high-velocity data from a VoltDB cluster into the Hadoop Distributed File System (HDFS) by leveraging Cloudera's Distribution Including Apache Hadoop (CDH), which has Apache Sqoop, a SQL-to-Hadoop integration technology, built in.

(Logo: http://photos.prnewswire.com/prnh/20110622/NE23504LOGO )

"The term 'big data' is being applied to a diverse set of data storage and processing problems related to the growing volume, variety and velocity of data and the desire of organizations to store and process data sets in their totality," said Matt Aslett, senior analyst, enterprise software, The 451 Group. "Choosing the right tool for the job is crucial: high velocity data requires an engine that offers fast throughput and real-time visibility; high volume data requires a platform that can expose insights in massive data sets. Integration between VoltDB and CDH will help organizations to combine two special purpose engines to solve increasingly complex data management problems."

Volume, Velocity and Variety

The volume, velocity and variety of data are exploding, fueled by social applications, sensor automation, mobile networking, and other data-intensive forces.  Organizations are increasingly turning to specialized, task-specific data management solutions.  Leading examples include VoltDB, which is designed to process high-velocity data in real time, and Cloudera's Distribution Including Apache Hadoop (CDH), which provides organizations with a reliable and elastic infrastructure for data processing and deep analytics.  VoltDB Integration for Hadoop allows customers to rapidly move high-velocity data from VoltDB to CDH for long-term storage and analysis.

"Customers across a wide variety of industries, from retail and web services to government and telecommunications, are using Cloudera's Distribution Including Apache Hadoop to identify new value from a wide variety of data sources and then process that data into new product features for their end users," said Ed Albanese, Head of Business Development for Cloudera. "It's exciting that companies using CDH are now able to collect data from VoltDB – a next-generation, real-time database, process that data into high value insights and then deliver the results back to VoltDB for real-time consumption. This integration introduces new opportunities for processing and delivering information derived from a previously untapped class of data."

Commercial-grade Integration

VoltDB Integration for Hadoop is designed specifically to handle the widest variety of customer deployment scenarios, including end-user applications, site-based OEM installations and cloud-based deployments.  It combines VoltDB's enterprise-grade export environment with Apache Sqoop, a Cloudera-sponsored solution for integrating relational databases with Hadoop infrastructures, and delivers the following capabilities:

  • Simple, fast set-up.  Establishing integration between VoltDB and a Hadoop installation is fast and easy.  A user identifies which VoltDB data will be exported to Hadoop, then configures the VoltDB export client with the location of the Hadoop installation, the location of the VoltDB cluster, Sqoop options such as output formatting, and other installation-specific instructions (e.g., frequency of import).  The VoltDB export client automatically manages periodic Sqoop jobs based on this configuration.  The entire set-up process can be completed in about 15 minutes.
  • Loosely coupled, push-pull operation.  VoltDB automatically pushes copies of export data, in real time, to the VoltDB export client, which in turn automatically queues that data.  The Sqoop receiver then pulls data from the VoltDB export client and imports that data into HDFS at whatever frequency and in whatever amounts the user has defined.  VoltDB's export client manages its data buffer in a way that eliminates possible "impedance mismatches" (i.e., VoltDB exporting data faster than Sqoop imports that data).
  • Automatic overflow management.  VoltDB's export client also automatically writes overflow data to disk to optimize memory utilization.  This feature protects against large-scale overflows that could occur if the Sqoop receiver terminates, and allows export data to be retained across sessions if the VoltDB database is stopped.
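
The push-pull buffering and disk overflow described above can be illustrated with a short sketch.  This is not VoltDB's export client code and does not use any VoltDB or Sqoop API; the class name OverflowBuffer, its methods, and the one-file-per-row spill format are all hypothetical, chosen only to show how a producer can push rows at any rate while a slower consumer pulls them in batches, with rows beyond a memory limit written to disk in arrival order.

```python
import os
import pickle
import tempfile
from collections import deque


class OverflowBuffer:
    """Hypothetical FIFO buffer: holds up to max_in_memory rows in RAM, spills the rest to disk."""

    def __init__(self, max_in_memory, spill_dir):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir
        self.memory = deque()    # rows held in RAM, oldest first
        self.spilled = deque()   # paths of spilled rows, oldest first
        self._seq = 0            # counter used to name spill files

    def push(self, row):
        """Producer side: accept a row immediately, whatever the consumer's pace."""
        # Once any row has spilled, later rows must spill too, or FIFO order would break.
        if self.spilled or len(self.memory) >= self.max_in_memory:
            path = os.path.join(self.spill_dir, f"row-{self._seq:012d}.bin")
            self._seq += 1
            with open(path, "wb") as f:
                pickle.dump(row, f)
            self.spilled.append(path)
        else:
            self.memory.append(row)

    def pull(self, max_rows):
        """Consumer side: return up to max_rows rows in arrival order."""
        batch = []
        while len(batch) < max_rows and (self.memory or self.spilled):
            if self.memory:
                batch.append(self.memory.popleft())
            else:
                path = self.spilled.popleft()
                with open(path, "rb") as f:
                    batch.append(pickle.load(f))
                os.remove(path)  # a spilled row is deleted once it has been handed over
        return batch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        buf = OverflowBuffer(max_in_memory=2, spill_dir=spill_dir)
        for i in range(5):       # producer runs ahead of the consumer
            buf.push({"id": i})
        print(len(buf.memory), "rows in memory,", len(buf.spilled), "rows on disk")
        print(buf.pull(3))       # consumer pulls a batch at its own pace
```

Because the spilled rows live in ordinary files, a buffer of this kind could in principle be reloaded after a restart; the sketch omits that step, along with any concurrency control a real client would need.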

"Big Data applications come with a complex combination of operational and analytical challenges," said VoltDB CEO Scott Jarr.  "In response, many organizations are evolving rapidly toward specialized database engines that must function in a co-ordinated way.  Recognizing this need, VoltDB and Cloudera are working co-operatively to deliver high-powered product integrations that are easy to use, fast to deploy, and reliable to operate in production."

About VoltDB

Designed by DBMS pioneer Mike Stonebraker for organizations that have reached the price/performance limitations of general-purpose SQL databases, VoltDB combines the proven power of relational processing with blazing speed, linear scalability and uncompromising fault tolerance.  VoltDB is the perfect database solution for high-velocity database applications that require 100% accuracy and real-time analytics.

For press inquiries, please contact:
Fred Holahan
VoltDB, Inc.