Google Cloud Dataflow Shows Competitive Advantage for Large-Scale Data Processing

Mammoth Data releases benchmark comparing leading cloud solutions

May 03, 2016, 09:00 ET

DURHAM, N.C., May 3, 2016 /PRNewswire/ -- Mammoth Data, the leader in Big Data consulting, today announced the findings of its comprehensive cloud solution benchmark study, which compares Google Cloud Dataflow and Apache Spark.

Mammoth Data, a Big Data consulting firm specializing in Hadoop®, Apache Spark and other enterprise-ready architectural solutions for data-driven companies, saw that current cloud technologies were poorly understood and that no available comparison covered the performance and implementation characteristics of each offering in a common scenario. In response, Mammoth Data worked with Google to compare Google Cloud Dataflow with well-known alternatives and provide easily digestible metrics.

Google Cloud Dataflow is a fully managed service for large-scale data processing, providing a unified model for batch and streaming analysis. Google Cloud Dataflow provides on-demand resource allocation, full lifecycle resource management and auto-scaling of resources.

"Google Cloud Platform data processing and analytics services are aimed at removing the implementation complexity and operational burden found in traditional Big Data technologies. Mammoth Data found that Cloud Dataflow outperformed Apache Spark, underscoring our commitment to balance performance, simplicity and scalability for our customers," said Eric Schmidt, product manager for Google Cloud Dataflow.

In its benchmark, Mammoth Data identified five key advantages of using Google Cloud Dataflow:

Greater performance: Google Cloud Dataflow provides dynamic work rebalancing and intelligent auto-scaling, which together increase performance without adding operational complexity.
Developer friendly: Google Cloud Dataflow features a developer-friendly API with a unified approach to batch and streaming analysis.
Operational simplicity: Google Cloud Dataflow holds distinct advantages with a job-centric and fully managed resource model.
Easy integration: Google Cloud Dataflow integrates easily with Google Cloud Platform and its services.
Open-source: Google Cloud Dataflow's API was recently promoted to an Apache Software Foundation incubation project called Apache Beam.

"When Google asked us to compare Dataflow to other Big Data offerings, we knew this would be an exciting project," said Andrew C. Oliver, president and founder of Mammoth Data. "We were impressed by Dataflow's performance, and think it is a great fit for large-scale ETL or data analysis workloads. With the Dataflow API now part of the Apache Software Foundation as Apache Beam, we expect the technology to become a key component of the Big Data ecosystem."

About Mammoth Data, Inc
Mammoth Data is a Big Data consulting and analytics firm specializing in new data technologies like Hadoop and Apache Spark. Mammoth Data's consultants bring together data science and engineering expertise to help companies design, architect and implement modern data architectures. By turning structured and unstructured information into real business intelligence, Mammoth Data transforms companies into data-driven organizations.

Mammoth Data is headquartered in downtown Durham, North Carolina.

To learn more about Mammoth Data, visit mammothdata.com, follow us @mammothdataco, connect with us on LinkedIn or call 919-321-0119.

SOURCE Mammoth Data