Databricks Donates Declarative Pipelines to Apache Spark™ Open Source Project

11 Jun, 2025, 18:30 IST

SAN FRANCISCO, June 11, 2025 /PRNewswire/ -- Data + AI Summit -- Databricks, the Data and AI company, today announced it is open-sourcing the company's core declarative ETL framework as Apache Spark™ Declarative Pipelines. This initiative comes on the heels of Apache Spark reaching two billion downloads and the recent launch of Apache Spark 4.0. These releases build on Databricks' long-standing commitment to open ecosystems, ensuring users have the flexibility and control they need without vendor lock-in. Spark Declarative Pipelines tackles one of the biggest challenges in data engineering: making it easy to build and operate reliable, scalable data pipelines end-to-end.

Spark Declarative Pipelines provides an easier way to define and execute data pipelines for both batch and streaming ETL workloads across any Apache Spark-supported data source, including cloud storage, message buses, change data feeds and external systems. This battle-tested declarative framework for building data pipelines helps engineers address common pain points like complex pipeline authoring, manual operations overhead and siloed batch/streaming.

Spark Declarative Pipelines is based on Databricks' core declarative ETL framework, which is used by thousands of customers. With the proven ability to handle complex data engineering workloads and low-latency streaming, Spark Declarative Pipelines lays the foundation for the next generation of data processing and governance. With Spark Declarative Pipelines, more community members can begin to cut engineering time and costs and reliably support new AI agent systems and other workloads in production.

"Our commitment to open source is unwavering. With origins in academia and the open source community, Databricks was founded in 2013 by the original creators of the lakehouse architecture and open source projects including Apache Spark, Delta Lake, MLflow and Unity Catalog," said Matei Zaharia, Co-founder and CTO of Databricks. "We worked closely with the community to help remove friction around data formats that kept information siloed. Spark Declarative Pipelines now gives enterprises an open way to build high-quality pipelines."

Key benefits of Spark Declarative Pipelines include:

Simplified pipeline authoring: Data engineers and analysts can quickly declare robust pipelines with minimal coding, focusing on delivering business-critical insights.
Improved operability by design: Spark Declarative Pipelines helps catch issues earlier in development through clear pipeline definitions that are validated in full prior to execution, reducing the risk of downstream failures and making pipelines easier to troubleshoot and maintain.
Unified batch and streaming: Data teams can flexibly meet both real-time and periodic processing needs through a single API for defining and managing batch and streaming data pipelines, simplifying development and maintenance.
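
To make the declarative idea concrete, the following is a minimal, illustrative sketch in plain Python. It is not the Spark Declarative Pipelines API: the Pipeline class, the dataset decorator and the sample datasets are all hypothetical names invented for this example. It only shows the general pattern described above, in which datasets are declared with their dependencies and the whole definition is validated before anything runs.

```python
from graphlib import CycleError, TopologicalSorter


class Pipeline:
    """Toy declarative pipeline (hypothetical, not a Spark API)."""

    def __init__(self):
        # dataset name -> (names of upstream datasets, function that builds it)
        self._steps = {}

    def dataset(self, *, depends_on=()):
        """Decorator that declares a dataset; nothing is executed here."""
        def register(fn):
            self._steps[fn.__name__] = (tuple(depends_on), fn)
            return fn
        return register

    def validate(self):
        """Check the full graph before execution and return the run order."""
        for name, (deps, _) in self._steps.items():
            for dep in deps:
                if dep not in self._steps:
                    raise ValueError(
                        f"{name!r} depends on undeclared dataset {dep!r}"
                    )
        graph = {name: set(deps) for name, (deps, _) in self._steps.items()}
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as exc:
            raise ValueError(f"dependency cycle: {exc.args[1]}") from exc

    def execute(self):
        """Validate first, then build each dataset in dependency order."""
        results = {}
        for name in self.validate():
            deps, fn = self._steps[name]
            results[name] = fn(*(results[dep] for dep in deps))
        return results


pipeline = Pipeline()


@pipeline.dataset()
def raw_orders():
    # Stand-in for a real source such as cloud storage or a message bus.
    return [
        {"id": 1, "amount": 40.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 12.5},
    ]


@pipeline.dataset(depends_on=["raw_orders"])
def valid_orders(raw_orders):
    return [row for row in raw_orders if row["amount"] > 0]


@pipeline.dataset(depends_on=["valid_orders"])
def revenue(valid_orders):
    return sum(row["amount"] for row in valid_orders)


results = pipeline.execute()
```

In this sketch the author only states what each dataset is and what it depends on; the run order is derived from those declarations, and a misspelled or missing dependency is reported by validate() before any dataset is computed.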

"Declarative pipelines hide the complexity of modern data engineering under a simple, intuitive programming model. As an engineering manager, I love the fact that my engineers can focus on what matters most to the business. It's exciting to see this level of innovation now being open-sourced, making it accessible to even more teams." — Jian (Miracle) Zhou, Senior Engineering Manager, Navy Federal Credit Union

"At 84.51° we're always looking for ways to make our data pipelines easier to build and maintain, especially as we move toward more open and flexible tools. The declarative approach has been a big help in reducing the amount of code we have to manage, and it's made it easier to support both batch and streaming without stitching together separate systems. Open-sourcing this framework as Spark Declarative Pipelines is a great step for the Spark community." — Brad Turnbaugh, Sr. Data Engineer, 84.51°

About Databricks
Databricks is the Data and AI company. More than 15,000 organizations worldwide — including Block, Comcast, Condé Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Headquartered in San Francisco with offices around the globe, Databricks was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake, MLflow, and Unity Catalog. To learn more, follow Databricks on X, LinkedIn and Facebook.

Contact: [email protected]
