The development, testing and deployment of data pipelines is essential to generating intelligence from data. Just as a physical pipeline is used to transport water between turbines, generators and transformers in the generation of hydroelectric power, so data pipelines are used to transport data between the stages involved in data processing and analytics to generate business insight. Healthy data pipelines are necessary to ensure data is integrated and processed in the sequence required to generate business intelligence (BI).
The concept of the data pipeline is nothing new, but it is becoming increasingly important as organizations adapt data management processes to be more data-driven. Data pipelines have traditionally involved batch extract, transform and load processes, but data-driven processes require more agile, continuous data processing as part of a DataOps approach to data management, with an increased focus on extract, load and transform (ELT) processes, as well as change data capture, automation and orchestration.
Data-driven organizations are increasingly thinking of the steps involved in extracting, integrating, aggregating, preparing, transforming and loading data as a continual process that is orchestrated to facilitate data-driven analytics. By 2026, three-quarters of organizations will adopt data engineering processes that span data integration, transformation and preparation, producing repeatable data pipelines that create more agile information architectures.
The need for more agile data pipelines is driven by the need for real-time data processing. Almost a quarter (22%) of organizations that participated in Ventana Research’s Analytics and Data Benchmark Research are currently analyzing data in real time, with an additional 10% analyzing data every hour. More frequent data analysis requires data to be integrated, cleansed, enriched, transformed and processed for analysis in a continuous and agile process.
Traditional batch extract, transform and load data pipelines are ill-suited to continuous and agile processes. These pipelines were designed to extract data from a source (typically a database supporting an operational application), transform it in a dedicated staging area, and then load it into a target environment (typically a data warehouse or data lake) for analysis.
Extract, transform and load (ETL) pipelines can be automated and orchestrated to reduce manual intervention. However, since they are designed for a specific data transformation task, ETL pipelines are rigid and difficult to adapt. As data and business requirements change, ETL pipelines need to be rewritten accordingly.
The need for greater agility and flexibility to meet the demands of real-time data processing is one reason we have seen increased interest in ELT pipelines. These pipelines involve the use of a more lightweight staging tier, which is required simply to extract data from the source and load it into the target data platform. Rather than a separate transformation stage prior to loading as with an ETL pipeline, ELT pipelines make use of pushdown optimization, leveraging the data processing functionality and processing power of the target data platform to transform the data.
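The ELT pattern described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the target data platform; the table names, column names and sample rows are hypothetical and chosen only for illustration. The staging step copies the source rows unchanged, and the transformation is pushed down to the target as SQL.

```python
import sqlite3

# Hypothetical source rows standing in for data from an operational application.
SOURCE_ROWS = [
    ("2023-08-01", "widget", "3", "4.50"),
    ("2023-08-01", "gadget", "1", "20.00"),
    ("2023-08-02", "widget", "2", "4.50"),
]


def extract_and_load(conn, rows):
    """Extract and load: copy source rows into a raw table without transforming them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_date TEXT, product TEXT, quantity TEXT, unit_price TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform_in_target(conn):
    """Transform: executed inside the target platform as SQL (pushdown)."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               SUM(CAST(quantity AS INTEGER) * CAST(unit_price AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY order_date
        """
    )


def run_elt():
    """Run the load step, then the in-target transformation, and return the result."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite plays the target platform
    extract_and_load(conn, SOURCE_ROWS)
    transform_in_target(conn)
    result = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    conn.close()
    return result


print(run_elt())
```

In an ETL pipeline the casting and aggregation would instead run in a separate staging area before the load. Here the raw table stays untouched, so the transformation query can be changed and rerun without repeating the extraction.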
Pushing data transformation execution to the target data platform results in a more agile data extraction and loading phase, which is more adaptable to changing data sources. This approach is well aligned with the schema-on-read approach used in data lake environments, as opposed to the schema-on-write approach, in which a schema is applied to data as it is loaded into a data warehouse.
Since the data is not transformed before being loaded into the target data platform, data sources can change and evolve without delaying data loading. This potentially enables data analysts to transform data to meet their requirements rather than have dedicated data integration professionals perform the task.
As such, many ELT offerings are positioned for use by data analysts and developers rather than IT professionals. This can also result in reduced delays in deploying business intelligence projects by avoiding the need to wait for data transformation specialists to (re)configure pipelines in response to evolving BI requirements and new data sources.
Like ETL pipelines, ELT pipelines may also be batch processes. Both can be accelerated by using change data capture techniques. Change data capture (CDC) is similarly not new but has come into greater focus given the increasing need for real-time data processing. As the name suggests, CDC is the process of capturing data changes.
Specifically, in the context of data pipelines, CDC identifies and tracks changes to tables in the source database as rows are inserted, updated or deleted. CDC reduces complexity and increases agility by only synchronizing changed data rather than the entire dataset. The data changes can be synchronized incrementally or in a continuous stream.
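As a rough illustration of synchronizing only changed data, the sketch below applies a list of change events to a target table held as a Python dictionary keyed by primary key. The event format (`op`, `key`, `row`) is a hypothetical one invented for this example, not the format of any particular CDC product.

```python
def apply_changes(target, changes):
    """Apply insert, update and delete events to a target table (dict keyed by primary key)."""
    for change in changes:
        op, key = change["op"], change["key"]
        if op in ("insert", "update"):
            # Inserts and updates both write the latest version of the row.
            target[key] = change["row"]
        elif op == "delete":
            # Deleting a row that is already absent is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Hypothetical target state and captured changes.
target = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Bob", "city": "Paris"},
}
changes = [
    {"op": "update", "key": 2, "row": {"name": "Bob", "city": "Berlin"}},
    {"op": "insert", "key": 3, "row": {"name": "Cy", "city": "Oslo"}},
    {"op": "delete", "key": 1},
]

apply_changes(target, changes)
print(target)
```

Only three events are processed regardless of how large the target table is. The same function could be called once per batch for incremental synchronization or once per event for a continuous stream.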
The development, testing and deployment of both ETL and ELT pipelines can be automated and orchestrated to provide further agility by reducing the need for manual intervention. Specifically, the batch extraction of data can be scheduled to occur at regular intervals of a set number of minutes or hours, while the various stages in a data pipeline can be managed as orchestrated workflows using data engineering workflow management platforms.
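The idea of managing pipeline stages as an orchestrated workflow can be sketched with the Python standard library. The example below uses `graphlib.TopologicalSorter` (Python 3.9 or later) to run each stage after the stages it depends on; the stage names and their toy implementations are hypothetical, and a real workflow management platform would add scheduling, retries and monitoring on top of this.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all of the tasks it depends on have finished."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Each task receives the results of the tasks that ran before it.
        results[name] = tasks[name](results)
    return order, results


# Hypothetical stages of a small ELT workflow.
tasks = {
    "extract": lambda results: [1, 2, 3],
    "load": lambda results: list(results["extract"]),
    "transform": lambda results: [value * 10 for value in results["load"]],
    "publish": lambda results: sum(results["transform"]),
}

# Each key maps a stage to the set of stages that must complete first.
dependencies = {
    "load": {"extract"},
    "transform": {"load"},
    "publish": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
print(order, results["publish"])
```

A scheduler would call `run_pipeline` at the chosen interval of minutes or hours; the dependency map is what lets the stages be reordered or extended without rewriting the runner.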
Data observability also has a complementary role to play in monitoring the health of data pipelines and associated workflows as well as the quality of the data itself. Many products for data pipeline development, testing and deployment also offer functionality for monitoring and managing pipelines and are integrated with data orchestration and/or observability functionality.
There remains a need for batch ETL pipelines, not least to support existing data integration and analytic processes. However, ELT and CDC approaches have a role to play alongside automation and orchestration in increasing data agility, and we recommend that all organizations consider the potential advantages of more agile data pipelines driving BI and transformational change.
This research evaluates the following vendors that offer products that address key elements of data pipelines as we define them: Alteryx, AWS, Astronomer, BMC, Databricks, DataKitchen, dbt Labs, Google, Hitachi Vantara, IBM, Infoworks.io, Matillion, Microsoft, Prefect, Rivery, SAP, StreamSets and Y42.
For over two decades, Ventana Research has conducted market research in a spectrum of areas across business applications, tools and technologies. Ventana Research has designed the Buyers Guide to provide a balanced perspective of vendors and products that is rooted in an understanding of the business requirement in any organization. Utilization of our research methodology and decades of experience enables our Buyers Guide to be an effective method to assess and select technology vendors and products. The findings of this research undertaking contribute to our comprehensive approach to rating vendors in a manner that is based on the assessments completed by an organization.
This Ventana Research Data Pipelines Buyers Guide is the distillation of over a year of market and product research efforts. It is an assessment of how well vendors’ offerings will address organizations’ requirements for data pipelines software. The index is structured to support a request for information (RFI) that could be used in the RFP process by incorporating all criteria needed to evaluate, select, utilize and maintain relationships with technology vendors. An effective product and customer experience with a technology vendor can ensure the best long-term relationship and value achieved from a resource and financial investment.
In this Buyers Guide, Ventana Research evaluates the software in seven key categories that are weighted to reflect buyers’ needs based on our expertise and research. Five are product-experience related: Adaptability, Capability, Manageability, Reliability, and Usability. In addition, we consider two customer-experience categories: Validation, and Total Cost of Ownership and Return on Investment (TCO/ROI). To assess functionality, one of the components of capability, we applied the Ventana Research Value Index methodology and blueprint, which links the personas and processes for data pipelines to an organization’s requirements.
The structure of the research reflects our understanding that the effective evaluation of vendors and products involves far more than just examining product features, potential revenue or customers generated from a vendor’s marketing and sales efforts. We believe it is important to take a comprehensive research-based approach, since making the wrong choice of a data pipelines technology can raise the total cost of ownership, lower the return on investment and hamper an organization’s ability to reach its potential performance. In addition, this approach can reduce the project’s development and deployment time and eliminate the risk of relying on a short list of vendors that does not represent a best fit for your organization.
To ensure the accuracy of the information we collected, we asked participating vendors to provide product and company information across the seven product and customer experience categories that, taken together, reflect the concerns of a well-crafted RFI. Ventana Research then validated the information, first independently through our database of product information and extensive web-based research, and then in consultation with the vendors. Most selected vendors also participated in a one-on-one session providing an overview and demonstration, after which we requested they provide additional documentation to support any new input.
Ventana Research believes that an objective review of vendors and products is a critical business strategy for the adoption and implementation of data pipelines software and applications. An organization’s review should include a thorough analysis of both what is possible and what is relevant. We urge organizations to do a thorough job of evaluating data pipelines systems and tools and offer this Buyers Guide both as the results of our in-depth analysis of these vendors and as an evaluation methodology.
We recommend using the Buyers Guide to assess and evaluate new or existing technology vendors for your organization. The market research can be used as an evaluation framework to establish a formal request for information from technology vendors on their products and customer experience and will shorten the cycle time when creating an RFI. The steps listed below provide a process that can facilitate the best possible outcomes.
All of the products we evaluated are feature-rich, but not all the capabilities offered by a technology vendor are equally valuable to all types of workers or support everything needed to manage products on a continuous basis. Moreover, the existence of too many capabilities may be a negative factor for an organization if it introduces unnecessary complexity. Nonetheless, you may decide that a larger number of features in the product is a plus, especially if some of them match your organization’s established practices or support an initiative that is driving the purchase of new software.
Factors beyond features and functions or vendor assessments may become a deciding factor. For example, an organization may face budget constraints such that the TCO evaluation can tip the balance to one vendor or another. This is where the Value Index methodology and the appropriate category weighting can be applied to determine the best fit of vendors and products to your specific needs.
The research finds IBM atop the list, followed by Microsoft and Alteryx. Companies that place in the top three of a category earn the designation of Leader. IBM and Microsoft have done so in four of the seven
The overall representation of the research below places the ratings for Product Experience and Customer Experience on the x and y axes, respectively, to provide a visual representation and classification of the vendors. Vendors with higher aggregate weighted performance across the five Product Experience categories place farther to the right, while the performance and weighting for the two Customer Experience categories determine their placement on the vertical axis. In short, vendors that place closer to the upper-right on this chart performed better than those closer to the lower-left.
The research places vendors into one of four overall categories: Assurance, Exemplary, Merit or Innovative. This representation classifies vendors’ overall weighted performance.
Exemplary: The categorization and placement of vendors in Exemplary (upper right) represent those that performed the best in meeting the overall Product and Customer Experience requirements. The vendors awarded Exemplary are: Alteryx, AWS, Databricks, Google, IBM, Microsoft and SAP.
Innovative: The categorization and placement of vendors in Innovative (lower right) represent those that performed the best in meeting the overall Product Experience requirements but did not achieve the highest levels of requirements in Customer Experience. The vendors awarded Innovative are: DataKitchen and dbt Labs.
Assurance: The categorization and placement of vendors in Assurance (upper left) represent those that achieved the highest levels in the overall Customer Experience requirements but did not achieve the highest levels of Product Experience. The vendors awarded Assurance are: BMC and Matillion.
Merit: The categorization for vendors in Merit (lower left) represents those that did not exceed the median of performance in Customer or Product Experience or surpass the threshold for the other three categories. The vendors awarded Merit are: Astronomer, Hitachi Vantara, Infoworks.io, Prefect, Rivery, StreamSets and Y42.
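The four-way classification above can be expressed as a small function. This is only a sketch: the report does not publish the thresholds that separate the quadrants, so the threshold values and vendor scores below are hypothetical placeholders, and the function names are invented for illustration.

```python
def classify(product_score, customer_score, product_threshold, customer_threshold):
    """Map Product Experience (x axis) and Customer Experience (y axis) scores to a category."""
    high_product = product_score >= product_threshold
    high_customer = customer_score >= customer_threshold
    if high_product and high_customer:
        return "Exemplary"  # upper right
    if high_product:
        return "Innovative"  # lower right
    if high_customer:
        return "Assurance"  # upper left
    return "Merit"  # lower left


# Hypothetical thresholds and scores, not taken from the research.
PRODUCT_THRESHOLD = 70.0
CUSTOMER_THRESHOLD = 65.0

examples = {
    "Vendor A": (82.0, 78.0),
    "Vendor B": (75.0, 50.0),
    "Vendor C": (60.0, 70.0),
    "Vendor D": (55.0, 40.0),
}

for vendor, (product, customer) in examples.items():
    print(vendor, classify(product, customer, PRODUCT_THRESHOLD, CUSTOMER_THRESHOLD))
```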
We warn that close vendor placement proximity should not be taken to imply that the packages evaluated are functionally identical or equally well suited for use by every organization or for a specific process. Although there is a high degree of commonality in how organizations handle data pipelines, there are many idiosyncrasies and differences in how they perform these functions that can make one vendor’s offering a better fit than another’s for a particular organization’s needs.
We advise organizations to assess and evaluate vendors based on their requirements and use this research as a reference for their own evaluation of vendors and products.
The process of researching products to address an organization’s needs should be comprehensive. Our Value Index methodology examines Product Experience and how it aligns with an organization’s life cycle
Based on our methodology and expertise, the research weights Product Experience at 80%, or four-fifths, of the overall rating. Importance was placed on the categories as follows: Usability (15%), Capability (25%), Reliability (10%), Adaptability (15%) and Manageability (10%). This weighting impacted the resulting overall ratings in this research. IBM, Microsoft and Alteryx were designated Product Experience Leaders as a result of their top-ranked weighted performance. While not Leaders, Databricks, Google, DataKitchen and dbt Labs were found to meet a broad range of enterprise data pipeline development, testing and deployment requirements.
Many organizations will only evaluate capabilities for those in IT or administration, but the research identified the criticality of Usability and Adaptability (both 15% weighting) across a broader set of usage personas that should participate in data pipelines.
The importance of a customer relationship with a vendor is essential to the actual success of the products and technology. The advancement of the Customer Experience and the entire life cycle an organization
Our Value Index methodology weights Customer Experience at 20% of the overall rating, or one-fifth, as it relates to the framework of commitment and value to the vendor-customer relationship. The two evaluation categories are Validation (10%) and TCO/ROI (10%), which are weighted to represent their importance to the overall research.
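The category weighting can be read as a weighted average of per-category scores. The sketch below uses the seven category weights stated in the text; the 0 to 100 score scale, the function name and the sample scores are assumptions made for illustration. The individual category weights as listed sum to 95 rather than 100, so the sketch divides by the total weight instead of assuming the weights sum to 100; how the published ratings handle this is not stated.

```python
# Category weights in percent, as listed in the text.
WEIGHTS = {
    "Usability": 15,
    "Capability": 25,
    "Reliability": 10,
    "Adaptability": 15,
    "Manageability": 10,
    "Validation": 10,
    "TCO/ROI": 10,
}


def weighted_rating(scores, weights=WEIGHTS):
    """Return the weighted average of category scores (assumed to be on a 0-100 scale)."""
    total_weight = sum(weights.values())
    # Normalizing by the total weight is an assumption of this sketch.
    return sum(weights[category] * scores[category] for category in weights) / total_weight


# Hypothetical scores for a single vendor.
sample_scores = {
    "Usability": 90,
    "Capability": 80,
    "Reliability": 70,
    "Adaptability": 60,
    "Manageability": 50,
    "Validation": 40,
    "TCO/ROI": 30,
}

print(round(weighted_rating(sample_scores), 1))
```

An organization with different priorities, for example a budget constraint that makes TCO/ROI more important, could pass its own `weights` dictionary to see how the ranking of shortlisted vendors changes.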
The vendors rated highest overall in the aggregated and weighted Customer Experience categories, and deemed to be Leaders, are Microsoft, IBM and AWS. These category leaders in Customer Experience best communicate their commitment and dedication to customer needs. Vendors such as SAP, Google, Databricks and BMC were not Overall Leaders, but have a high level of commitment to the customer experience, with A- grades.
Some vendors we evaluated did not have sufficient information available through their website and presentations. While many have customer case studies to promote their success, many lack depth in articulating their commitment to an organization’s journey to data pipelines. This makes it difficult for organizations to evaluate vendors on the merits of their commitment to customer success. As a result, few of the vendors’ performances evaluated above 80%. As the commitment to a vendor is a continuous investment, the importance of supporting customer experience in a holistic evaluation should be included and not underestimated.
For inclusion in the Ventana Research Data Pipelines Buyers Guide for 2023, a vendor must be in good standing financially and ethically, have either at least $10 million in annual or projected revenue (verified using independent sources) or at least 75 employees, and sell products and provide support on at least two continents. The principal source of the relevant business unit’s revenue must be software-related and there must have been at least one major software release in the last 18 months. The vendor must provide a product that supports agile and collaborative data operations and must market itself or its products as one of the following: a DataOps tool or platform; a data orchestration tool or platform; a data observability tool or platform. The research is designed to be independent of the specifics of vendor packaging and pricing. To represent the real-world environment in which businesses operate, we include vendors that offer suites or packages of products that may include relevant individual modules or applications. If a vendor is actively marketing, selling and developing a product for the general market and its website reflects that the product is within the scope of the research, that vendor is automatically evaluated for inclusion.
All vendors that offer relevant data pipelines products and meet the inclusion requirements were invited to participate in the research evaluation process at no cost to them.
We categorize participation as follows:
Complete participation: The following vendors actively participated and provided completed questionnaires and demonstrations to help in our evaluation of their product: None.
Partial participation: The following vendors provided limited information to help in our evaluation: Alteryx, BMC and DataKitchen.
No participation: The following vendors provided no information or did not respond to our request: AWS, Astronomer, Databricks, dbt Labs, Google, Hitachi Vantara, IBM, Infoworks.io, Matillion, Microsoft, Precisely, Prefect, Rivery, SAP, StreamSets and Y42.
Vendors that meet our inclusion criteria but did not completely participate in our Buyers Guide were assessed solely on publicly available information. As this could have a significant impact on their classification and rating, we recommend additional scrutiny when evaluating those vendors.
Vendor | Product Names | Version | Release | Participation Status
Alteryx | Alteryx Analytics Cloud | August 2023 | August 2023 | Partial
Astronomer | Astro, Astronomer Software | 8.4 | May 2023 | None
AWS | Amazon Managed Workflows for Apache Airflow; AWS Glue | 2.5.1; 4.0 | January 2023 | None
BMC | Control-M | 9.0.21.100 | May 2023 | Partial
Databricks | Databricks Workflows, Delta Live Tables | July 2023 | July 2023 | None
DataKitchen | DataKitchen Platform (DataOps Observability, DataOps TestGen, and DataOps Automation) | 1.1.275; 1.481; 0.2.0 | July 2023 | Partial
dbt Labs | dbt Cloud | 2 | April 2023 | None
Google | Cloud Composer; Cloud Dataprep by Trifacta | 2.3.2; 10.1 | June; July | None
Hitachi Vantara | Pentaho Data Integration and Analytics | 9.5 | May 2023 | None
IBM | Cloud Pak for Data | 4.7 | August 2023 | None
Infoworks.io | Infoworks Platform | 5.4.2 | May 2023 | None
Matillion | Data Productivity Cloud | 1.71 | May 2023 | None
Microsoft | Azure Data Factory | 2 (June 2023) | June 2023 | None
Prefect | Prefect Cloud | 2.10.18 | June 2023 | None
Rivery | Rivery | May 2023 | May 2023 | None
SAP | SAP Data Intelligence Cloud | 2023 | May 2023 | None
StreamSets | StreamSets Platform | June 2023 | June 2023 | None
Y42 | Y42 | 2 | November 2022 | None
There is a very large and growing number of vendors in the DataOps software segment. We did not include vendors that, as a result of our research and analysis, did not satisfy the criteria for inclusion in the Buyers Guide.
Most of the vendors that did not meet our inclusion criteria were excluded based on size (either revenue and/or number of employees). Inclusion criteria validation was completed to the best of our ability using information publicly available or through our research.
Other vendors were excluded based on product suitability: either their products only addressed the orchestration or observability of data stored in a data platform rather than all upstream and downstream stages of a data pipeline, or at the time of evaluation they did not have a generally available product marketed as a tool or platform for data pipeline development, data orchestration or data observability (although some now do). Others were excluded based on having no published documentation, making it impossible to evaluate the capabilities of the product.
The vendors that did not satisfy the inclusion criteria are listed below as “Vendors of Note.”
Vendor | Product | At least $10 million revenue | At least 75 employees | Product suitability | Documentation
Ascend | Ascend Data Automation Cloud | No | No | Yes | Yes
Datafold | Datafold | No | No | Yes | Yes
DataOps.live | DataOps.live | No | No | Yes | Yes
Datorios | Datorios | No | No | Yes | Yes
Elementl | Dagster | No | No | Yes | Yes
FirstEigen | DataBuck | No | No | Yes | Yes
Great Expectations (fka Superconductive) | Great Expectations | No | No | Yes | Yes
Integrate.io | Data Observability | No | No | No | Yes
Kleene | Kleene | No | No | Yes | Yes
Meltano | Meltano | No | No | Yes | Yes
Metaplane | Metaplane | No | No | Yes | Yes
Mozart Data | Mozart Data | No | No | Yes | Yes
Nexla | Nexla | No | No | Yes | Yes
Palantir | Foundry | Yes | Yes | No | Yes
Promethium | Promethium | No | No | Yes | Yes
RightData | Dextrus, RDt | No | No | Yes | Yes
Saturam | Qualdo, Piperr | No | Yes | Yes | No
Shipyard | Shipyard | No | No | Yes | Yes
Switchboard Software | Data Automation | No | No | Yes | Yes
Torana | iceDQ | No | Yes | Yes | No
UpSolver | Upsolver | No | No | Yes | Yes