Market Perspectives

ISG Buyers Guide for Data Integration Classifies and Rates Software Providers

Written by ISG Software Research | Dec 26, 2024 1:00:00 PM

ISG Research is happy to share insights gleaned from our latest Buyers Guide, an assessment of how well software providers’ offerings meet buyers’ requirements. The Data Integration: ISG Research Buyers Guide is the distillation of a year of market and product research by ISG Research.

Data integration is a fundamental enabler of a data intelligence strategy. Analysis of individual data sources—customer or product data, for example—can provide insights to improve operational efficiency. Combining data from multiple sources, however, enables enterprises to innovate, improving customer experience and revenue generation by, for instance, targeting the most lucrative customers with offers to adopt the latest product.

ISG Research defines data integration as software that enables enterprises to extract data from applications, databases and other sources and combine it for analysis in a data warehouse, including a logical data warehouse or data lakehouse, to generate business insights. Without data integration, business data would be trapped in the applications and systems in which it was generated.

Traditional approaches to data management are rooted in point-to-point batch data processing whereby data is extracted from its source, transformed for a specific purpose and loaded into a target environment for analysis. The transformation could include the normalization, cleansing and aggregation of data. More than two-thirds (69%) of enterprises cite preparing data for analysis as the most time-consuming aspect of the analytics process. Reducing the time and effort spent on data integration and preparation can significantly accelerate time to business insight.

Although point-to-point data integration continues to serve tactical data integration use cases, it is unsuitable for more strategic enterprise-wide data integration initiatives. These require the orchestration of a complex mesh of agile data pipelines that traverse multiple data-processing locations and can evolve in response to changing data sources and business requirements.

Traditional batch extract, transform and load (ETL) integration products were designed to extract data from a source and transform it in a dedicated staging area before loading it into a target environment (typically a data warehouse or data lake) for analysis. The dedicated ETL staging layer was important to avoid placing an undue transformation processing burden on the target data platform, ensuring that sufficient processing power was available to perform the necessary analytic queries.
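To make the pattern concrete, below is a minimal sketch of a batch ETL flow in Python. It is illustrative only: SQLite stands in for both the operational source and the analytic target, and the tables, columns and currency conversion are hypothetical.

```python
# A minimal batch ETL sketch: extract from a source system, transform in a
# dedicated staging step, then load into a target table used for analysis.
# The databases, tables and exchange rate here are hypothetical stand-ins.
import sqlite3

source = sqlite3.connect(":memory:")   # stand-in for an operational application database
target = sqlite3.connect(":memory:")   # stand-in for the analytic data warehouse

# Seed the hypothetical source so the sketch is self-contained.
source.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, currency TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                   [(1, 10, 99.0, "USD"), (2, 11, 50.0, "EUR"), (3, 12, None, "USD")])

# Extract: pull the raw rows out of the source system.
rows = source.execute("SELECT order_id, customer_id, amount, currency FROM orders").fetchall()

# Transform: cleanse and normalize in the staging layer, outside the target,
# so the warehouse is not burdened with transformation work.
staged = []
for order_id, customer_id, amount, currency in rows:
    if amount is None:
        continue                                                 # drop incomplete records
    amount_usd = amount * 1.1 if currency == "EUR" else amount   # illustrative currency normalization
    staged.append((order_id, customer_id, round(amount_usd, 2)))

# Load: write the prepared rows into the target table used for analytic queries.
target.execute("CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount_usd REAL)")
target.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", staged)
target.commit()
```

The point to note is that cleansing and normalization happen in the staging step, before any data reaches the target platform.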

Since they are designed for a specific data transformation task, ETL pipelines are often highly efficient. However, they are also rigid, difficult to adapt and ill-suited to continuous and agile processes. As data and business requirements change, ETL pipelines must be rewritten accordingly. The need for greater agility and flexibility to meet the demands of real-time data processing is one reason we have seen increased interest in extract, load and transform data pipelines.

Extract, load and transform (ELT) pipelines use a more lightweight staging tier, which is needed only to extract data from the source and load it into the target data platform. Rather than performing a separate transformation stage prior to loading, ELT pipelines make use of pushdown optimization, leveraging the data-processing functionality and processing power of the target data platform to transform the data.
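For comparison, here is a sketch of the same hypothetical flow reorganized as ELT. The raw records are landed in the target unchanged, and the transformation is expressed as SQL that the target engine executes via pushdown; again, SQLite and the table names are stand-ins chosen only for illustration.

```python
# A minimal ELT sketch: the raw records are extracted and loaded unchanged,
# and the transformation is pushed down to the target platform as SQL.
# SQLite stands in for the target; the tables and data are hypothetical.
import sqlite3

target = sqlite3.connect(":memory:")

# Extract + load: land the raw data in the target with no prior transformation.
raw_records = [(1, 10, 99.0, "USD"), (2, 11, 50.0, "EUR"), (3, 12, None, "USD")]
target.execute("CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL, currency TEXT)")
target.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_records)

# Transform: pushdown optimization, so the target engine performs the cleansing
# and normalization that the ETL sketch above performed in its staging layer.
target.executescript("""
    CREATE TABLE fact_orders AS
    SELECT order_id,
           customer_id,
           ROUND(CASE WHEN currency = 'EUR' THEN amount * 1.1 ELSE amount END, 2) AS amount_usd
    FROM raw_orders
    WHERE amount IS NOT NULL;
""")
print(target.execute("SELECT * FROM fact_orders").fetchall())  # [(1, 10, 99.0), (2, 11, 55.0)]
```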

Pushing data transformation execution to the target data platform results in a more agile data extraction and loading phase that is more adaptable to changing data sources. This approach aligns well with the schema-on-read model used in data lake environments, as opposed to the schema-on-write approach in which a schema is applied as data is loaded into a data warehouse. Since the data is not transformed before being loaded into the target data platform, data sources can change and evolve without delaying data loading. This potentially enables data analysts to transform data to meet their requirements rather than relying on dedicated data integration professionals to perform the task. As such, many ELT offerings are positioned for use by data analysts and developers rather than IT professionals. This can also reduce delays in deploying business intelligence projects by avoiding the need to wait for data transformation specialists to (re)configure pipelines in response to evolving business intelligence requirements and new data sources.
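The difference between the two schema approaches can be illustrated with a small, hypothetical example: under schema-on-write, structure is enforced as records are loaded; under schema-on-read, everything lands as-is and each query decides how to interpret it.

```python
# Schema-on-write vs. schema-on-read, illustrated with hypothetical event records.
import json

raw_events = [
    '{"user": "a1", "action": "click", "ts": 1700000000}',
    '{"user": "b2", "action": "view"}',           # second record is missing "ts"
]

# Schema-on-write: the expected structure is enforced as the data is loaded,
# so non-conforming records are rejected (or repaired) up front.
def load_schema_on_write(events):
    table = []
    for event in events:
        record = json.loads(event)
        if not {"user", "action", "ts"}.issubset(record):
            continue                              # rejected at load time
        table.append((record["user"], record["action"], record["ts"]))
    return table

# Schema-on-read: everything lands as-is; structure is applied at query time,
# so sources can evolve without delaying the load.
def query_schema_on_read(events):
    return [(r.get("user"), r.get("action"), r.get("ts"))
            for r in map(json.loads, events)]

print(load_schema_on_write(raw_events))   # only the fully conformant record
print(query_schema_on_read(raw_events))   # both records, with None where "ts" is absent
```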

By 2026, more than three-quarters of enterprises’ information architectures will support ELT patterns to accelerate data processing and maximize the value of large volumes of data. Whereas once there was considerable debate among software providers over the relative merits of ETL versus ELT, today many providers offer both approaches and recognize that multiple factors influence which approach is more suitable for any individual integration scenario.

Like ETL pipelines, ELT pipelines may also be batch processes. Both can be accelerated by using change data capture (CDC) techniques. Change data capture is not new but has come into greater focus given the increasing need for real-time data processing. As the name suggests, CDC is the process of capturing data changes. Specifically, in the context of data pipelines, CDC identifies and tracks changes to tables in the source database as data is inserted, updated or deleted. CDC reduces complexity and increases agility by synchronizing only changed data rather than the entire dataset. The data changes can be synchronized incrementally or in a continuous stream.
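The principle can be sketched with a few hypothetical change events; in practice, CDC tools typically read such changes from the source database’s transaction log rather than constructing them by hand.

```python
# Minimal change data capture sketch: rather than resyncing the whole dataset,
# only the captured insert/update/delete events are applied to the target copy.
# The event format and tables are hypothetical, chosen for illustration.

# Current state of the target table, keyed by primary key.
target_customers = {
    1: {"name": "Acme", "tier": "gold"},
    2: {"name": "Globex", "tier": "silver"},
}

# Changes captured from the source since the last synchronization.
change_events = [
    {"op": "insert", "key": 3, "row": {"name": "Initech", "tier": "bronze"}},
    {"op": "update", "key": 2, "row": {"name": "Globex", "tier": "gold"}},
    {"op": "delete", "key": 1},
]

def apply_changes(target, events):
    """Synchronize only the changed rows into the target copy."""
    for event in events:
        if event["op"] == "delete":
            target.pop(event["key"], None)
        else:                          # insert and update are both upserts here
            target[event["key"]] = event["row"]
    return target

print(apply_changes(target_customers, change_events))
# {2: {'name': 'Globex', 'tier': 'gold'}, 3: {'name': 'Initech', 'tier': 'bronze'}}
```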

More recently, we have seen some providers adopt the term zero-ETL to describe automated replication of data from the source application, with immediate availability for analysis in the target analytic database. The term zero-ETL, along with some of the marketing around it, implies that users can do away with the extraction, transformation and loading of data entirely. That might sound too good to be true, and in many cases it will be.

Removing the need for data transformation is only possible if all the data required for an analytics project is generated by a single source. Many analytics projects rely on combining data from multiple applications. If that is the case, transformation of the data will be required after loading to integrate and prepare it for analysis. Even if all the data is generated by a single application, the theory that data does not need to be transformed relies on the assumption that schema is strictly enforced when the data is generated. If not, enterprises are likely to need declarative transformations to cleanse and normalize the data to meet longer-term analytics or data governance requirements. As such, zero-ETL could arguably be seen as a form of ELT that automates extraction and loading and has the potential to remove the need for transformation in some use cases.
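As an illustration, the sketch below shows the kind of declarative, post-load transformation that may still be needed even when replication into the target is fully automated. SQLite stands in for the analytic database, and the tables, columns and view are hypothetical.

```python
# A sketch of a declarative post-load transformation: even when replication
# into the target is automated, a view such as this may still be needed to
# cleanse and combine the replicated tables before analysis.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Tables as they might arrive via automated replication (hypothetical).
    CREATE TABLE crm_customers (id INTEGER, email TEXT);
    CREATE TABLE billing_accounts (customer_id INTEGER, plan TEXT);
    INSERT INTO crm_customers VALUES (1, '  Jo@Example.com '), (2, NULL);
    INSERT INTO billing_accounts VALUES (1, 'enterprise');

    -- Declarative transformation applied after loading: normalize the email
    -- field and join the two replicated sources into one analysis-ready view.
    CREATE VIEW v_customers AS
    SELECT c.id,
           LOWER(TRIM(c.email)) AS email,
           b.plan
    FROM crm_customers c
    LEFT JOIN billing_accounts b ON b.customer_id = c.id
    WHERE c.email IS NOT NULL;
""")
print(db.execute("SELECT * FROM v_customers").fetchall())  # [(1, 'jo@example.com', 'enterprise')]
```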

Our Data Integration Buyers Guide is designed to provide a holistic view of a software provider’s ability to deliver the combination of functionality required for a complete approach to data integration, whether with a single product or a suite of products. As such, the Data Integration Buyers Guide covers the full breadth of data integration functionality. Our assessment also considered whether that functionality was available from a software provider in a single offering or as a suite of products or cloud services.

This Data Integration Buyers Guide evaluates products based on whether the data integration platform enables the integration of real-time data in motion in addition to data at rest, the use of artificial intelligence (AI) to automate and enhance data integration, and the availability and depth of functionality that enables enterprises to integrate data with business partners and other external entities. To be included in this Buyers Guide, products must include data pipeline development, deployment and management.

This research evaluates the following software providers that offer products that address key elements of data integration as we define it: Actian, Alibaba Cloud, Alteryx, Amazon Web Services (AWS), Boomi, Cloud Software Group, Confluent, Databricks, Denodo, Fivetran, Google Cloud, Hitachi Vantara, Huawei Cloud, IBM, Informatica, Jitterbit, Matillion, Microsoft, Oracle, Precisely, Qlik, Reltio, Rocket Software, Salesforce, SAP, SAS Institute, SnapLogic, Solace, Syniti, Tray.ai and Workato.

This research-based index evaluates the full business and information technology value of data integration software offerings. We encourage you to learn more about our Buyers Guide and its effectiveness as a provider selection and RFI/RFP tool.

We urge organizations to do a thorough job of evaluating data integration offerings, using this Buyers Guide both for the results of our in-depth analysis of these software providers and as an evaluation methodology. The Buyers Guide can be used to evaluate existing suppliers and provides evaluation criteria for new projects. Using it can shorten the cycle time for an RFP and the definition of an RFI.

The 2024 Buyers Guide for Data Integration ranks Informatica first, followed by Microsoft and Oracle.

Software providers that rated in the top three of any category, including the product and customer experience dimensions, earn the designation of Leader.

The Leaders in Product Experience are:

  • Informatica
  • Oracle
  • Microsoft

The Leaders in Customer Experience are:

  • Databricks
  • Microsoft
  • SAP

The Leaders across any of the seven categories are:

  • Oracle, which has achieved this rating in five of the seven categories.
  • Informatica in four categories.
  • Databricks in three categories.
  • Actian and Microsoft in two categories.
  • Denodo, Google Cloud, Qlik, SnapLogic and SAP in one category.

The overall performance chart provides a visual representation of how providers rate across product and customer experience. Software providers with products scoring higher in a weighted rating of the five product experience categories place farther to the right. The combination of ratings for the two customer experience categories determines a provider’s placement on the vertical axis. As a result, providers that place closer to the upper right are rated “exemplary” and higher than those closer to the lower left, which are identified as providers of “merit.” Software providers that excelled at customer experience over product experience have an “assurance” rating, and those excelling instead in product experience have an “innovative” rating.

Note that close provider scores should not be taken to imply that the packages evaluated are functionally identical or equally well-suited for use by every enterprise or process. Although there is a high degree of commonality in how organizations handle data integration, there are many idiosyncrasies and differences that can make one provider’s offering a better fit than another.

ISG Research has made every effort to encompass in this Buyers Guide the overall product and customer experience from our data integration blueprint, which we believe reflects what a well-crafted RFP should contain. Even so, there may be additional areas that affect which software provider and products best fit an enterprise’s particular requirements. Therefore, while this research is complete as it stands, utilizing it in your own organizational context is critical to ensure that products deliver the highest level of support for your projects.

You can find more details on our community as well as on our expertise in the research for this Buyers Guide.