Data Integration Considerations for ISVs and Data Providers
Real-Time Data and AI Drive Businesses Today
Data is an extremely valuable asset to almost every organization, and it informs nearly every decision an enterprise makes. It can be used to make better decisions at almost every level of the enterprise—and to make them more quickly. But to take full advantage of the data and to do so quickly requires artificial intelligence (AI). So, it is no surprise that nearly all participants in our research (87%) report that they have enabled or piloted AI features in analytics and business intelligence applications. Today, data is collected in more ways and from more devices and more frequently than ever before. It can enable new methods of doing business and can even create new sources of revenue. In fact, the data and analyses themselves can be a new source of revenue.
Independent software vendors (ISVs) and data providers understand the importance of data in AI-based processes, and they are designing products and services to help enterprises harness all of this data and the business momentum AI generates. To maximize the opportunities, ISVs and data providers need to recognize that enterprises use various types of data, including data from both internal and external sources. In fact, our research shows that the majority of enterprises (56%) are working with 11 or more sources of data. Governing the various data sources becomes critical because poor quality data leads to poor AI models. Our research shows the top benefit of investing in data governance, reported by three-quarters of participants (77%), is improved data quality.
The most common types of collected data include transactional, financial, customer, IT systems, employee, call center, and supply chain data. But there are other sources as well, many external to the enterprise. Nine in 10 enterprises (90%) are working with at least one source of external data, which could mean location data, economic data, social media, market data, consumer demographics, government data, and weather data. To be useful, all of that data must be integrated.
“Data integration” is the process of bringing together information from various sources across an enterprise to provide a complete, accurate, and real-time set of data that can support operational processes and decision-making. But nearly one-third of enterprises (31%) report that it is hard to access their data sources, and more than two-thirds (69%) report that preparing their data is the activity where they spend the most time in their analytics processes. The process of data integration often places a burden on the operational systems upon which enterprises rely.
At the same time, enterprises also need to be able to integrate applications into their data processes. ISVs and data providers must bring data together with applications so it is easier for enterprises to access and use the very data they provide.
Data Integration Is Not Easy
Simple linkages such as Open Database Connectivity and Java Database Connectivity (ODBC/JDBC), or even custom-coded scripts, are not sufficient for data integration. While ODBC/JDBC can provide the necessary “plumbing” to access many different data sources, it offers little assistance to application developers in creating agile data pipelines. Simple connectivity also does nothing to assist with consolidating or transforming data to make it ready for analytics, for instance, in a star schema. Nor does simple connectivity provide any assistance in dealing with slowly changing dimensions, which must be tracked for many types of AI analyses.
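To make the point concrete, the sketch below shows roughly what even a minimal Type 2 slowly changing dimension update involves, work that raw ODBC/JDBC connectivity leaves entirely to the developer. It is an illustrative Python sketch only; the table, column and attribute names (customer_dim, customer_ref, segment) are hypothetical, and a production pipeline would typically perform this logic inside the warehouse or integration platform.

```python
from datetime import date

# Minimal illustration of a Type 2 slowly changing dimension update.
# Each version of a customer record is kept, with is_current marking the
# active version. All names are hypothetical.
customer_dim = [
    {"customer_ref": "C001", "segment": "SMB", "effective_from": date(2023, 1, 1),
     "effective_to": None, "is_current": True},
]

def apply_scd2_update(dim_rows, customer_ref, new_attrs, as_of):
    """Close out the current row for the customer and append a new version."""
    for row in dim_rows:
        if row["customer_ref"] == customer_ref and row["is_current"]:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return dim_rows  # nothing changed; keep the current version
            row["effective_to"] = as_of
            row["is_current"] = False
    dim_rows.append({"customer_ref": customer_ref, **new_attrs,
                     "effective_from": as_of, "effective_to": None,
                     "is_current": True})
    return dim_rows

# A source system reports that customer C001 moved to the Enterprise segment.
apply_scd2_update(customer_dim, "C001", {"segment": "Enterprise"}, date(2024, 6, 1))
for row in customer_dim:
    print(row)
```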
Simple connectivity does little to help enterprises transform the data to ensure its standardization or quality. Data from various sources often contains inconsistencies, for instance in customer reference numbers or product codes. Accurate analyses require that these inconsistencies be resolved as the data is integrated. Similarly, data quality is an issue that must be addressed as the data is integrated. Our research shows these two issues of data quality and consistency are the second most common time sinks in the analytics process.
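As a simple illustration of the standardization work that falls outside basic connectivity, the sketch below reconciles product codes that two hypothetical source systems encode differently before the records are merged. The formats and normalization rule are assumptions chosen for illustration, not a prescription.

```python
import re

# Hypothetical raw records from two source systems that encode the same
# product code differently ("AB-1234" vs "ab1234").
erp_orders = [{"product_code": "AB-1234", "qty": 10}]
web_orders = [{"product_code": "ab1234", "qty": 3}]

def normalize_code(code: str) -> str:
    """Strip separators and uppercase so codes from different systems match."""
    return re.sub(r"[^A-Za-z0-9]", "", code).upper()

def consolidate(*sources):
    """Merge order quantities by normalized product code."""
    totals = {}
    for source in sources:
        for record in source:
            key = normalize_code(record["product_code"])
            totals[key] = totals.get(key, 0) + record["qty"]
    return totals

print(consolidate(erp_orders, web_orders))  # {'AB1234': 13}
```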
Nor does simple database connectivity help enterprises effectively integrate data from files, applications or application programming interfaces (APIs). With the proliferation of cloud-based applications, many of which only provide API access, ODBC/JDBC connectivity may not be an option. And many enterprises still need to process flat files of data, as our research shows that these types of files are the second most common source of data for analytics.
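A minimal sketch of what flat-file and API ingestion can look like when ODBC/JDBC is not an option. The file layout and endpoint URL are placeholders; a real implementation would also need to handle authentication, paging, and error handling.

```python
import csv
import json
import urllib.request

# Hypothetical flat file of transactions alongside a REST API that only
# exposes JSON; neither is reachable through ODBC/JDBC.
def read_flat_file(path):
    """Load a delimited transaction file into dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def read_api(url):
    """Fetch records from a JSON-over-HTTP endpoint."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

# Usage (paths and URL are placeholders):
# file_rows = read_flat_file("transactions.csv")
# api_rows = read_api("https://api.example.com/v1/orders")
# combined = file_rows + api_rows
```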
Data integration is not a one-time activity, either. It requires the establishment of data pipelines that regularly collect and consolidate updated data. A greater infrastructure is needed around these pipelines to ensure that they run properly and to completion. ISVs and data providers that rely only on simple connectors must create and maintain this extra infrastructure themselves.
Those data pipelines also need to be agile enough to support a variety of styles of integration. Batch updates are still useful for bulk transfers of data, but other, more frequent styles of updating are needed as well. Our research shows that nearly one-quarter of enterprises (22%) need to analyze data in real time. Since the most common sources of information are transactional and operational applications, it is important to create pipelines that can access this data as it is generated. Incremental updates and change data capture (CDC) technology can solve this problem, and both are becoming competitive necessities.
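The sketch below illustrates the incremental-update idea using a high-water-mark timestamp to pull only rows changed since the last run. SQLite stands in for any source system, and the table and column names are hypothetical; true CDC would read the database transaction log rather than query the table, further reducing the load on the source.

```python
import sqlite3

# Illustrative incremental extraction: pull only rows modified since the
# last successful run, tracked as a high-water mark. Table and column
# names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 99.0, '2024-06-01T10:00:00')")
conn.execute("INSERT INTO orders VALUES (2, 45.0, '2024-06-02T08:30:00')")

def extract_incremental(connection, last_watermark):
    """Return rows changed after the watermark and the new watermark."""
    rows = connection.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? "
        "ORDER BY updated_at", (last_watermark,)).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

changed, watermark = extract_incremental(conn, "2024-06-01T12:00:00")
print(changed)     # only the order updated after the previous run
print(watermark)   # stored for use as the watermark of the next run
```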
Real-time requirements are even more demanding when we consider event data, where nearly one-half (47%) of enterprises process it within seconds. Then, as applications and organizational requirements change, the data pipelines must reflect those changes. Therefore, the tools used to support such a wide variety of ever-changing sources need to be open enough to be easily incorporated into a wide variety of processes.
But if ISVs and data providers focus their energies on maintaining data pipelines, it distracts resources from the core business. Creating data pipeline infrastructure that is highly performant and efficient requires years of engineering. Simple bulk movement of entire data sets is slow and inefficient, even though it may be necessary for initial data transfers. Subsequent data transfers, however, should use a data replication scheme or CDC approach, creating much smaller data transfers and much faster processes.
Advantages of a Modern Data Fabric
A modern data fabric is based on a cloud-native architecture and includes orchestration and automation capabilities that enhance the design and execution of data pipelines that consolidate information from across the enterprise. As data becomes a new source of revenue, sometimes referred to as “data as a product,” a modern data fabric must also enable easy access to, and consumption of, data. A key component to delivering data in this fashion is strong data catalog capabilities. AI-assisted search, automated profiling and tagging of data sources, and tracking the lineage of that data through its entire life cycle make it easier to find and understand the data needed for particular operations and analyses. Collecting and sharing this metadata in a data catalog not only provides better understanding and access to the data, but also improves data governance. Our research shows that enterprises that have adequate data catalog technology are three times more likely to be satisfied with their analytics and have achieved greater rates of self-service analytics.
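To make the catalog idea concrete, the sketch below shows the kind of metadata an entry might carry, ownership, profile statistics, sensitivity tags and lineage, along with a trivial search across them. The fields and naming are illustrative assumptions rather than any particular product's schema.

```python
from dataclasses import dataclass, field

# Illustrative catalog entry: the fields shown (owner, tags, lineage) are
# typical of what a catalog records, not any specific product's schema.
@dataclass
class CatalogEntry:
    name: str
    owner: str
    row_count: int
    tags: list = field(default_factory=list)
    upstream: list = field(default_factory=list)    # lineage: where it came from
    downstream: list = field(default_factory=list)  # lineage: what consumes it

entry = CatalogEntry(
    name="sales.orders_curated",
    owner="data-engineering",
    row_count=1_250_000,
    tags=["pii:customer_email", "domain:sales"],
    upstream=["erp.orders_raw", "web.orders_raw"],
    downstream=["analytics.revenue_dashboard"],
)

def search(entries, keyword):
    """Simple catalog search across dataset names and tags."""
    return [e for e in entries if keyword in e.name
            or any(keyword in t for t in e.tags)]

print(search([entry], "pii"))
```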
Orchestration and access via APIs are also critical to ISVs and data providers, as these allow the remote invocation of data pipelines needed for the coordination and synchronization of various interrelated application processes, even when they are distributed across different cloud applications and services. These APIs need to span all aspects, from provisioning to core functionality, for orchestration to be effective. Automation of these orchestration tasks can enhance many aspects of data pipelines to make them both more efficient and more agile. Automated data mapping, automated metadata creation and management, schema evolution, automated data mart creation, and data warehouse and data lake automation can quickly and efficiently create analytics-ready data. When combined with orchestration, automation can also provide “reverse integration” to update data in source systems when necessary and appropriate.
Modern data integration platforms employ AI/ML to streamline and improve data processing. AI/ML can be used to automatically detect anomalies in data pipelines, such as a pipeline suddenly processing an unusually small number of records. Such an anomaly could indicate a problem elsewhere in the pipeline. AI/ML can also deal automatically with errors in pipelines and with routine changes, such as those in the sources or targets. AI/ML can also determine the optimal execution of pipelines, including the number of instances to create or where different portions of the pipeline should be processed. And AI/ML can enrich data with predictions, scoring or classifications that support more accurate decision-making. We assert that by 2027, three-quarters of all data processes will use AI and ML to accelerate the realization of value from the data.
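A minimal statistical sketch of the record-count check described above; production platforms use richer ML models, and the three-standard-deviation threshold and sample counts here are assumptions.

```python
import statistics

# Record counts from recent pipeline runs (illustrative numbers).
history = [10_250, 9_980, 10_410, 10_120, 10_305, 9_875, 10_190]

def is_anomalous(latest_count, history, z_threshold=3.0):
    """Flag a run whose record count is far outside the recent norm."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest_count != mean
    return abs(latest_count - mean) / stdev > z_threshold

print(is_anomalous(10_050, history))  # False: within the normal range
print(is_anomalous(1_200, history))   # True: suspiciously few records
```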
Modern data integration platforms must also incorporate all appropriate capabilities for data governance. Data sovereignty issues may require that data pipelines be executed only within certain geographies. Compliance with internal or regulatory policies may require single sign-on or the use of additional credentials to appropriately track and govern data access and use. In addition, a platform with built-in capabilities for governance can help identify personally identifiable information and other sensitive or regulated data. But implementing any of these modern data integration platform requirements can impose a significant burden on ISVs and data providers.
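A simplified sketch of rule-based detection of sensitive data, scanning records for patterns such as email addresses or US Social Security numbers. The patterns are illustrative only; built-in governance capabilities typically combine dictionaries, checksums and ML classifiers, along with jurisdiction-specific rules.

```python
import re

# Illustrative patterns only; real platforms combine many detectors and
# jurisdiction-specific rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def flag_pii(record: dict) -> dict:
    """Return the fields in a record that appear to contain PII."""
    findings = {}
    for field_name, value in record.items():
        for label, pattern in PII_PATTERNS.items():
            if isinstance(value, str) and pattern.search(value):
                findings.setdefault(field_name, []).append(label)
    return findings

print(flag_pii({"note": "Contact jane.doe@example.com", "amount": "42.00"}))
# {'note': ['email']}
```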
Illustrative Use Cases
Product Distributors
For organizations with hundreds of thousands of SKUs and hundreds of thousands of customers, managing orders and inventories can be a time-consuming process. Using a modern data-as-a-product approach with standardized data governance and a centralized data catalog can reduce costs dramatically and enable self-service online ordering. This approach also creates more agility to meet customer needs and provides better, more timely visibility into operations.
Insurance Industry
Insurance technology data providers can use data integration to help their customers be more competitive by providing access to up-to-date information that enables online quotes. Data is the key to the accurate pricing of insurance liabilities, and many of the sources and targets exist in the cloud, but they require support for a variety of endpoints. By using CDC-based replication, however, both claims and market data can be collected, consolidated, and distributed within minutes. As a result, millions of quotes can be generated each day, each incorporating real-time analysis of vast volumes of data.
Other Applications
Data integration can be key for many other ISVs and data providers. Mobile application providers can integrate location data with transaction data to provide broader market data on consumer behavior. Talent management ISVs can integrate data relating to internal performance and compensation with external market data to improve employee acquisition and retention. Foreclosure data can be collected, consolidated, and distributed to support loan origination and servicing operations. Vendor data can be collected and provided to improve procurement processes, augmenting supplier performance analyses with risk, diversity, sustainability, and credit scores. And regardless of the vertical industry or line-of-business function, faster access to more data generally produces better results.
Other Considerations
Once data is integrated, it can provide the basis for a broad range of analytics and AI. By supporting these analyses and data science, ISVs and data providers can extend the value of their capabilities and therefore increase their revenue opportunities. Choosing a data integration platform that also supports analytics and AI will make it easier for enterprises to capture this revenue. In fact, our research shows that reports and dashboards are the most common types of analytics used by more than 80% of enterprises. However, when considering analytics providers, look at those that support other newer techniques as well, such as AI/ML and natural language processing, which are projected to be required by 80% of enterprises in the future.
Enterprises need to use data to help drive actions. Data can help them understand what has happened and why, but they ultimately need to process what they have learned and then take action. In many situations, however, there is simply no time to review data to determine what course of action to take. ISVs and data providers can help their customers derive more value from data by using real-time information to trigger the appropriate actions.
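As a simple illustration of data-driven action, the sketch below raises a replenishment action the moment a streamed inventory reading crosses a threshold, rather than waiting for someone to review a report. The event fields, the threshold, and the trigger_replenishment function are hypothetical.

```python
# Illustrative event-driven trigger: when a streamed inventory reading falls
# below a reorder point, an action is raised immediately. Event fields and
# thresholds are hypothetical.
REORDER_POINT = 20

def trigger_replenishment(sku: str, on_hand: int):
    # In practice this would call an ordering API or enqueue a workflow step.
    print(f"Reorder {sku}: only {on_hand} units on hand")

def handle_inventory_event(event: dict):
    """Act on a single inventory update as it arrives."""
    if event["on_hand"] < REORDER_POINT:
        trigger_replenishment(event["sku"], event["on_hand"])

for evt in [{"sku": "AB-1234", "on_hand": 150}, {"sku": "CD-5678", "on_hand": 12}]:
    handle_inventory_event(evt)
```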
ISVs and data providers are using technology to add value to business processes. While business processes typically require data, data integration itself is merely a means to an end. If it is not done properly, it can detract from the overall approach, so it requires careful design and development. Enterprises should ideally spend their time on core competencies, not on developing data integration technology. By using a full-featured, purpose-built data integration platform, ISVs and data providers can ensure that the data they need is robust and available in a timely manner.
Next Steps
- Explore all available data sources, along with their accessibility, that can boost the value of your services.
- Recognize the value of data catalog and data governance in enabling data-as-a-product.
- Consider platforms that go beyond simple connections to data sources and that minimize the amount of development and maintenance work required.
- To maximize performance and minimize the impact on production systems, create repeatable and agile pipelines that operate efficiently.
- Look for platforms with significant automation capabilities to maximize productivity and responsiveness.
- Ensure that your architecture provides a modern, cloud-native approach.
About Ventana Research
Ventana Research, now part of Information Services Group, provides authoritative market research and coverage on the business and IT aspects of the software industry. We distribute research and insights daily through the Ventana Research community, and we provide a portfolio of consulting, advisory, research and education services for enterprises, software and service providers, and investment firms. Our premier service, Ventana On-Demand (VOD), provides structured education and advisory support with subject-matter expertise and experience in the software industry. Ventana Research Buyers Guides support the RFI/RFP process and help enterprises assess, evaluate and select software providers through tailored Assessment Services and our Value Index methodology. Visit www.ventanaresearch.com to sign up for free community membership with access to our research and insights.
ISG Software Research
ISG Software Research is the most authoritative and respected market research and advisory services firm focused on improving business outcomes through optimal use of people, processes, information and technology. Since our beginning, our goal has been to provide insight and expert guidance on mainstream and disruptive technologies. In short, we want to help you become smarter and find the most relevant technology to accelerate your organization's goals.
About ISG Software Research
ISG Software Research provides expert market insights on vertical industries, business, AI and IT through comprehensive consulting, advisory and research services with world-class industry analysts and client experience. Our ISG Buyers Guides offer comprehensive ratings and insights into technology providers and products. Explore our research at www.isg-research.net.
About ISG Research
ISG Research provides subscription research, advisory consulting and executive event services focused on market trends and disruptive technologies driving change in business computing. ISG Research delivers guidance that helps businesses accelerate growth and create more value. For more information about ISG Research subscriptions, please email contact@isg-one.com.
About ISG
ISG (Information Services Group) (Nasdaq: III) is a leading global technology research and advisory firm. A trusted business partner to more than 900 clients, including more than 75 of the world’s top 100 enterprises, ISG is committed to helping corporations, public sector organizations, and service and technology providers achieve operational excellence and faster growth. The firm specializes in digital transformation services, including AI and automation, cloud and data analytics; sourcing advisory; managed governance and risk services; network carrier services; strategy and operations design; change management; market intelligence and technology research and analysis. Founded in 2006 and based in Stamford, Conn., ISG employs 1,600 digital-ready professionals operating in more than 20 countries—a global team known for its innovative thinking, market influence, deep industry and technology expertise, and world-class research and analytical capabilities based on the industry’s most comprehensive marketplace data.
For more information, visit isg-one.com.