Analyst Viewpoint
For many organizations, analytics processes are riddled with roadblocks. There are many valid reasons why these roadblocks existed in the past, and they were often based on technical limitations. Even though storage and compute capacity have increased dramatically, many of the approaches that were designed for the constraints of the past are still in place today. It’s time to rethink analytics processes to minimize or eliminate these roadblocks so that individuals in line-of-business functions can access data to make decisions more easily and more quickly.
Organizations need to process more data more quickly than ever before. Yet our research shows that accessing and preparing data continue to be two of the most time-consuming parts of making data available for analysis. One reason these tasks take so long is the effort needed to transform the data from its original form into a form that supports analytics processes.
Analytics processes were historically designed to minimize the impact on the source systems because these systems are critical to the operations of the organization. For instance, if the sales system can’t process a sales transaction because it is busy performing analytics tasks, then the organization is likely to lose business as a result. Consequently, analytical systems have traditionally been designed to be separate from operational systems in order to insulate the operational systems. The analytical systems were then tuned for performance using a variety of techniques.
Typically, a specialized database design, such as a star or snowflake schema, was used to minimize the number of joins required for analyses. Aggregates of detailed data were pre-calculated to speed up query response times. Other specialized database structures like indexes and materialized views were also created to speed up queries. These acceleration techniques work, but they require more time on data engineering tasks up-front in order to spend less time executing the queries.
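The trade-off described above can be sketched in a few lines. The following is a minimal illustration only, using Python's built-in sqlite3 module and an in-memory database; the table names (fact_sales, dim_product, agg_daily_sales), the columns, and the sample rows are all hypothetical and are not taken from any particular product. It builds a tiny star schema, pre-calculates a daily aggregate, and then answers the same question from the detail rows and from the aggregate.

```python
import sqlite3


def build_demo(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one fact table and one dimension table."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            sale_date  TEXT NOT NULL,
            amount     REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product (product_id, category) VALUES (?, ?)",
        [(1, "hardware"), (2, "software")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, sale_date, amount) VALUES (?, ?, ?)",
        [
            (1, "2024-01-01", 100.0),
            (1, "2024-01-01", 50.0),
            (2, "2024-01-01", 25.5),
            (2, "2024-01-02", 10.0),
        ],
    )


def refresh_daily_aggregate(conn: sqlite3.Connection) -> None:
    """Rebuild the pre-calculated aggregate (the up-front engineering cost)."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS agg_daily_sales;
        CREATE TABLE agg_daily_sales AS
        SELECT s.sale_date, p.category,
               SUM(s.amount) AS total_amount,
               COUNT(*)      AS n_sales
        FROM fact_sales s
        JOIN dim_product p ON p.product_id = s.product_id
        GROUP BY s.sale_date, p.category;
        """
    )


def total_from_detail(conn: sqlite3.Connection, sale_date: str, category: str) -> float:
    """Answer the question by joining and summing the detail rows at query time."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.amount), 0.0)
        FROM fact_sales s
        JOIN dim_product p ON p.product_id = s.product_id
        WHERE s.sale_date = ? AND p.category = ?
        """,
        (sale_date, category),
    ).fetchone()
    return row[0]


def total_from_aggregate(conn: sqlite3.Connection, sale_date: str, category: str) -> float:
    """Answer the same question with a single lookup in the aggregate table."""
    row = conn.execute(
        "SELECT total_amount FROM agg_daily_sales WHERE sale_date = ? AND category = ?",
        (sale_date, category),
    ).fetchone()
    return row[0] if row else 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    refresh_daily_aggregate(conn)
    print(total_from_detail(conn, "2024-01-01", "hardware"))
    print(total_from_aggregate(conn, "2024-01-01", "hardware"))
```

In this sketch the aggregate lookup avoids the join and the summation at query time, but the aggregate reflects only the rows that existed when refresh_daily_aggregate last ran. Any sale recorded afterward is invisible to it until the next refresh.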
All of these performance tuning methods create roadblocks. A series of data integration or data preparation jobs is needed to support these structures. These jobs take time to execute and generally run overnight, or even less frequently, making it difficult or impossible to analyze data from the same day’s operations. Roadblock. In addition, these jobs require significant effort to build, so care must be taken when they need to be modified. Roadblock. The analytical system or database must be maintained and tuned as well, especially as data volumes grow. Roadblock.
In an ideal world, the data in the analytical system would match the data in the operational system. Data would flow directly from one to the other without any need for transformation or modification. If this were the case, it would dramatically speed up the process of accessing and preparing data. It would also allow security and governance policies from the operational systems to flow directly into the analytical systems.
The good news is that there’s no “magic” here. A direct flow of data from operational systems can be accomplished if the process of optimizing the analytical system is performed behind the scenes. This can be done today. There is enough knowledge about how to transform data for analytical purposes and how to achieve optimal performance that it can be done automatically. Analytics software vendors today are building tools that apply these techniques, and organizations are thus able to deploy their analytical systems much more easily and quickly. This direct data flow also makes the process of updating the analytical system with new data much easier.
It’s unrealistic to expect that the data transformations from operational to analytical systems could be entirely automated. For example, the names of tables and columns in operational systems often are not designed to be easily understood by line-of-business personnel. Organizations would likely want to rename some of these columns to make them more understandable. In addition, operational systems may not include some of the metrics needed for analysis. So, analytics tools must include a way to add these metrics to the analytical system. But these manual interventions are minor in comparison to the data engineering roadblocks that exist today in most organizations.
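The two manual interventions mentioned above, renaming cryptic columns and adding a metric the operational system lacks, can be sketched as a thin view over an operational table. This is an illustrative assumption rather than a description of how any vendor's tool works: the table ord_hdr, its columns, the friendly names, and the net_amount metric are all hypothetical, and the example uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical mapping from operational column names to business-friendly names.
FRIENDLY_NAMES = {
    "cust_no": "customer_id",
    "ord_amt": "order_amount",
    "disc_amt": "discount_amount",
}

# Hypothetical metric that the operational table does not store directly.
DERIVED_METRICS = {
    "net_amount": "ord_amt - disc_amt",
}


def create_analytic_view(conn, source_table, view_name, friendly_names, derived_metrics):
    """Expose an operational table under friendlier names plus derived metrics.

    The identifiers and expressions are interpolated into SQL, so they are
    assumed to come from trusted configuration, never from end-user input.
    """
    select_items = [f"{src} AS {dst}" for src, dst in friendly_names.items()]
    select_items += [f"({expr}) AS {name}" for name, expr in derived_metrics.items()]
    conn.execute(
        f"CREATE VIEW {view_name} AS SELECT {', '.join(select_items)} FROM {source_table}"
    )


def demo() -> sqlite3.Connection:
    """Build a one-row operational table and the analytic view on top of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ord_hdr (cust_no INTEGER, ord_amt REAL, disc_amt REAL)")
    conn.execute("INSERT INTO ord_hdr VALUES (42, 200.0, 15.0)")
    create_analytic_view(conn, "ord_hdr", "orders", FRIENDLY_NAMES, DERIVED_METRICS)
    return conn


if __name__ == "__main__":
    cursor = demo().execute("SELECT * FROM orders")
    print([column[0] for column in cursor.description])
    print(cursor.fetchone())
```

Because the view only relabels and derives, the underlying data still comes straight from the operational table. The manual work is confined to two small dictionaries rather than a chain of transformation jobs.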
As organizations consider ways to improve their data and analytics processes, they should evaluate vendors that have automated more of these data preparation processes. More automation means fewer roadblocks in their analytics. They should also look for vendors that go beyond simply automating manually designed data processing jobs and that can automate the design process as well. By removing the roadblocks to analytics, organizations can make faster, more informed decisions and realize the full value of their data.
David Menninger
Executive Director, Technology Research
David Menninger leads technology software research and advisory for Ventana Research, now part of ISG. Building on over three decades of enterprise software leadership experience, he guides the team responsible for a wide range of technology-focused data and analytics topics, including AI for IT and AI-infused software.