Organizations cite improving the quality of information as the leading benefit of data preparation activities. Data quality efforts focus on producing clean data, but organizations increasingly recognize that bad data matters too. More precisely, what matters is the original data as recorded by an organization's various devices and systems. To fully perform data preparation, organizations must know what data exists, both good and bad.
Our Data Preparation Benchmark Research also shows that organizations’ most frequent concern with data preparation processes is that they are not flexible or adaptable to change. There are many reasons why organizations need to be more flexible in data processes. Data sources and targets change. Organizations transition to new data and analytic tools and migrate systems to the cloud. Lines of business adopt new processes – either to gain a leg up on the competition or to respond to competitive pressures. Markets change. Mergers and acquisitions occur. DataOps has emerged as an approach to address these issues while still maintaining the appropriate reliability and governance that organizations require.
Trifacta, a San Francisco-based company that offers cloud-first capabilities to help organizations enable analytics transformation at scale, has developed a cloud platform to profile, prepare and pipeline data for analytics and machine learning. Its low-code/no-code approach makes data preparation tools and data governance more accessible to organizations that do not have dedicated developers.
The vendor’s opportunity to address analytic needs was underscored when Alteryx announced it would acquire Trifacta in January 2022. The combination of Alteryx’s low-code/no-code analytics approach with Trifacta’s cloud-first capabilities delivers a range of deployment options spanning on-premises, hybrid and cloud.
“Long known for its self-service capabilities for automating data access, preparation, analytics and data science processes, Alteryx has steadily enhanced and expanded its platform, making it even more accessible and attractive to a wider base of customers,” said David Menninger, SVP and research director, Ventana Research. “The combination with Trifacta’s cloud-native platform will improve its appeal to data engineers and IT teams, while adding important capabilities to accelerate adoption of its new, cloud-based product portfolio.”
The Trifacta platform integrates with a variety of cloud environments, including Amazon Web Services, Databricks, Google Cloud, Microsoft Azure and Snowflake, and supports multi-cloud deployments. Its support for on-premises data, and its independence from any platform-specific storage schema, execution environment or security framework, allow organizations to adapt to changing business requirements without rewriting code to connect a new data warehouse or data lake.
In 2019, Ventana Research named Trifacta customer IQVIA the Overall Digital Leadership Award winner. The healthcare organization applied the Trifacta platform to accelerate discovery and improve its success in finding the right patients for the right clinical trials. IQVIA reported that the limitations of spreadsheets and existing data tools had made its data analysts less effective at reporting. Organizations in this situation seek to streamline disparate datasets and understand the value of updating data processes and data governance.
We assert that by 2025, more than three-quarters of organizations' data integration processes will be enhanced with artificial intelligence and machine learning to increase automation, accuracy, agility and speed. Trifacta does just that, applying artificial intelligence and machine learning to the collected data to recommend how the data should be brought together and which data quality corrections to make. The data set is cleaned for quality before it is presented for analysis. Organizations also want quick access to data, which the platform supports by letting users define data transformation processes once and reapply them in each data pipeline.
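The "define once, reapply everywhere" idea can be illustrated with a small, generic sketch. This is not Trifacta's API or recipe format; the `Recipe` class, the step functions and the `customer_id`/`country` columns are all hypothetical, assuming rows arrive as Python dictionaries. The point is only that the transformation logic lives in one object that each pipeline calls unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical row and step types for illustration only.
Row = Dict[str, Optional[str]]
Step = Callable[[List[Row]], List[Row]]


def trim_whitespace(rows: List[Row]) -> List[Row]:
    """Return new rows with leading/trailing spaces stripped from string values."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows]


def drop_missing(column: str) -> Step:
    """Build a step that removes rows where `column` is missing or empty."""
    def step(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row.get(column)]
    return step


def uppercase(column: str) -> Step:
    """Build a step that upper-cases `column` wherever it has a value."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**row, column: row[column].upper()} if row.get(column) else row
                for row in rows]
    return step


@dataclass
class Recipe:
    """A hypothetical, reusable sequence of transformation steps."""
    steps: List[Step] = field(default_factory=list)

    def apply(self, rows: List[Row]) -> List[Row]:
        # Run each step in order, feeding the output of one into the next.
        for step in self.steps:
            rows = step(rows)
        return rows


# Define the transformation logic once...
recipe = Recipe([trim_whitespace, drop_missing("customer_id"), uppercase("country")])

# ...then reapply it to the input of each pipeline.
crm_batch = [{"customer_id": " 17 ", "country": "us"},
             {"customer_id": "", "country": "de"}]
lab_batch = [{"customer_id": "42", "country": " fr "}]

print(recipe.apply(crm_batch))  # [{'customer_id': '17', 'country': 'US'}]
print(recipe.apply(lab_batch))  # [{'customer_id': '42', 'country': 'FR'}]
```

Because every step returns new rows rather than editing the originals in place, the same recipe can be applied to any number of batches without side effects.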
As organizations take on more data management work, they should anchor these efforts in data governance by eliminating manual processes, creating repeatable steps and enabling collaboration across the processes that support good data governance practices. Data quality functionality, including data profiling, highlighting of anomalies and automated data cleaning, is another aspect of the Trifacta platform that supports an organization's approach to data governance.
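To make "profiling and highlighting anomalies" concrete, here is a minimal, generic sketch of column profiling. It is not how Trifacta implements the feature. It assumes a numeric column with `None` for missing values, and it swaps in a standard robust outlier rule (the modified z-score of Iglewicz and Hoaglin, flagging values above 3.5) as one plausible way to highlight anomalies; `profile_column` is a hypothetical name.

```python
from statistics import median
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]],
                   threshold: float = 3.5) -> Dict[str, object]:
    """Hypothetical profiler: count missing values and flag outliers.

    Outliers use the modified z-score 0.6745 * (x - median) / MAD,
    where MAD is the median absolute deviation from the median.
    """
    present = [v for v in values if v is not None]
    profile: Dict[str, object] = {
        "count": len(values),
        "missing": len(values) - len(present),
        "median": None,
        "anomalies": [],
    }
    if not present:
        return profile  # nothing to profile

    med = median(present)
    mad = median([abs(v - med) for v in present])
    profile["median"] = med
    if mad > 0:
        # Flag values whose modified z-score exceeds the threshold.
        profile["anomalies"] = [v for v in present
                                if abs(0.6745 * (v - med) / mad) > threshold]
    return profile


amounts = [120.0, 118.5, None, 121.0, 119.0, 950.0, 122.5]
print(profile_column(amounts))
# {'count': 7, 'missing': 1, 'median': 120.5, 'anomalies': [950.0]}
```

A median-based rule is used here because, on small samples, a single extreme value inflates the mean and standard deviation enough to mask itself under an ordinary z-score.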
Two primary personas benefit from incorporating a structured approach to data preparation and data governance into an organization's data management strategy: line-of-business managers and IT. The line-of-business manager will appreciate the ease with which data preparation occurs and the accessibility of data to more information workers. IT, in contrast, will look for an architecture that works well with the organization's existing systems, tools and processes.
Data engineering is an important task in any organization, and technologies that help organizations perform it efficiently are worth considering. Software that eliminates manual steps, catches anomalies, improves data quality or determines how to process the data speeds up data preparation and creates new value for the workforce and customers. Organizations looking for a third-party vendor with these cloud-first capabilities should evaluate Trifacta by Alteryx and its approach to data preparation and data governance.