Different Designs for Different Functions
Apache Spark and massively parallel processing (MPP) analytical databases are designed for different purposes. The first generation of “big data” architectures relied on the distributed Hadoop MapReduce framework for analytical processing. This framework was a breakthrough in that it increased the amount of data that could be processed, but it operated in batch mode, which limited its usefulness for interactive analysis. Spark removed the batch-processing limitation of MapReduce, making interactive analysis on big data practical. Spark also provides capabilities for streaming analytics and machine learning, but it does not include its own persistent storage layer.
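Because Spark has no storage layer of its own, an analysis typically begins by reading data from an external system such as HDFS or object storage. The PySpark sketch below illustrates that pattern; the Parquet path and column names are hypothetical, assumed only for illustration.

```python
# A minimal PySpark sketch: Spark reads from external storage (it has no
# persistent storage layer of its own) and runs an interactive query on it.
# The Parquet path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("interactive-analysis").getOrCreate()

# The data lives outside Spark, e.g., in HDFS or object storage.
events = spark.read.parquet("hdfs:///data/events")  # hypothetical path

# An interactive aggregation that MapReduce would have run as a batch job.
daily_counts = (
    events
    .groupBy(F.to_date("event_time").alias("day"))
    .agg(F.count("*").alias("events"))
    .orderBy("day")
)
daily_counts.show()

spark.stop()
```

Once loaded, the same DataFrame can be queried repeatedly from a notebook or shell without rerunning a batch job, which is the interactivity that MapReduce lacked.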
Distributed MPP systems, by contrast, are designed for scalable, high-performance analytical database operations. These systems spread processing across multiple compute resources to scale out and boost performance while maintaining transactional consistency, including support for data updates and deletes. Many applications, such as customer billing or financial systems, require the transactional consistency or repeatability that the relational database technology underlying MPP systems provides. These systems also use a range of optimization techniques to deliver very high performance across a wide variety of analyses, from those touching a small number of records to those scanning very large ones. And while the best MPP implementations are not limited to SQL processing, the wide availability of SQL skills and tools makes them easier to deploy and integrate into an organization’s information architecture.
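To make the transactional side concrete, the sketch below assumes Python’s psycopg2 driver and a PostgreSQL-compatible MPP warehouse such as Greenplum; the connection string, table, and column names are hypothetical. It shows an update and a delete that commit or roll back as a single unit of work, the kind of guarantee a billing application depends on.

```python
# A minimal sketch of a transactional update and delete against a
# PostgreSQL-compatible MPP warehouse (e.g., Greenplum).
# Connection details, tables, and columns are hypothetical.
import psycopg2

conn = psycopg2.connect("host=mpp-coordinator dbname=billing user=analyst")
try:
    with conn:  # psycopg2 commits on success, rolls back on exception
        with conn.cursor() as cur:
            # Apply a credit and purge the related disputed charges
            # as one atomic unit: both succeed or neither does.
            cur.execute(
                "UPDATE accounts SET balance = balance - %s "
                "WHERE account_id = %s",
                (49.99, 12345),
            )
            cur.execute(
                "DELETE FROM charges WHERE account_id = %s AND disputed",
                (12345,),
            )
finally:
    conn.close()
```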