        White Paper


        Read Time: 10 min.

         

        Sponsored by: OpenText

         


        Optimize Analytical Processing

        Balance Costs and Performance Between MPP Databases and Apache Spark

        Different Designs for Different Functions

        Apache Spark and massively parallel processing (MPP) analytical databases are designed for different purposes. The first generation of “big data” architectures relied on the distributed Hadoop and MapReduce framework for analytical processing. This framework was a breakthrough because it increased the amount of data that could be processed, but it operated in batch mode, which limited its applicability for interactive analyses. Spark removed the batch processing limitation of MapReduce, making interactive analyses on big data practical. It also provided capabilities for streaming analyses and machine learning, but it does not include its own persistent storage layer.

        Distributed MPP systems are designed for scalable, high-performance analytical database operations. These database systems spread processing across multiple compute resources to provide scalability and enhance performance while maintaining transactional consistency with support for data updates and deletes. Many applications, such as customer billing or financial systems, require the transactional consistency or repeatability that the relational database technology underlying MPP systems provides. These systems also use a variety of optimization techniques to deliver very high performance across a wide range of analyses, from those involving small numbers of records to those involving very large numbers of records. And while the best implementations of MPP systems are not limited to SQL processing, the wide availability of SQL skills and tools makes it easier to deploy and integrate them into an organization’s information architecture.
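To make "spreading processing across multiple compute resources" concrete, here is a minimal Python sketch of the scatter-gather pattern that MPP-style aggregation is commonly described with. It is a toy illustration, not how any particular MPP database is implemented: the sample rows, the function names (`partition_rows`, `local_aggregate`, `parallel_sum_by_customer`) and the use of threads as stand-ins for database nodes are all assumptions made for this example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (customer_id, amount) billing rows.
ROWS = [
    ("acme", 120.0),
    ("globex", 75.5),
    ("acme", 30.0),
    ("initech", 210.0),
    ("globex", 24.5),
    ("initech", 15.0),
]


def partition_rows(rows, num_nodes):
    """Assign each row to a node by hashing its customer id (the distribution key)."""
    partitions = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        node = zlib.crc32(customer.encode("utf-8")) % num_nodes
        partitions[node].append((customer, amount))
    return partitions


def local_aggregate(partition):
    """Compute per-customer totals using only the rows held by one node."""
    totals = defaultdict(float)
    for customer, amount in partition:
        totals[customer] += amount
    return dict(totals)


def parallel_sum_by_customer(rows, num_nodes=3):
    """Scatter rows to nodes, aggregate locally, then merge the partial results."""
    partitions = partition_rows(rows, num_nodes)
    # Threads stand in for separate compute nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    merged = defaultdict(float)
    for partial in partials:
        for customer, total in partial.items():
            merged[customer] += total
    return dict(merged)


if __name__ == "__main__":
    print(parallel_sum_by_customer(ROWS))
```

The sketch is roughly what a query such as `SELECT customer, SUM(amount) ... GROUP BY customer` has to accomplish when the table is distributed. It leaves out everything that makes a real system hard, including transactional consistency, updates and deletes, and query optimization.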

         
         
