Apache Spark Integration

The Apache Spark Connector for Riak automatically synchronizes data between Spark and Riak. This combines the in-memory analytics of Apache Spark with the resiliency and scale of Riak.

Write it like Riak. Analyze it like Spark.

Modern Big Data applications need to process data in real time to reveal patterns, trends, and associations. The Apache Spark Connector for Riak moves data from Riak to Spark for in-memory analysis, and the results can be stored back in Riak for future processing.
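
The load, analyze, and write-back flow described above can be illustrated with a minimal sketch. It does not use the connector's actual API: a plain Python dict is a hypothetical stand-in for a Riak bucket, an ordinary function stands in for a Spark job, and every name below (events_bucket, load_values, count_actions_per_user, write_back) is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a Riak KV bucket: key -> stored value.
events_bucket = {
    "e1": {"user": "alice", "action": "login"},
    "e2": {"user": "bob", "action": "login"},
    "e3": {"user": "alice", "action": "purchase"},
}


def load_values(bucket):
    """Step 1: move the stored values out of the key/value store."""
    return list(bucket.values())


def count_actions_per_user(events):
    """Step 2: analyze the loaded data in memory (a simple count per user)."""
    return Counter(event["user"] for event in events)


def write_back(results, bucket):
    """Step 3: persist the results under their own keys for later processing."""
    for user, count in results.items():
        bucket[user] = {"event_count": count}


# Hypothetical results bucket, also modeled as a dict.
results_bucket = {}
write_back(count_actions_per_user(load_values(events_bucket)), results_bucket)
print(results_bucket)
```

In a real deployment the dicts would be Riak buckets and the counting step would run as a distributed Spark job; the sketch only shows the shape of the round trip.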

Why Apache Spark and Riak?

Apache Spark is an analytics framework for Big Data. Riak is a distributed NoSQL database built to store Big Data, designed for massive scalability, high availability, and ease of operations. Integrating Apache Spark with Riak combines the real-time analytics of Spark with the availability and scalability of Riak, which makes real-time analytics of unstructured data possible. Until Spark came along, no single processing framework could handle the load required by a distributed system.

SPARK FEATURES

Fast Data Mover

Add Spark to your Riak data.
Intelligently load data into Spark clusters to minimize network traffic and processing overhead.

Write-Back to Riak

Persistence made simple.
Store intermediate and final results back into Riak KV for further processing by Spark or other Big Data application components.

Performance at Scale

Satisfy the need for speed.
The Apache Spark Add-on is architected for high-performance, real-time analysis and Riak KV persistence of Big Data.

Application Simplicity

Don’t do it yourself.
Integrate and update real-time analytics, caching, and search technologies to simplify the design and operation of Big Data applications.