March 26, 2014
Riak’s overarching design goal is simple: be maximally available. If your data center is on fire, Riak will be the last part of your stack to fail (and hopefully, you’ve purchased an enterprise license, so there’s another cluster in another data center ready to go at all times).
In order to make sure your data can survive server failures, Riak retains multiple copies (replicas) and allows lock-free, uncoordinated updates.
This then open ups the possibility that data will be out of sync across a cluster. Riak manages this issue in three distinct stages: entropy detection, correction, and conflict resolution.
Among the oldest and simplest tools in Riak is Read Repair, which, as its name implies, is triggered when a read request is received. If the server coordinating the operation notices that the servers responsible for the key do not agree on its value, correction is required.
A more recent feature in Riak is Active Anti-Entropy (often shortened to AAE). Effectively, this is the proactive version of read repair and runs in the background. Riak maintains hash trees to monitor for inconsistent data between servers; when divergent values are detected, correction is mandated.
As discussed in the blog post, Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems, automatically determining the “correct” value in the event of a conflict is not simple, and often not possible at the database layer.
Using contextual metadata called vector clocks, Riak will attempt to determine whether one of the discovered values is derived from the other. In that case, it can safely choose the most recent value. This value is written to all servers that have a copy of the data and conflict resolution is not required.
If Riak can’t verify such a causal relationship, things get more difficult.
Riak’s default behavior, is to fall back to server clocks to determine a winner. This can lead to unexpected results if concurrent updates are received but, on the positive side, conflict resolution will not be required.
If Riak is instead configured with
allow_mult=true to protect data integrity, even when independent writes are received, Riak will write both values to the servers as siblings for later conflict resolution.
Conflict resolution means that Riak detects a conflict, can’t resolve it intelligently, and isn’t instructed to resolve it otherwise.
Next time the application attempts to read such a value, instead of receiving one answer, it’s going to receive (at least) two. It is now the application’s responsibility to deal with the conflict and provide a new value back to Riak for future reads.
This can be trivial (pick one value), obvious (merge all values), or tricky (apply business logic and come back with something different).
With Riak 2.0, Basho is introducing Riak Data Types, which are designed to handle conflict resolution automatically. If your data can be formulated as a set or a map (not terribly dissimilar from JSON), Riak can process and resolve the siblings for you when a read request is received.
Many databases, particularly distributed ones, are effectively non-deterministic in the presence of concurrent writes. If they try to enforce consistency on writes, they sacrifice availability and often data integrity. If they don’t enforce consistency, they may rely on server (or worse, client) clocks to pick a winner, if they even have a strategy at all.
Riak is unique in encouraging developers to think about conflict resolution. Why? Because, for large distributed systems, network and server failures are a fact of life. For very large distributed systems, data duplication and inconsistency is unavoidable. Since Riak is designed for scale, it’s better to provide a structure for proper resolution than to pretend conflicts don’t exist.
January 29, 2014
On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Fatcual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).
If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies Youtube Channel. You can also watch each below.
Data Types and Search in Riak 2.0
Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)
Bucket Types and Configuration
Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)
Riak 2.0: Security and Conflict Resolution
Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)
Fun with Java and C Clients
Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)
Property Based Testing
Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)
Datomic and Riak
Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)
Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)
A Look Back
Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)
January 7, 2014
Writing an application that can take full advantage of Riak’s robust scaling properties requires a different way of looking at data storage and retrieval. Developers who bring a relational mindset to Riak may create applications that work well with a small data set but start to show strain in production, particularly as the cluster grows.
Thus, this looks at some of the common conceptual challenges.
Riak offers query features such as secondary indexes (2i), MapReduce, and full-text search, but throwing a large quantity of data into Riak and expecting those tools to find whatever you need is setting yourself (and Riak) up to fail. Performance will be poor, especially as you scale.
Reads and writes in Riak should be as fast with ten billion values in storage as with ten thousand.
Key/value operations seem primitive (and they are) but you’ll find they are flexible, scalable, and very fast (and predictably so).
Treat 2i and friends as tools to be applied judiciously, design the main functionality of your application as if they don’t exist, and your software will continue to work at blazing speeds when you have petabytes of data stored across dozens of servers.
Normalizing data is generally a useful approach in a relational database, but unlikely to lead to happy results with Riak.
Riak lacks foreign key constraints and join operations, two vital parts of the normalization story, so reconstructing a single record from multiple objects would involve multiple read requests; certainly possible and fast enough on a small scale, but not ideal for larger requests.
Instead, imagine the performance of your application if most of your requests were a single, trivial read. Preparing and storing the answers to queries you’re going to ask later is a best practice for Riak.
Ducking Conflict Resolution
One of the first hurdles Basho faced when releasing Riak was educating developers on the complexities of eventual consistency and the need to intelligently resolve data conflicts.
Because Riak is optimized for high availability, even when servers are offline or disconnected from the cluster due to network failures, it is not uncommon for two servers to have different versions of a piece of data.
The simplest approach to coping with this is to allow Riak to choose a winner based on timestamps. It can do this more effectively if developers follow Basho’s guidance on sending updates with vector clock metadata to help track causal history, but often concurrent updates cannot be automatically resolved via vector clocks, and trusting server clocks to determine which write was the last to arrive is a terrible conflict resolution method.
Even if your server clocks are magically always in sync, are your business needs well-served by blindly applying the most recent update? Some databases have no alternative but to handle it that way, but we think you deserve better.
Riak 2.0, when installed on new clusters, will default to retaining conflicts and requiring the application to resolve them, but we’re also providing replicated data types to automate conflict resolution on the servers.
If you want to minimize the need for conflict resolution, modeling with as much immutable data as possible is a big win.
For years, functional programmers have been singing the praises of immutable data, and it confers significant advantages when using a distributed data store like Riak.
Most obviously, conflict resolution is dramatically simplified when objects are never updated.
Even in the world of single-server database servers, updating records in place carries costs. Most databases lose all sense of history when data is updated, and it’s entirely possible for two different clients to overwrite the same field in rapid succession leading to unexpected results.
Some data is always going to be mutable, but thinking about the alternative can lead to better design.
SELECT * FROM <table>
A perfectly natural response when first encountering a populated database is to see what’s in it. In a relational database, you can easily retrieve a list of tables and start browsing their records.
As it turns out, this is a terrible idea in Riak.
Riak is optimized for unstructured, opaque data; however, it is not designed to allow for trivial retrieval of lists of buckets (very loosely analogous to tables) and keys.
Doing so can put a great deal of stress on a large cluster and can significantly impact performance.
It’s a rather unusual idea for someone coming from a relational mindset, but being able to algorithmically determine the key that you need for the data you want to retrieve is a major part of the Riak application story.
Because Riak sends multiple copies of your data around the network for every request, values that are too large can clog the pipes, so to speak, causing significant latency problems.
Basho generally recommends 1-4MB objects as a soft cap; larger sizes are possible with careful tuning, however.
For significantly larger objects, Riak CS offers an Amazon S3-compatible (and also OpenStack Swift-compatible) key/value object store that uses Riak under the hood.
Running a Single Server
This is more of an operations anti-pattern, but it is a common misunderstanding of Riak’s architecture.
It is quite common to install Riak in a development environment using its
devrel build target, which creates five full Riak stacks (including Erlang virtual machines) to run on one server to simulate a cluster.
However, running Riak on a single server for benchmarking or production use is counterproductive, regardless of whether you have one stack or five on the box.
It is possible to argue that Riak is more of a database coordination platform than a database itself. It uses Bitcask or LevelDB to persist data to disk, but more importantly, it commonly uses at least 64 such embedded databases in a cluster.
Needless to say, if you run 64 databases simultaneously on a single filesystem you are risking significant I/O and CPU contention unless the environment is carefully tuned (and has some pretty fast disks).
Perhaps more importantly, Riak’s core design goal, its raison d’être, is high availability via data redundancy and related mechanisms. Writing three copies of all your data to a single server is mostly pointless, both contributing to resource contention and throwing away Riak’s ability to survive server failure.
So, Now What?
As always, we recommend visiting Basho’s docs website for more details on how to build and run Riak, and many of our customers have given presentations on their use cases of Riak, including data modeling.
Also, keep an eye on the Basho blog where we provide high-level overviews like this of Riak and the larger non-relational database world.
For a detailed analysis of your needs and modeling options, contact Basho regarding our professional services team.
- Why Riak (docs.basho.com)
- Data Modeling (docs.basho.com)
- Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems (Basho blog)
- A Little Riak Book