Tag Archives: eventual consistency

Relational to Riak – Tradeoffs

November 18, 2013

This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series discussed High Availability and Cost of Scale.

Eventual Consistency

In order to provide high availability, which is a cornerstone of Riak’s value proposition, the database stores several copies of each key/value pair.

This availability requirement leads to a fundamental tradeoff: in order to continue to serve requests in the presence of failure, we do not force all data in the cluster to stay in sync. Riak will allow writes and reads no matter how many servers (and their stored replicas) are offline or otherwise unreachable.

(Incidentally, this lack of strong coordination has another consequence beyond high availability: Riak is a very, very fast database.)

Riak does provide both active and passive self-healing mechanisms to minimize the window of time during which two servers may have different versions of data.

The concept of eventual consistency may seem unfamiliar, but if you’ve ever implemented a cache or used DNS, those are common examples of the idea. In a large enough system, it’s effectively the default state of all data.

However, with the forthcoming release of Riak 2.0, operators will be able to designate selected pieces of data to require coordination and maintain strong consistency over high availability. Writing such data will be slower and subject to failure if too many servers are unreachable, but the overall robust architecture of Riak will still provide a fast, highly available solution.

Data Modeling

Riak stores data using a simple key/value model, which offers developers tremendous flexibility to define access models that suit their applications. It is also content-agnostic, so developers can store arbitrary data in any convenient format.

Instead of forcing application-specific data structures to be mapped into (and out of) a relational database, they can simply be serialized and dropped directly into Riak. For records that will be frequently updated, if some of the fields are immutable and some aren’t, we recommend keeping the immutable data in one key/value pair and the rest organized into a single or multiple objects based on update patterns.

Relational databases are ingrained habits for many of us, but moving beyond them can be liberating. Further information about data modeling, including sample configurations, are available on Use Cases section of the documentation.


One tradeoff with this simpler data model is that there is no SQL or SQL-like language with which to query the data.

To achieve optimal performance, it is advisable to take advantage of the flexibility of the key/value model to define simple retrieval patterns. In other words, determine the most useful queries and write the results of those queries as the data is being processed.

Because it is not always possible to know in advance what questions will need to be asked of your data, Riak offers added functionality on top of the key/value model. Tools such as Riak Search (a distributed, full-text search engine), Secondary Indexing (ability to tag objects with queryable metadata), and MapReduce (leveraged for aggregation tasks) are available to perform ad hoc queries as needed.

For many users, the tradeoffs of moving to Riak are worthwhile due to the overall benefits; however, it can be a bit of an adjustment. To see why others have chosen to switch to Riak from both relational systems and other NoSQL databases, check out our Users Page.


Relational to Riak – High Availability

November 13, 2013

This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.”

One of the biggest differences between Riak and relational systems is our focus on availability. Riak is designed to be deployed to, and runs best on, multiple servers. It can continue to function normally in the presence of hardware and network failures. Relational databases, conversely, are simplest to set up on a single server.

Most relational databases offer a master/slave architecture for availability, in which only the master server is available for data updates. If the master fails, the slave is (hopefully) able to step in and take over.

However, even with this simple model, coping with failure (or even properly defining it) is non-trivial. What happens if the master and slave server cannot talk to each other? How do you recover from a split brain scenario, where both servers think they’re the master and accept updates? What happens if the slave is slow to respond to updates sent from the master database? Can clients read from a slave? If so, does the master need to verify that the slave has received all updates before it commits them locally and responds to the client that requested the updates?

Conversely, Riak is explicitly designed to expect server and network failure. Riak is a masterless system, meaning any server can respond to read or write requests. If one fails, others will continue to service client requests. Once this server becomes available again, the cluster will feed it any updates that it missed through a process we call hinted handoff.

Because Riak’s system allows for reads and writes when multiple servers are offline or otherwise unreachable, data may not always be consistent across the environment (usually only for a few milliseconds). However, through self-healing mechanisms like read repair and Active Anti-Entropy, all updates will propagate to all servers making data eventually consistent.

For many use cases, high availability is more important than strict consistency. Data unavailability can negatively impact revenue, damage user trust, lead to poor user experience, and cause lost critical data. Industries like gaming, mobile, retail, and advertising require always-on availability. Visit our Users Page to see how companies in various industries use Riak.


Counters in Riak 1.4

July 29, 2013

For those of you who are up on your RICON history, you’ll remember that last year, Basho Hackers Russell Brown and Sean Cribbs gave a talk called “Data Structures in Riak” (video can be viewed here). Russell and Sean detailed the approach that Basho was taking to add eventually consistent data structures to Riak. The highlight of the presentation was a demonstration of incrementing and decrementing a counter using a sample app built with riak_dt. A simple counter was incremented. During this, nodes were killed, network partitions were introduced, and despite all that, counts converged once the cluster healed.

It was one of the more memorable moments of the entire conference.

We believe developers can build robust applications utilizing a simple key/value API. GETs, PUTs, and DELETEs can work wonders when utilized correctly. But this doesn’t let you build everything on Riak, and we’ve seen a fair amount of applications that outsource things – like data type operations – to other storage or caching systems. Especially when porting apps from Redis to Riak, we often hear that counters are one feature that Riak lacks. Basho is firmly in the “right-technology-for-the-right-job” camp but we’re aggressively adding functionality that doesn’t break Riak’s design goals of availability and fault-tolerance.

As of the Riak 1.4 release, counters are officially supported. Specifically, a data type known as a PN-Counter, which can be both incremented and decremented. This is the first of a suite of data types we’re planning to add (more on this in a moment) that give developers the ability to build more complex functionality on top of data stored as keys and values.

Use Cases

Using counters, you can increment and decrement a count associated with a named object in a given bucket. This sounds easy, but in a system like Riak where writes aren’t serialized and all updates are asynchronous, determining the last actor in a series of updates to an object is non-trivial. Riak’s counters should be used (in their current state) to count things that can tolerate eventual consistency. With that in mind, here are a few apps and types of functionality that could be implemented with Riak’s Counters:

  • Facebook Like Button
  • Youtube Views and Likes
  • Hacker News Votes
  • Twitter Followers and Favorites

The thing to remember here is that these counts can tolerate slight, brief imprecision. When your follower count fluctuates between 1000 and 1010 before finally settling on 1009, Twitter continues to work as expected. Riak 2.0 will feature work that enables you to enforce consistency around designated buckets which will solve this problem (with the necessary tradeoffs). Until then, use counters in Riak for things that can tolerate eventual consistency.

Even with this caveat, counters are a huge addition to Riak and we’re excited to see the new suite of applications and functionality they make possible.

Usage & Getting Started

To make use of counters we’ve introduced new endpoints and request types for the HTTP and Protocol Buffers APIs, respectively.


The complete documentation for the HTTP interface is here. Here are the basics using CURL:

That’s it.


Usage documentation for this is still in the works, but here’s the relevant message (as seen in riak_pb):

We’re working on implementing these in all of Basho supported client libraries. Keep an eye on these for details and timelines around availability. We currently support counters in the following libraries across the following protocols:

  • Python – HTTP and PB
  • Java – HTTP and PB
  • Erlang – PB

In addition to the docs and code, Basho Hacker Sam Elliot has started a Riak CRDT cookbook. The first section walks you through using counters in a few different ways, and even shows you how to simulate failure events. Take it for a spin and send Sam some feedback.

Future Data Types

In addition to counters, we have big plans for more data types in Riak. Sets and maps are on the short list, and the goal is to have these ready for Riak 2.0. Russell posted an extensive RFC on the Riak GitHub repo for those interested. Comments, critiques, and contributions are all encouraged.

Related Work and Additional Reading

Enjoy and see you at RICON West in October.

Mark and The Basho Team

Riak and Riak Enterprise 1.4 Release

July 10, 2013

Today, Basho Technologies announced the public availability of Riak 1.4.

The release includes new features and updates in addition to a substantive set of addressed issues. These updates include improvements to Secondary Indexes, simplified cluster management through Riak Control, reduced object storage overhead, and progress reporting for Hinted Handoff. Riak 1.4 also sets the stage for Basho’s upcoming major release, Riak 2.0, planned for Fall 2013.

In addition to these features and capabilities, Riak 1.4 includes eventually consistent, distributed counter functionality. Riak’s first distributed data type provides conflict resolution after a network partition and continues to advance Basho’s position of leadership within the distributed systems space.

This release encompasses both Riak and Riak Enterprise, which includes the multi-datacenter replication capability used by an increasing number of enterprise customers to address their critical data needs.

A full list of the new features and updates available in the 1.4 release can be found on the Basho blog post, Basho Announces Availability of Riak 1.4.

Relational to Riak, Part 1- High Availability

January 10, 2013

This is the first in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on January 24.

One of the biggest differences between Riak and relational systems is the focus on availability and how the underlying architecture deals with failure modes.

Most relational databases leverage a master/slave architecture to replicate data. This approach usually means the master coordinates all write operations, working with the slave nodes to update data. If the master node fails, the database will reject write operations until the failure is resolved – often involving failover or leader election – to maintain correctness. This can result in a window of write unavailability.

Master Slave Systems

Conversely, Riak uses a masterless system with no single point of failure, meaning any node can serve read or write requests. If a node experiences an outage, other nodes can continue to accept read and write requests. Additionally, if a node fails or becomes unavailable to the rest of the cluster due to a network partition, a neighboring node will take over responsibilities for the unavailable node. Once this node becomes available again, the neighboring node will pass over any updates through a process called “hinted handoff.” This is another way that Riak maintains availability and resilience even despite serious failure.

Riak Masterless

Because Riak’s system allows for reads and writes, even when multiple nodes are unavailable, and uses an eventually consistent design to maintain availability, in rare cases different replicas may contain different versions of an object. This can occur if multiple clients update the same piece of data at the exact same time or if nodes are down or laggy. These conflicts happen a statistically small portion of the time, but are important to know about. Riak has a number of mechanisms for detecting and resolving these conflicts when they occur. For more on how Riak achieves availability and the tradeoffs involved, see our documentation on the subject.

For many use cases today, high availability and fault tolerance are critical to the user experience and the company’s revenue. Unavailability has a negative impact on your revenue, damages user trust and leads to a poor user experience. For use cases such as online retail, shopping carts, advertising, social and mobile platforms or anything with critical data needs, high availability is key and Riak may be the right choice.

Sign up for the webcast here or read our whitepaper on moving from relational to Riak.


Rolling with Eventual Consistency

September 18, 2012

In a previous post I wrote about the different mindset that a software engineer should have when building for a key-value database as opposed to a relational database. When working with a relational database, you describe the model first and then query the data later. With a key-value database, you focus first on what you want the result of the query to look like, and then work backward toward a model.

As an example, I described averaging some value:

“In a SQL database, we can just call average() on that column, and it will compute the answer at query time. In a key-value store, we can add logic in the application layer to catch the object before it enters Riak, fetch the average value and number of included elements from Riak, compute the new rolling average, and save that answer back in Riak.”

Some questions came in about what that would specifically look like in code. The rest of this post will explore a solution that takes into account the distributed, eventually consistent nature of Riak.

Average the Scenario

Say we have a DataPoint object. An object of this type can be created, but for simplicity let us say that it is never changed or deleted. Every DataPoint has a value property which is an integer. In our application, we sometimes want the average of all of the value properties for all of the DataPoint objects. Oh, and let’s say we have trillions of DataPoint objects.

Notice that according to my previous definition, this usage scenario is ad-hoc because we don’t know exactly when the application needs the average, but it is not dynamic because we know that we will be querying for the average of the value property, and not the average of some property defined at query time.

Using the client library ripple, we might define the object as follows:

class DataPoint
include Ripple::Document
property :value, Integer, :presence => true

The ripple library stores objects as JSON behind the scenes. The following request sent to Riak:


might return something like:


Pretty simple so far. Now how should we calculate the average? The naïve solution would be to read all of the DataPoint objects out, read their value property, and average them. This might work for small sets of objects, but remember we have trillions of them. Fetching all of that data for one simple query is not a realistic solution.

We know that we want the average at the application level. Riak is really good at serving single objects. So instead of querying for the average, we should simply fetch the average as its own data object in Riak. In ripple this might look like:

class Statistic
include Ripple::Document
property :average, Float, :presence => true

Now we just have to hook the two together. We can add a hook to the DataPoint object so that every time it is saved, it updates the Statistic object.

class DataPoint
include Ripple::Document
property :value, Integer, :presence => true
after_save :update_statistics

  def update_statistics
    id = 'data_point_document'
    statistic = StatisticDocument.find(id) || StatisticDocument.new
    statistic.key = id
    statistic.update_with self.value

We need to define some new properties and the update_with method for the Statistic object.

class Statistic
include Ripple::Document
property :average, Float, :presence => true
property :count, Integer, :presence => true

  def update_with(value)
    self.average = (self.average * self.count + value) / (self.count + 1)
    self.count = self.count + 1

By storing the additional information of the number of objects in the count property, we have enough information to reconstruct a rolling average.

Fast Answers

Now whenever we want the average value, we simply fetch the one Riak object:


Our answer comes back really fast, because it has been pre-computed.

But Wait!

The example above works great for one single-threaded application. To preserve the distributed and fault-tolerant features that Riak provides for us, we need to do a little more work.

Consider the following scenario. One copy of the application, let’s call it ClientA, is talking to the Riak cluster. Another, ClientB, is also talking to the cluster. ClientA and ClientB both want to add a DataPoint object. After saving their respective DataPoint objects, they both fetch the Statistic object, which currently has average property set to 10.0, and compute a new average. ClientA thinks the new average property is 12.0. But ClientB added a different value, so it thinks the new average property is 9.0. Both save to the Riak cluster, and we have a classic race condition. Which client wins? It doesn’t matter, because both answers are wrong. Both fail to take into account the DataPoint object handled by the other client.

To get the correct answer, let’s separate the clients. Both clients can save to the same object, but we will partition the data within the object. Formerly the json object looked like:


We want it to look more like:


The data model is more complicated, but we now have enough information to compute the correct answer. We will change the model so that ClientA will only change the “ClientA” portion of the data object, and ClientB will only change the “ClientB” portion of the object. The application will know when we ask for the answer to compute the average across all clients.

The Statistic object now looks something like this:

class Statistic
include Ripple::Document
property :client_data, Hash, :presence => true

  def update_with(value)
    self.client_data ||= {}
    statistic = self.client_data[Client.id] || {'average' => 0.0, 'count' => 0}
    statistic['average'] = (statistic['average'] * statistic['count'] + value) / (statistic['count'] + 1)
    statistic['count']   = (statistic['count'] + 1)
    self.client_data[Client.id] = statistic

  def count
    self.client_data.map{|h| h[1]['count']}.inject(0, &:+)

  def average
    self.client_data.map{|h| h[1]['count'] * h[1]['average']}.inject(0, &:+).to_f / self.count

The method Client.id in the above can be set in a couple of different ways. In this case, we set it using an environment variable passed in at runtime. We rely on a single thread per application to ensure that each client has a single, consistent client identifier, and that it always writes to its own portion of the data object.

We also make sure that we always fetch the object before we write to it, so that this client is always working with the latest copy of the object and not a stale one that was updated somewhere else in the application.

But Wait [Again]!

We solved part of the problem by keeping each client out of the other’s business, but we still have a race condition when the clients try to save the same Statistic object to the cluster. If Riak is operating on a last-write-wins principle, then whichever client has the later timestamp is going to overwrite the other. That gives us the wrong answer, and that won’t do.

We can rely on Riak’s vector clocks to solve this problem. First, we set the allow_mult property on the bucket for Statistic objects so that Riak will store both copies of an object when two come in during a race condition.
Statistic.bucket.allow_mult = true

For reasons outside of the scope of this post, we also want to make sure that we have a PR and a PW quorum greater than N. This helps us ensure that our answer isn’t misread from a fallback node when we reload the Statistic object. N is 3 by default, so the following does the trick by setting PR and PW both to 2.
Statistic.bucket.props = Statistic.bucket.props.merge(:pr => 2, :pw => 2)

When ripple reads an object, it gets a vector clock. When it saves the object, it sends back the same vector clock. Every time Riak changes an object, it changes the vector clock in such a way that it preserves the clock’s lineage. So if Riak gets back a different vector clock with an object than the one it currently has, it knows it has a collision and saves both values as siblings.

Our race condition plays out. We now have two sibling objects in Riak stored under the same key: one is up-to-date for ClientA, and the other is up-to-date for ClientB. The truth is somewhere in between.

The next time ripple reads the Statistic object, it gets back both objects with a new vector clock for the set. We now have to resolve the conflict to find the true answer. We know that ClientA‘s true answer is going to have the highest count property for “ClientA”, and we know that ClientB‘s true answer is going to have the highest count property for “ClientB”. Since we know that only one copy of a given client was writing at a time [single-threaded], we know that only the client data with the highest count is correct.

We can find the true answer by merging the siblings and comparing the data one client at a time. For each client, we always take the value with the highest count property and throw away the smaller counts, which represent older values.

When ripple saves back the object, it sends the new vector clock as well so that Riak knows to replace the old siblings with this new resolved object. We handle this with the on_conflict method in ripple.

class StatisticDocument
include Ripple::Document
property :client_data, Hash, :presence => true

  def update_with(value)
    self.client_data ||= {}
    statistic = self.client_data[Client.id] || {'average' => 0.0, 'count' => 0}
    statistic['average'] = (statistic['average'] * statistic['count'] + value).to_f / (statistic['count'] + 1)
    statistic['count']   = (statistic['count'] + 1)
    self.client_data[Client.id] = statistic

  def count
    self.client_data.map{|h| h[1]['count']}.inject(0, &:+)

  def average
    self.client_data.map{|h| h[1]['count'] * h[1]['average']}.inject(0, &:+).to_f / self.count

  on_conflict do |siblings, c|
    resolved = {}
    siblings.reject!{|s| s.client_data == nil}
    siblings.each do |sibling|
      resolved.merge! sibling.client_data do |client_id, resolved_value, sibling_value|
        resolved_value['count'] > sibling_value['count'] ? resolved_value : sibling_value
    self.client_data = resolved

Voila! We now have a rolling average that gracefully handles race conditions caused by the distributed nature of a Riak cluster.

A working example of this solution is available in my riak-rolling-average repo. As usual, the good stuff is in the tests. You can run the tests with bundle exec rake CLIENT=someclient where someclient is unique to that Ruby thread.

Note that test_example.rb runs five clients concurrently by shelling out to five rake tasks. Each rake task loads in a pre-defined chunk of data from the data file. This causes plenty of race conditions, which is what we want to simulate. We still get the correct answer.

The Path Not Traveled

If you aren’t familiar with my previous advice on approaching key-value architecture, you might be tempted to solve a use case like the average described above using Riak’s map/reduce functionality. There are several reasons why the solution above would be preferred, but let us play the devil’s advocate and explore the map/reduce solution.

You can view the entire code in the repo below, but if we solve this using javascript map/reduce functions, then we can extract the data from each DatePoint object in the following map phase:

match = riakObject.values[0].data.match(/"value":([\d]+)/);
return [[parseInt(match[1]), 1]];
} else {
return [null];

Then compute the average by combining the results together in the following reduce phase:

var sum = 0.0;
var count = 0;
for(i=0; i<values.length; i++){
value = values[i];
sum = sum + (value[0] * value[1]);
count = count + value[1];
if(count > 0) {
return [[(sum / count), count]];
} else {
return [];

If you look in the mapreduce branch of the riak-rolling-average repo, you will find the code for this example. On my local laptop, which I admit is not optimized for this type of operation, it takes about 3 seconds to fetch the answer with map/reduce searching over 5000 DataPoint objects. [A compiled Erlang map/reduce function would perform much better.] It only took a few milliseconds to fetch the answer using the pre-computing code.

Even if I did have a more powerful cluster on which to run this map/reduce, recall that we have trillions of DataPoint objects. Each object must be fetched from the key-value store and pulled into the javascript virtual machine for processing. This is essentially equivalent to the naïve approach described earlier, but performed on the server instead of the application client. If multiple users initiate the same map/reduce, contention for resources would quickly overwhelm the cluster’s hardware. In practice, we generally reserve Riak’s map/reduce for data migrations and offline analysis although there are exceptions to that guideline.

Conceptually, the map/reduce solution might be simpler to architect and it might seem more intuitive to people from an RDBMS background. Admittedly, the recommended solution above uses more resources during writes, opens the possibility of number of clients and vector clocks expanding the object size and complexity, and generally requires a more complex model; however, it also provides a much more performant answer and is more suitable to the eventual consistency and distributed nature of Riak.

Other Issues

Note that I did not address bootstrapping the average when DataPoint objects already exist, nor handling deletes or updates to existing DataPoint objects.

Happy Coding

Some of the ways we need to think about architecture problems when we write applications for Riak might not be intuitive at first, because the same problems are already solved by convention in the RDBMS world. Some of this shift in mindset is necessitated by the key-value nature of Riak. As the rolling average example here demonstrates, some concerns that we need to address in the application architecture are brought about by the distributed, eventually consistent nature of Riak as well. This shift in mindset is a tradeoff that you can choose to make for the sake of high availability, fault tolerance, horizontal scaling, and other nice features that Riak provides. If you enjoy learning new things to take advantage of new capabilities, then I’d wager you will enjoy developing applications with Riak.



  • A working example of this code is available in my riak-rolling-average repo. Check out the mapreduce branch for the production-unfriendly version using Riak’s map/reduce.
  • Vector Clocks are Easy
  • Vector Clocks are Hard
  • Meangirls provides serializable data types for eventually consistent systems, similar to but more comprehensive than my example.
  • Hanover also provides eventually consistent data types.
  • The Statistic object is an eventually consistent data type, an example of a CRDT: Convergent or Commutative Replicated Data Type. Read the comprehensive research paper on CRDT.
  • Bryce Kerley speaking about CRDTs at SRCFringe Edinburgh 2012
  • I put some hours into a gem ripple-statistics which adds this and other simple calculations to the ripple library. Note the limitations of the current code there.

The Second Edition of Riak Handbook Available Now for Download

Basho Technologies today announced the immediate availability of the second edition of Riak Handbook.

CAMBRIDGE, MA – June 1, 2012Basho Technologies today announced the immediate availability of the second edition of Riak Handbook. The significantly updated Riak Handbook includes more than 43 pages of new content covering many of the latest feature enhancements to Riak, Basho’s industry-leading, open-source, distributed database. Riak Handbook is authored by former Basho developer and advocate, Mathias Meyer.

Riak Handbook is a comprehensive, hands-on guide to Riak. The initial release of Riak Handbook focused on the driving forces behind Riak, including Amazon Dynamo, eventual consistency and CAP Theorem. Through a collection of examples and code, Mathias’ Riak Handbook explores the mechanics of Riak, such as storing and retrieving data, indexing, searching and querying data, and sheds a light on Riak in production. The updated handbook expands on previously covered key concepts and introduces new capabilities, including the following:

  • An overview of Riak Control, a new Web-based operations management tool
  • Full coverage on pre- and post-commit hooks, including JavaScript and Erlang examples
  • An entirely new section on deploying Erlang code in a Riak cluster
  • Additional details on secondary indexes
  • Insight into load balancing Riak nodes
  • An introduction to network node planning
  • An introduction to Riak CS, includes Amazon S3 API compatibility

The updated Riak Handbook includes an entirely new section dedicated to popular use cases and is full of examples and code from real-time usage scenarios.

Mathias Meyer is an experienced software developer, consultant and coach from Berlin, Germany. He has worked with database technology leaders such as Sybase and Oracle. He entered into the world of NoSQL in 2008 and joined Basho Technologies in 2010.

About Basho Technologies
Basho Technologies is the leader in highly-available, distributed database technologies used to power scalable, data-intensive Web, mobile, and e-commerce applications and large cloud computing platforms. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement content delivery platforms and global session stores, to aggregate large amounts of data for logging, search, and analytics, to manage, store and stream unstructured data, and to build scalable cloud computing platforms.

Riak is available open source for download at http://wiki.basho.com/Riak.html. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. Riak CS enables mutli-tenant object storage with advanced reporting and an Amazon S3 compatible API. For more information visit www.basho.com or follow us on Twitter at www.twitter.com/basho.

Why Your Riak Cluster Should Have At Least Five Nodes

April 26, 2012

Here at Basho we want to make sure that your Riak implementations are set up from the beginning to succeed. While you can use the Riak Fast Track to quickly set up a 3-node dev/test environment, we recommend that all production deployments use a minimum of 5 nodes, ensuring you benefit from the architectural principles that underpin Riak’s availability, fault-tolerance and scaling properties.

TL;DR: Deployments of five nodes or greater will provide a foundation for the best performance and growth as the cluster expands. Since Riak scales linearly with the addition of more nodes, users find improved performance, reliability, and throughput with larger clusters. Smaller deployments can compromise the fault-tolerance of the system: with a “sane” replication requirement for availability (we default to three copies), node failures in smaller clusters mean that replication requirements may not be met. This can result in degraded performance and risk of data loss. Additionally, clusters smaller than five nodes mean that with a sane replication requirement of 3, a high percentage (75-100% of the nodes) will need to respond to each request, putting undue load on the cluster that may degrade performance.

Let’s take a closer look in the scenario of a three- and four-node cluster.

Performance and Fault Tolerance Concerns in a 3-Node Cluster

To ensure that the cluster is always available to respond to read and write requests, Basho recommends a “sane default” for data replication: three copies of the data on three different nodes. The default configuration of Riak requires four nodes at minimum to insure no single node holds more than one copy of any particular piece of data. (In future versions of Riak we’ll be able to guarantee that each replica is living on a separate physical node. At this point it’s almost at 100%, but we won’t tell you it’s guaranteed until it is.) While it is possible to change the settings to ensure that the three replicas are on distinct nodes in a three node cluster, you still run into issues of replica placement during a node failure or network partition.

In the event of node failure or a network partition in a three-node cluster, the default requirement for replication remains three but there are only two nodes available to service requests. This will result in degraded performance and carries a risk of data loss.

Performance and Fault Tolerance Concerns in a 4-Node Cluster

With a requirement of three replicas, any one request for a particular piece of data from a 4-node cluster will require a response from 75 – 100% of the nodes in the cluster, which may result in degraded performance. In the event of node failure or a network partition in a 4-node cluster, you are back to the issues we outline above.

What if I want to change the replication default?

If using a different data replication number is right for your implementation, just make sure to use a cluster of N +2 nodes where N is the number of replicas for the reasons outlined above.

Going With 5 Nodes

As you add nodes to a Riak cluster that starts with 5 nodes, the percentage of the cluster required to service each request goes down. Riak scales linearly and predictably from this point on. When a node is taken out of service or fails, the number of nodes remaining is large enough to protect you from data loss.

So do your development and testing with smaller clusters, but when it comes to production, start with five nodes.

Happy scaling.


How Eventual is Eventual Consistency?

March 2, 2012

The second BashoChats Meetup was held last week at BashoWest. The office was packed with area developers and our two speakers, Ted Nyman and Peter Bailis, each delivered exceptional talks. Our awesome videographer Matt Fisher finished Peter’s talk first and it’s so good that we didn’t see any reason to keep it from you while he put the final touches on Ted’s.

Peter is Graduate Student in the much-heralded Berkeley CS department. Suffice it to say that we were honored to have him at BashoChats. He and some colleagues have been working on something called Probabilistically Bounded Staleness for Practical Partial Quorums (PBS). In short, PBS aims to define just how eventual “eventual consistency” is, and their research produced some fascinating findings that should affect how people view and deploy distributed databases like Riak, Cassandra and Voldemort.

This talk, the subject matter, and the presenter are all fantastic. Watch it twice and tell three friends about it. (The PDF version of the slides are here for any interested parties.)

We’ll have Ted Nyman’s talk up next week. In the meantime, join BashoChats so you can be a part of the next event.


New Riak Handbook Available Now for Download

Former Basho Developer Advocate Mathias Meyer authors a comprehensive, hands-on guide to Riak.

CAMBRIDGE, MA – January 17, 2012 – Basho Technologies, the leader in highly-available, distributed data store technologies, today announced that former Basho developer advocate Mathias Meyer has completed Riak Handbook, a comprehensive, hands-on guide to Riak, Basho’s industry-leading, open source, distributed database.

Riak Handbook begins by exploring the driving forces behind Riak, including Amazon Dynamo, eventual consistency and CAP Theorem. Through a collection of examples and code, Mathias Riak Handbook walks through Riaks many features in detail including the following capabilities:

  • How to store-and-retrieve data in Riak
  • Analyze data with MapReduce using JavaScript and Erlang
  • Build and search full-text indexes with Riak Search
  • Index and query data using secondary indexes
  • Model data for eventual consistency
  • Scale to multi-node clusters in less than five minutes
  • Operate Riak in production
  • Handle failures in your application

Mathias Meyer is an experienced software developer, consultant and coach from Berlin, Germany. He has worked with database technology leaders such as Sybase and Oracle. He entered into the world of NoSQL in 2008 and worked at Basho Technologies from 2010 to 2011.

“We are excited that Mathias took on the endeavor to build a comprehensive book all about Riak,” said John Hornbeck, Vice President of Client Services, Basho Technologies. “Our customers and community will benefit from having a single source that covers everything from setting up Riak, to scaling out quickly, to operating and maintaining Riak. We have already seen strong customer interest in Riak Handbook, including many seeking site licenses to outfit their entire teams.”

Riak Handbook is available for purchase at riakhandbook.com. Single editions are available at $29/download. Site licenses are available for organizations implementing Riak for only $249.

About Basho Technologies
Basho Technologies is the leader in highly-available, distributed data store technologies used to power scalable, data-intensive Web, mobile and e-commerce applications. Our flagship product, Riak, frees customer applications from the performance, scalability, and availability constraints of traditional databases while reducing overall storage and support costs by up to 80%. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement global session stores, to aggregate large amounts of data for logging, search, and analytics, and to manage, store and stream unstructured data.

Riak is available open source for download at basho.com/resources/downloads. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. For more information visit basho.com or follow us on Twitter at www.twitter.com/basho.

Basho Technologies is based in Cambridge, MA, and maintains regional offices in San Francisco, CA and Reston, VA.