November 10, 2014
Many data needs are better served by data stores that are optimized for maximum availability and scalability — rather than optimized for consistency. For certain use cases, there are elements to the data that require strong consistency. With Riak 2.0, in addition to eventual consistency, there is now a way to enforce strong consistency when needed.
NOTE: Riak’s strong consistency feature is currently an open-source-only feature and is not yet commercially supported.
Behavioral Changes with Strong Consistency
Strongly consistent operations in Riak function much like eventually consistent operations at the application level. The core difference lies in the types of errors Riak will report to the client.
Each request to update an object (except for the initial creation) must include a context value reflecting the last time the application read it. This is the same behavior that Riak clients have always followed with version vectors and strong consistency also mandates its use. Similarly, reading data from a strongly consistent Riak bucket functions exactly like eventually consistent reads.
If that value is not provided for an update operation to an existing object, Riak will reject it. This is because the database assumes that you have not seen the current value and may not know what you’re doing.
Similarly, if that context value is out of date, Riak will also reject update operations. The client must re-read the latest value and supply an update based on that new value, with the new context.
If Riak cannot contact a majority of the servers responsible for the key, the request will fail. Ordinarily, Riak is happy to accept all operations in the interest of high availability and never dropping a write – even in the extreme case of only one server surviving data center outages.
Strong consistency also eliminates object siblings, as it is effectively impossible for the cluster to disagree on the value of an object.
When considering consistency models in an application, it is easy for the logic to quickly become daunting. This is especially true when designing a workflow that leverages both eventually and strongly consistent models. It is, therefore, easiest to begin with a simple use case.
Consider the workflow involved in storing and updating username and password data. In the case of a password update, it is necessary that — at any given time — there be exactly ONE result for a user’s password. Relatedly, it is important to ensure that an update of this value is fully atomic or user experience is substantially degraded. It would be possible to leverage Riak for all the eventually consistent elements of the application and leverage strong consistency for the username and password.
To see how eventual and strong consistency can be combined to solve business problems, let’s take a not-so-hypothetical example from the energy industry.
Imagine you’re collecting massive amounts of geological data for analysis. Each batch of data must be processed by a single instance of your application. Since this processing can take hours, days, or even weeks to complete, it’s expensive if two applications handle the same batch.
Let’s walk through the sequence of events.
- Batch of data arrives for processing.
- The batch is stored in a large object store (like, Riak CS) under a batch ID.
- The batch ID is added to a pending job list in Riak and stored as a set (one of the new Riak Data Types).
This is a classic example of eventual consistency and an illustration of the value of the new Riak Data Types introduced with Riak 2.0. Storing a new batch ID in your database should never fail, even if servers are offline. If multiple applications are adding batch IDs to the pending list at the same time, it’s perfectly reasonable for those lists to temporarily diverge, as long as they can be trivially merged later.
Let’s continue to see where strong consistency comes into play.
- A compute node becomes available to process the data.
- The compute node retrieves the pending job list and picks a batch ID.
- The compute node attempts to create a lock for that batch ID.
This is where strong consistency is required. This lock object should be created in a bucket that is managed by the new strong consistency subsystem in Riak 2.0. If someone else also grabs that batch ID and tries to create another lock object, Riak’s strong consistency logic will reject this second attempt. This compute node will just start over by grabbing a new ID.
To detect crashed jobs, the lock object should be created with basic job data, such as which compute node owns the processing job and what time it was started.
- The compute node asks Riak to add the batch ID to a different set, a running job list.
- The compute node asks Riak to remove the batch ID from the pending list.
- The job runs.
- When completed, the compute node asks Riak to add the batch ID to a completed job list.
- Riak is asked to remove the batch ID from the running list.
- The compute node deletes the lock object (or updates it to reflect the completion of the processing job).
Tradeoffs When Using Strong Consistency
- Blind updates will be rejected, so the client must read the existing value before supplying a new one (except in the case of entirely new keys).
- Write requests may be slightly slower due to coordination overhead.
- If a majority of the servers responsible for a piece of data are unavailable, write requests will fail. Read operations may fail depending on the freshness of the data that is still accessible.
- Secondary indexes (2i) are not yet supported.
- Multi-datacenter replication in Riak Enterprise is not yet supported.
- Using Strong Consistency (for developers)
- Managing Strong Consistency (for operators)
- Strong Consistency (theory & concepts)
Strong Consistency is now available with Riak 2.0. Download Riak 2.0 on our Docs Page.
January 6, 2014
With the launch of the Technical Preview of Riak 2.0, we also announced the addition of strong consistency to Riak. This addition fundamentally changes how Riak can be used, since all previous versions classified Riak as an eventually consistent system.
With Riak 2.0, developers now have the flexibility to choose whether buckets should be highly available or strongly consistent, based on data requirements. Consistency preferences are defined on a per bucket type basis, in the same cluster.
At RICON West 2013, Basho senior engineer, Joseph Blomstedt, gave an updated version of his “Bringing Consistency to Riak” talk. The original talk (presented at RICON West 2012) discussed the challenges, motivations, and high-level plans of bringing consistency to Riak. This updated version presents the actual implementation that has since been built and how it will function in Riak 2.0. Both talks are available below.
To start testing the strong consistency feature, you can download the Technical Preview of Riak 2.0 here.
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.
San Francisco, CA – October 29, 2013 – Today at RICON West, Basho, the worldwide leader in distributed systems and cloud storage software, announced that the Technical Preview of Riak 2.0, Basho’s distributed NoSQL database, is now publicly available. This major release introduces new features that improve developer ease-of-use, increase flexibility around consistency, boost search and analytics capabilities, simplify operations at scale, and provide enterprise-class data security.
Riak continues to gain adoption worldwide supporting critical applications that require high-availability, predictable scalability, and performance. Riak’s unique ability to distribute data, both to ensure availability and provide data locality, provides enterprises a proven database technology for powering critical web, mobile and social applications, cloud computing platforms, and to store and serve machine-to-machine and sensor data. Riak is used by thousands of companies, including over 30% of the Fortune 50.
New Features in Riak 2.0
- Riak Data Types. Riak 2.0 includes a range of flexible, distributed data types, that greatly simplify application development without sacrificing Riak’s availability and partition tolerance characteristics. Available Riak data types include distributed counters, sets, maps, registers, and flags.
- Strong Consistency. Developers now have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
- Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 fully supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
- Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and Authorization is provided via client APIs.
- Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
- Reduced Replicas for Secondary Sites. Exclusive to Riak Enterprise 2.0, users can now optionally store fewer copies of replicated data across multiple datacenters to better maintain a balance between storage overhead and availability.
Technical Preview Availability
About RICON West
RICON West 2013 is part of the RICON conference series. RICON is Basho’s distributed systems conference for developers and academics. RICON West will take place in San Francisco, CA on October 29-30. More than 25 speakers will discuss applications, use cases, and the future of distributed systems – including NoSQL solutions and cloud storage. RICON West 2013 speakers include Basho, Google, Microsoft Research, Netflix, salesforce.com, Seagate, The Weather Company, and Twitter. RICON West 2013 is sold-out; however, Basho will offer a live stream.
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.