January 27, 2014
Client libraries are essential to using Riak, and we at Basho have always been proud to have a flourishing client library ecosystem surrounding Riak. The release of Riak 2.0 has brought a variety of fundamental changes that client builders and maintainers should be aware of, including a variety of new features that clients should be equipped to utilize, such as security and Riak Data Types. Here, we’ll provide a list of some of those fundamental changes and suggest some approaches to addressing them, including examples from our official libraries.
Protocol Buffers API
While Riak continues to have a fully featured HTTP API for the sake of backwards compatibility, we do not recommend that you use it to build new client libraries. Instead, we encourage you to design clients to interact with Riak’s Protocol Buffers API, primarily because internal tests at Basho have shown performance gains of 25% or more when using Protocol Buffers.
The drawback behind using Protocol Buffers is that it’s not as widely known as HTTP and has a bit of a learning curve for those who aren’t familiar with it. But the good news is both that the learning curve is worth it and that Google offers official support for C++, Java, and Python support for PBC while many other languages have strong community support.
When you start developing your client library, you’ll need to find a Protocol Buffers message generator in the language of your choice and convert a series of .proto files. Once you’ve generated all the necessary messages, you’ll need to implement a transport layer to interface with Riak. A full list of Riak-specific PBC messages can be found here. The official Python client, for example, has a single RiakPbcTransport class that handles all message building, sending, and receiving, while the official Java client takes a more piecemeal approach to message building (as shown by the FetchOperation class, which handles reads from Riak). Once the transport layer is in place, you can start building higher-level abstractions on top.
Nodes and clusters
Another thing to keep in mind when writing Riak clients is that Riak always functions as a clustered (and hence multi-node) system, and connecting clients need to be set up to interact with all nodes in a cluster on the basis of each node’s host and port.
While it’s certainly possible to build clients that are intended to interact only with a single node, this means that your client’s users will need to create their own cluster interaction logic. Life will be far easier for your client’s users if your client is able to do things like this:
- periodically ping nodes to make sure they’re still online
- recognize when nodes are no longer responding and stop sending requests to those nodes
- provide a load-balancing scheme (or multiple possible schemes) to spread interactions across nodes
In general, you should think of the cluster interaction level as a kind of stateful registry of healthy nodes. In some systems, it might also be necessary to have configurable parameters for connections to Riak, e.g. minimum and/or maximum concurrent connections.
Prior to 2.0, the location of objects in Riak was determined by bucket and key. In version 2.0, bucket types were introduced as a third namespacing layer in addition to buckets and keys. Connecting clients now need to either specify a bucket type or use the default type for all K/V operations. Although creating, listing, modifying, and activating bucket types can be accomplished only via the command line, your client should provide an interface for seeing which bucket properties are associated with a bucket type.
One of the changes to be aware of when building clients is that Riak has changed its querying structure to accommodate bucket types. When performing K/V operations, you now need to specify a bucket type in addition to a bucket and a key. This means that the structure of all K/V operations needs to be modified to allow for this. We’d also recommend enabling users to perform K/V operations without specifying a bucket type, in which case the default type is used. In the official Python client, for example, the following two reads are equivalent:
Dealing with objects and content types
One of the tricky things about dealing with objects in Riak is that objects can be of any data type you choose (Riak Data Types are a different matter, and covered in the section below). You can store JSON, XML, raw binaries, strings, mp3s and MPEGs (though you should probably consider Riak CS for larger files like that), and so on. While this makes Riak an extremely flexible database, it means that clients need to be able to work with a wide variety of content types.
All objects stored in Riak must have a specified content type, e.g. application/json, text/plain, application/octet-stream, etc. While a Riak client doesn’t need to be able to handle all data types, a client intended for wide use should be able to handle at least the following:
- plain text
You should also strongly consider building automatic type handling into your client. When the official Ruby and Python clients, for example, read JSON from Riak, they automatically convert it to hashes and dicts (respectively). The Java client, to give another example, automatically converts POJOs to JSON by default and enables you to automatically convert stored JSON to custom POJO classes when fetching objects, which enables you to easily interact with Riak in a type-specific way. If you’re writing a client in a language with strong type safety, this would be a good thing to offer users.
Another important thing to bear in mind: all of your client interactions with Riak should be UTF-8 compliant, not just for the data stored in objects but also for things like bucket, key, and bucket type names. In other words, with your client it should be possible to store an object in the key Möbelträgerfüße in the bucket tête-à-tête.
If you’re using either Riak Data Types or Riak’s strong consistency subsystem, you don’t have to worry about siblings because those features by definition do not involve sibling creation or resolution. But many users of your client will want to use Riak as an eventually consistent system, which means that they will need to create their own conflict resolution logic.
In essence, your users’ applications need to make intelligent, use-case-specific decisions about what to do when the application is confronted with siblings. Most fundamentally, this means that your client needs to enable objects to have multiple sibling values. In the official Python client, for example, each object of class RiakObject has parameters that you’d expect, like content_type, bucket, and data, but it also has a siblings parameter that returns a list of sibling values.
In addition to enabling objects to have multiple values, we also strongly recommend providing some kind of helper logic that enables users to easily apply their own sibling resolution logic. What type of interface should be provided? That will depend heavily on the language. In a functional language, for example, that might mean enabling users to specify filtering functions that whittle the siblings down to a single “correct” value. To see conflict resolution in our official clients in action, see our tutorials for Java, Ruby, and Python.
Riak Data Types
In version 2.0, Riak added support for conflict-free replicated data types (aka CRDTs), which we call Riak Data Types. These five special Data Types—flags, registers, counters, sets, and maps—enable you to forgo things like application-side conflict resolution because Riak handles the resolution logic for you (provided that your data can be modeled as one of the five types). What separates Riak Data Types from other Riak objects is that you interact with them transactionally, meaning that changing Data Types involves sending messages to Riak about what changes should be made rather than fetching the whole object and modifying it on the client side.
This means that your client interface needs to enable users to modify the Data Types as much as they need to on the client side before committing those changes all at once to Riak. So if an application needs to add five counters to a map and remove items from three different sets within that map, it should be able to commit those changes with one message to Riak. The official Python client, for example, has a store() function that sends all client-side changes to Riak at once, plus a reload() function that fetches the current value of the type from Riak (with no regard to client-side changes).
One of the most important features introduced in Riak 2.0 is security. When enabled, all clients connecting to Riak, regardless of which security source is chosen, must communicate with Riak over a secure SSL connection rooted in an x.509-certificate-based Public Key Infrastructure (PKI). If you want your client’s users to be able to take advantage of Riak security, you’ll need to create an SSL interface. Fortunately, there are OpenSSL (and other) libraries in all major languages. To see SSL in action in our official clients, see our tutorials for Java, Ruby, Python, and Erlang.
Features That Don’t Require Client Changes
The following features that became available in Riak 2.0 shouldn’t require any changes to client libraries:
- Strong consistency — While adding strong consistency has entailed a lot of changes within Riak itself, K/V operations involving strongly consistent data function just like their eventually consistent counterparts in most respects. The one small exception is that performing object updates without first fetching the object will necessarily fail because the initial fetched obtains the object’s causal context, which is necessary for strongly consistent operations. It may be a good idea to add this requirement to your client documentation.
- New configuration system — Configuration has been drastically simplified in Riak 2.0, but these changes won’t have a direct impact on client interfaces.
- Dotted version vectors — While dotted version vectors (DVVs) are superior to the older vector clocks in preventing problems like sibling explosion, client libraries interact with DVVs just like they interact with vector clocks. In fact, our Protocol Buffers messages still use a vclock field for both vector clocks and DVVs, for the sake of backward compatibility.
How to Get Help
Building a 2.0-compliant Riak client has some non-trivial aspects but can be an exciting and rewarding project. Fortunately there are a variety of venues where you can get help, both from Basho engineers and from others in the Riak community.
For inspiration and education, the official Basho Riak clients in the GitHub repos are a good place to start. If you run into trouble, though, we highly recommend the Riak mailing list. There could very well be other client builders and maintainers working through a similar problem.
January 22, 2015
In speaking with Riak users, both open source and commercial, we are frequently told that Riak’s key/value model is more flexible and faster to develop against than a traditional relational database. Even though Riak is well suited for many applications, there are inevitable tradeoffs in terms of query options and data types that are available. With a key/value model, there is no concept of columns or rows, therefore Riak does not have join operations. Riak can be queried either directly via HTTP, the protocol buffers API and through various client libraries. However, there is no SQL or SQL-like language that is currently available.
Riak’s key/value data model does not preclude queryability. There are several powerful querying options including:
- Riak Search: Integration with Apache Solr provides full-text search and support for Solr’s client query APIs.
- Secondary Indexes: Secondary Indexes (2i) give developers the ability to tag an object stored in Riak with one or more query values. Indexes can be either integers or strings, and can be queried by either exact matches or ranges of values.
- MapReduce: Developers can leverage Riak MapReduce for tasks like filtering documents by tag, counting words in documents, and extracting links to related data.
For more information, check out the Riak documentation on Querying Data.
The table below illustrates key/value mappings for common application types. Remember that values in Riak are opaque and stored on disk as binaries – JSON or XML documents, images, text, etc. The way data is organized in Riak should take into account the unique needs of the application, including access patterns such as read/write distribution, latency differences between various operations, use of Riak features (including MapReduce, Search, Secondary Indexes), and more.
|Session||User/Session ID||Session Data|
|Advertising||Campaign ID||Ad Content|
|Sensor||Date, Date/Time||Sensor Updates|
|User Data||Login, eMail, UUID||User Attributes|
|Content||Title, Integer||Text, JSON/XML/HTML Document, Images, etc.|
Consider, for example, one of the canonical use cases for Riak…storing user and session data. In a relational database, the “users” table is well known and, basically, provides a unique identifier per user, and then a series of identifying information about that user as individual columns such as:
- First name
- Last name
- Counter of Site Visits
- Paid Account Identifier
This data can then be used to correlate or count, paid users, common interests, etc. via a series of SQL queries against the row/column structure of the users table.
Riak, in contrast, provides flexibility in how this data can be modeled based upon the application use case. It may be desirable to create a Users bucket, with the UserName (or Unique Identifier) as the key and a JSON object storing all user attributes as the value. Or, as we describe in Data Modeling with Riak Data Types, leverage the power of Riak Data Types by creating a map type for each user storing:
- first and last name strings in the register type,
- interests as a set,
- a counter for visits,
- and a flag for paid account identifier.
One of the best ways to enable application interaction with objects (a key/value pair) in Riak is to provide structured bucket and key names for the objects. This approach often involves wrapping information about the object in the object’s location data itself.
For example, appending a timestamp, UUID, or Geographical coordinate, to a key’s name allows for fine grained queryability via simple lookup to locate and retrieve a specific set of information. Leveraging the same naming mechanism as created for users (UniqueID as the key) enables, in a separate sessions bucket, storing the UUID append with a timestamp as the key and the session data (in binary format) as the object. In this way, using the same UUID, I am able to obtain both user and session data stored in different buckets and in different formats.
For additional information, and more complex considerations such as modeling relationship and advanced social applications, see the Riak documentation on use cases and data modeling.
Resolving Data Conflicts
In any system that replicates data, conflicts can arise – e.g., if two clients update the same object at the exact same time or if not all updates have yet reached hardware that is experiencing lag. Riak is “eventually consistent” – while data is always available, not all replicas may have the most recent update at the same time, causing brief periods (generally on the order of milliseconds) of inconsistency while all state changes are synchronized.
However, Riak does provide features to detect and help resolve the statistically small number of incidents when data conflicts occur. When a read request is performed, Riak looks up all replicas for that object. By default, Riak will return the most updated version, determined by looking at the object’s vector clock. Vector clocks are metadata attached to each replica when it is created. They are extended each time a replica is updated to keep track of versions. Clients can also be allowed to resolve conflicts themselves.
Further, when an outdated object is discovered as part of a read request, Riak will automatically update the out-of-sync replica to make it consistent. Read Repair, a self-healing property of the database, will even update a replica that returns a “not_found” in the event that a node loses it due to physical failure.
Riak also features “Active Anti-Entropy,” which is an automatic self-healing property that runs in the background. Rather than waiting for a read request to trigger a replica repair (as with Read Repair), Active Anti-Entropy constantly uses a hash tree exchange to compare replicas of objects and automatically repairs or updates any that are divergent, missing, or corrupt. This can be beneficial for large clusters storing “stale” data.
More information on vector clocks, dotted version vectors, and conflict resolution can be found in the online documentation in the section regarding Causal Context.
Multi-site replication is quickly becoming critical for many of today’s platforms and applications. Not only does replication across multiple clusters provide geographic data locality – the ability to serve global traffic at low-latencies – it can also be an integral part of a disaster recovery or backup strategy. Other teams may use multi-site replication to maintain secondary data stores, both for failover as well as for performing intensive computation without disrupting production load. Multi-site replication is included in Basho’s commercial extension to Riak, Riak Enterprise, which also includes 24/7 support.
Multi-site replication in Riak works differently than the typical approach seen in the relational world, multi-master replication. In Riak’s multi-datacenter replication, one cluster acts as a “primary cluster.” The primary cluster handles replication request from one or more “secondary clusters” (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can take over as the primary cluster. In this sense, Riak’s multi-datacenter capabilities are “masterless.”
In multi-datacenter replication, there are two primary modes of operation: full sync and real-time. In full sync mode, a complete synchronization occurs between primary and secondary cluster(s). In real-time mode, continual, incremental synchronization occurs – replication is triggered by new updates. Full sync is performed upon initial connection of a secondary cluster, and then periodically (by default, every 6 hours). Full sync is also triggered if the TCP connection between primary and secondary clusters is severed and then recovered.
Data transfer is unidirectional (primary->secondary). However, bidirectional synchronization can be achieved by configuring a pair of connections between clusters.
Full documentation for multi-datacenter replication in Riak Enterprise is available in the online documentation.
Modeling data in any non-relational solution requires a different way of thinking about the data itself. Rather than an assumption that all data cleanly fits into a structure of rows and columns, the data domain can be overlayed on the core Key/Value store (Riak) in a variety of ways. There are, however, distinct tradeoffs and benefits to understand.
Relational Databases have:
- Foreign keys and constraints
- Sophisticated query planners
- Declarative query language (SQL)
- A Key/Value model where the value is any unstructured data
- More data redundancy that provides better availability
- Eventual consistency
- Simplified query capabilities
- Riak Search
What you will gain:
- More flexible, fluid designs
- More natural data representations
- Scaling without pain
- Reduced operational complexity
For more information on Data Modeling, or to chat with a member of the Basho team on the topic, please request a Tech Talk.
January 6, 2015
If you have read about Riak, or seen a member of the Basho team present, you have probably heard the phrase “Your data is opaque to Riak.” While this is not, strictly, true with the inclusion of distributed Data Types in Riak 2.0, it was a phrase that hinted at the core structure of Riak itself.
Riak is a Key Value data store.
In a relational database, data is organized by tables that are separate and unique structures. Within these tables exist rows of data organized into columns. As such, interaction with the database is by retrieving or updating entire tables, individual rows, or a group of columns within a set of rows.
In contrast, Riak has a much simpler data model. An Object is both the largest and smallest element of data. As such, interaction with the database is by retrieving or modifying the entire object. There is no partial fetch or update of the data.
Keys in Riak are simply a binary value (or a string) that are used to identify Objects. The Key/Value pair (or Object) is stored in a higher level namespace called a Bucket. And, with Riak 2.0, there is an extra layer of abstraction known as Bucket Types.
This Key/Value/Bucket model enables broad flexibility in modeling the applications data domain with Riak as the data store for persistence.
Another NoSQL model that many are familiar with is the document store. Unlike the Key/Value model the data store is aware of the structure of the objects stored. These objects, or documents, are grouped into “collections” — which is analogous to a relational “table” — and the datastore provides a query mechanism to search collections for objects with particular attributes. When the data that is being persisted is easily rendered as a JSON document, a document store can seem a natural fit. Some common use cases include product catalog data and content management systems.
The Basho Docs have a lengthy tutorial entitled Using Riak as a Document Store that walks you through the process of leveraging Riak as a document store for a CMS. There are many approaches to modeling, but the tutorial demonstrates the power of Riak 2.0 features by combining the maps data type and indexing that data with Riak Search.
When the data you are persisting can be represented as JSON, and you require the ability to query the data, Riak 2.0 is an excellent solution for persisting and modeling document data. The flexibility of the Key/Value model, combined with the power of Riak Search and Riak Data Types, provide you with a highly scalable, highly available document store with rich, full-text query capabilities. In addition, the inclusion of the maps data type means that you don’t have to write complex client side resolution logic when faced with network partitions. Riak Data Types handle that conflict resolution automatically.
A scalable, available document store that is operationally simple may seem compelling enough to use Riak. But when you combine the characteristics of Riak with the multi-datacenter replication capabilities of Riak Enterprise, now you have a solution that enables you to bring your data operations closer to the end user.
Scalable, available, operationally simple, and replicated. That’s the power of using Riak as a document store.
December 30, 2014
At Basho, we are proud of our documentation. All design, updates, and edits are done with our community top of mind and we encourage community participation. Given the pace at which our documentarian expert, Luc Perkins, is updating the content, it can be easy to fall behind in reading new and updated materials. So we have a holiday gift to help you out.
Below is our Top 10 suggested New Year’s reading list.
#10 – A Migrating from an SQL Database to Riak tutorial can help prepare you as embrace a new style of development and persistence.
#7 – Strong consistency has gone from having light documentation to being one of our best-documented open-source features. Strong Consistency docs are spread across the following:
#6 – We now have client-side security docs! There’s an introductory doc that walks you a bit through how client security works in Riak as well as client-specific docs for Java, Ruby, Python, and Erlang.
#5 – A new Erlang VM Tuning doc. This is still a work in progress. As we said at the beginning, we really encourage community involvement. What tuning have you done to optimize your Erlang environment?
In addition to the above, there is new documentation on the topics below.
Drum roll please….
#1 – Riak 2.0 – if you missed this you missed a lot.
We want to thank everyone in the community who participates in making the Basho documentation the most useful set of materials possible. Remember: to submit issues is human, to submit PRs is divine.
Happy New Year!
In a previous post we briefly introduced Riak 2.0 data types. The addition of these distributed Data Types simplifies application development by automatically handling sibling resolution. This means developers can spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, let Data Types support their applications’ data access patterns.
Understanding these data types requires a brief trip through history…
Riak 1.4 Counters
Riak 1.4 introduced counters as the first data types. Prior to 1.4 we’ve always said: “Your data is opaque to Riak,” — and it still can be — but with the addition of counters that is not longer the case. Riak knows what is stored in a counter key, and how to increment and decrement it through the counter API. It isn’t necessary to fetch, mutate, or put a counter. Instead you just incremented by 5 or decremented by 100. Vector Clocks, as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems, as Riak knew how to merge concurrent writes there was never a sibling created.
Counters are very valuable, but you can not build many applications on just counters. Now, in Riak 2.0, we’ve added more data types. We believe that, with the addition of these data types you can model many applications’ data storage needs with greater simplicity, and never have to write sibling merge functions again.
What are CRDTs?
You may have heard a Basho presentation, or blog post, reference “CRDTs”. CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key, repeated, phrase is “Replicated Data Types”.
Replication is inherent in Riak. It is what the n-value defines. It is part of what lends to the availability and fault tolerance characteristics that Riak provides. Data Types are a common construct in computing. Sets, Bags, Lists, Registers, Maps, Counters…etc.
That leaves us to consider the “C”.
Conflict Free, or “Opaque No More”
Riak is an eventually consistent system. It leans, very much, towards the AP end of the CAP spectrum. (For more reading on the topic, the Practical Tradeoffs section of A Little Riak Book is particularly illuminating). This availability is achieved with mechanisms like sloppy quorum writes to fallback nodes. However, even without partitions and many nodes, interleaved or concurrent writes can lead to conflicts. Traditionally, Riak keeps all values and presents them to the user to resolve. The client application must have a deterministic way to resolve conflicts. It might be to pick the highest timestamp, or union all the values in a list, or something more complex. Whatever approach is chosen, it is ad-hoc, and created specifically for the data model and application at hand.
With Riak data types, there is still “conflict”. However, the resolution for that conflict is inherent and part of the data type’s design. The data types for Riak 2.0 converge automatically, at write and read time, on the server. If a client application can model its data using the data types provided, no sibling values will be seen and there is no longer a need to write ad-hoc, custom merge functions.
When modeling an applications data domain in a programming language, developers are familiar with composing state from a few primitive data types. Riak Data Types give the developer that power back and expressivity, and relieve them of the burden of design and testing deterministic merge functions. The key is that the data is no longer opaque to Riak. When the Data Types API is leveraged, Riak “knows” what type of thing is being stored and is able to perform the merge automatically.
When reading a Data Type from Riak, you will only ever see a single value. That value is still eventually consistent, but it will be as correct as it can be given the amount of entropy in the database. When the system is stable, all values will converge on a single, deterministic, correct value.
What Data Types Are Available?
Riak 2.0 includes the following Data Types:
- Counters: as in Riak 1.4
- Flags: enabled/disabled
- Sets: collections of binary values
- Registers: named Binary values with values also binary
- Maps: a collection of fields that supports the nesting of multiple Data Types
The conflict resolution, as discussed above, is intrinsic to the Data Type itself. This table provides greater detail.
|Data Type||Use Cases||Conflict Resolution Rule|
||Each actor keeps and independent count for increments and decrements. Upon merge, the pairwise maximum of any two actors will win (e.g. if one actor holds 172 and other holds 173, 173 will win upon merge)|
||Enable wins over disable|
||If an element is concurrent added and removed the add will win|
||The most chronologically recent value wins, based on timestamps|
||If a field is concurrently added, or updated and removed, the addd / update will win|
A new version of Riak, with new Data Types, allowing you to model your application in more expansive ways. Take these Data Types for a spin and be sure to let us know how you use them in your applications.
April 7, 2014
This month, it’s all about developer conferences – both local and international. If you’re in the area, come and say hi. We always love to chat Riak and can answer any questions you may have. If you aren’t going to be at any of these events, check out the Riak Mailing List for questions or Contact Us to help get started with Riak.
Here is a look at where we’ll be this month.
PyCon 2014: On the Wednesday (April 9th) of PyCon 2014, Basho Technical Evangelist, Tom Santero, will host a free workshop on “Building Applications on Riak,” starting at 1:30pm. PyCon takes place April 9-17 in Montreal, Canada.
ChefConf 2014: Basho will be attending ChefConf and our technical evangelists will be available to answer any Riak questions you may have. ChefConf 2014 takes place April 15-17 in San Francisco, CA.
CRAFT Conference: Basho is a proud sponsor of CRAFT Conference, which takes place April 23-25 in Budapest, Hungary. We will have a booth setup so stop by to grab some great swag and chat Riak.
NoSQL Matters: At NoSQL Matters: Cologne, Basho Technical Evangelist, Joel Jacobson, will present on “CRDTs in Riak,” one of the new features available with Riak 2.0. His talk will take place at 12:30pm on April 30th. NoSQL Matters: Cologne takes place April 29-30 in Cologne, Germany.
For a full list of where we’ll be, both in April and beyond, visit our Events Page.
March 26, 2014
Riak’s overarching design goal is simple: be maximally available. If your data center is on fire, Riak will be the last part of your stack to fail (and hopefully, you’ve purchased an enterprise license, so there’s another cluster in another data center ready to go at all times).
In order to make sure your data can survive server failures, Riak retains multiple copies (replicas) and allows lock-free, uncoordinated updates.
This then open ups the possibility that data will be out of sync across a cluster. Riak manages this issue in three distinct stages: entropy detection, correction, and conflict resolution.
Among the oldest and simplest tools in Riak is Read Repair, which, as its name implies, is triggered when a read request is received. If the server coordinating the operation notices that the servers responsible for the key do not agree on its value, correction is required.
A more recent feature in Riak is Active Anti-Entropy (often shortened to AAE). Effectively, this is the proactive version of read repair and runs in the background. Riak maintains hash trees to monitor for inconsistent data between servers; when divergent values are detected, correction is mandated.
As discussed in the blog post, Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems, automatically determining the “correct” value in the event of a conflict is not simple, and often not possible at the database layer.
Using contextual metadata called vector clocks, Riak will attempt to determine whether one of the discovered values is derived from the other. In that case, it can safely choose the most recent value. This value is written to all servers that have a copy of the data and conflict resolution is not required.
If Riak can’t verify such a causal relationship, things get more difficult.
Riak’s default behavior, is to fall back to server clocks to determine a winner. This can lead to unexpected results if concurrent updates are received but, on the positive side, conflict resolution will not be required.
If Riak is instead configured with
allow_mult=true to protect data integrity, even when independent writes are received, Riak will write both values to the servers as siblings for later conflict resolution.
Conflict resolution means that Riak detects a conflict, can’t resolve it intelligently, and isn’t instructed to resolve it otherwise.
Next time the application attempts to read such a value, instead of receiving one answer, it’s going to receive (at least) two. It is now the application’s responsibility to deal with the conflict and provide a new value back to Riak for future reads.
This can be trivial (pick one value), obvious (merge all values), or tricky (apply business logic and come back with something different).
With Riak 2.0, Basho is introducing Riak Data Types, which are designed to handle conflict resolution automatically. If your data can be formulated as a set or a map (not terribly dissimilar from JSON), Riak can process and resolve the siblings for you when a read request is received.
Many databases, particularly distributed ones, are effectively non-deterministic in the presence of concurrent writes. If they try to enforce consistency on writes, they sacrifice availability and often data integrity. If they don’t enforce consistency, they may rely on server (or worse, client) clocks to pick a winner, if they even have a strategy at all.
Riak is unique in encouraging developers to think about conflict resolution. Why? Because, for large distributed systems, network and server failures are a fact of life. For very large distributed systems, data duplication and inconsistency is unavoidable. Since Riak is designed for scale, it’s better to provide a structure for proper resolution than to pretend conflicts don’t exist.
February 5, 2014
At the recent meetup for the New York Linux Users Group (NYLUG), Basho Technical Evangelist, Tom Santero, presented “An Introduction to Basho’s Riak.” In this talk, Tom explains how Riak addresses the challenges of concurrent data storage at scale. He discusses the various design decisions, tradeoffs made, and theories at work within Riak. He also provides guidance as to how you might deploy Riak in production and why.
In addition to introducing the basics of Riak and its key/value data model, Tom presents some of the exciting features being introduced with Riak 2.0. Riak Data Types adds counters, sets, and maps to Riak – allowing for better conflict resolution. They enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focusing on using familiar, distributed data types to support their applications’ data access patterns.
You can watch Tom’s full talk below:
For more information about Riak and how it differs from traditional databases, check out the whitepaper, “From Relational to Riak.”
To see where Basho will be presenting next, visit the Events Page.
January 29, 2014
On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Fatcual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).
If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies Youtube Channel. You can also watch each below.
Data Types and Search in Riak 2.0
Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)
Bucket Types and Configuration
Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)
Riak 2.0: Security and Conflict Resolution
Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)
Fun with Java and C Clients
Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)
Property Based Testing
Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)
Datomic and Riak
Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)
Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)
A Look Back
Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)
December 12, 2013
At RICON West this year, we announced the Technical Preview of Riak 2.0. Before the full release (which will be available early next year), we are encouraging users to download the preview and start testing some of the exciting new features.
At RICON, we had many of the engineers who worked on these new features present their work. One feature that we’re particularly excited about is the addition of Riak Data Types. Riak 2.0 builds on eventually consistent counters (added with Riak 1.4) with the addition of maps and sets. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
In “CRDTs: An Update (or Maybe Just a PUT),” Basho engineer, Sam Elliott, presents on the work being done with Riak Data Types. Sam and a few other engineers at Basho have been integrating cutting-edge research on data types (known as CRDTs), pioneered by INRIA, to create Riak Data Types. Sam talks about the latest developments on CRDTs and walks developers through how to use them in their own applications.
In addition to Sam’s talk, we also had a talk from Jeremy Ong on “CRDTs in Production.” His talk provides real world solutions to leveraging CRDT concepts for an industrial application via case study. He also offers some suggestions on how to tackle data operations that can’t always commute. You can watch his full talk below.
For more information about Riak Data Types, check out this overview on Github.
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.