March 10, 2014
Hector Castro is one of the Technical Evangelists here at Basho. Over the past few months, he has been presenting at various meetups and conferences about how to approach data modeling in Riak with his talk, “Throw Some Keys On It.” This talk sets the stage by discussing the more familiar relational database management systems (RDBMS) and some of its key features, including relationships, transactions, schemas, and Structured Query Language (SQL).
However, RDBMS also come with some tradeoffs and most applications don’t require all of its features. In addition, many applications now have availability, scalability, and latency requirements that negate many of the RDBMS benefits. That’s where key/value stores (like Riak) come in. Key/value data stores have a number of benefits compared to relational databases – including schemaless design, single-access reads, ability to handle write-heavy workloads, scalability, and a simple interface. While they are becoming increasingly more popular, many developers still think about data modeling from a relational mindset.
Using the Uber mobile application as an example, this talk helps developers think about decomposing more complex problems into smaller, simpler key/value pairs.
Videos of this talk will be available shortly but in the meantime, slides from this talk can be found below.
January 27, 2014
On the official Basho docs, we compare Riak to multiple other databases. We are currently working on updating these comparisons, but, in the meantime, we wanted to provide a more up-to-date comparison for one of the more common questions we’re asked: How does Riak compare to Cassandra?
Cassandra looks the most like Riak out of any other widely-deployed data storage technology in existence. Cassandra and Riak have architectural roots in Amazon’s Dynamo, the system Amazon engineered to handle their highly available shopping cart service. Both Riak and Cassandra are masterless, highly available stores that persist replicas and handle failure scenarios through concepts such as hinted handoff and read-repair. However, there are certain key differences between the two that should be considered when evaluating them.
Amazon’s Dynamo utilized a Key/Value data model. Early in Cassandra’s development, a decision was made to diverge from keys and values toward a wide-row data model (similar to Google’s BigTable). This means Cassandra is a Key/Key/Value store, which includes the concept of Column Families that contain columns. With this model, Cassandra is able to handle high write volumes, typically by appending new data to a row. This also allows Cassandra to perform very efficient range queries, with the tradeoff being a more rigid data model since rows are “fixed” and non-sequential read operations often require several disk seeks.
On the other hand, Riak is a straight Key/Value store. We believe this offers the most flexibility for data storage. Riak’s schemaless design has zero restrictions on data type, so an object can be a JSON document at one moment and a JPEG at the next.
Like Cassandra, Riak also excels at high write volumes. Range queries can be a little more costly, though still achievable through Secondary Indexes. In addition, there are a number of data modeling tips and tricks for Riak that make it easy to expose access to data in ways that sometimes aren’t as obvious at first glance. Below are a few examples:
In Riak, multi-datacenter replication is achieved by connecting independent clusters, each of which own their own hash ring. Operators have the ability to manage each cluster and select all or part of the data to replicate across a WAN. Multi-datacenter replication in Riak features two primary modes of operation: full sync and real-time. Data transmitted between clusters can be encrypted via OpenSSL out-of-the-box. Riak also allows for per-bucket replication for more granular control.
Cassandra achieves replication across WANs by splitting the hash ring across two or more clusters, which requires operators to manually define a NetworkTopologyStrategy, Replication Factor, a Replication Placement Strategy, and a Consistency Level for both local and cross data center requests.
Conflict Resolution and Object Versioning
Cassandra uses wall clock timestamps to establish ordering. The resolution strategy in this case is Last Write Wins (LWW), which means that data may be overwritten when there is write contention. The odds of data loss are magnified by (inevitable) server clock drift. More details on this can be found in the blog, “Clocks are Hard, or, Welcome to the Wonderful World of Distributed Systems.”
Riak uses a data structure called vector clocks to track the causal ordering of updates. This per-object ancestry allows Riak to identify and isolate conflicts without using system clocks.
In the event of a concurrent update to a single key, or a network partition that leaves application servers writing to Riak on both sides of the split, Riak can be configured to keep all writes and expose them to the next reader of that key. In this case, choosing the right value happens at the application level, allowing developers to either apply business logic or some common function (e.g. merge union of values) to resolve the conflict. From there, that value can be written back to Riak for its key. This ensures that Riak never loses writes.
Riak Data Types, first introduced in Riak 1.4 and expanded in the upcoming Riak 2.0, are designed to converge automatically. This means Riak will transparently manage the conflict resolution logic for concurrent writes to objects.
In the event of server failures and network problems, Riak is designed to always accept read and write requests, even if the servers that are ordinarily responsible for that data are unavailable.
Cassandra will allow writes to (optionally) be stored on alternative servers, but will not allow that data to be retrieved. Only after the cluster is repaired and those writes are handed off to an appropriate replica server (with the potential data loss that timestamp-based conflict resolution implies, as discussed earlier) will the data that was written be available to readers.
Imagine a user working with a shopping cart when the application is unable to connect to the primary replicas. The user can re-add missing items to the cart but will never actually see the items show up in the cart (unless the application performs its own caching, which introduces more layers of complexity and points of failure).
When handling missing or divergent/stale data, Riak and Cassandra have many similarities. Both employ a passive mechanism where read operations trigger the repair of inconsistent replicas (known as read-repair). Both also use Active Anti-Entropy, which builds a modified Merkle tree to track changes or new inserts on a per hash-ring-partition basis. Since the hash rings contain overlapping keys, the trees are compared and any divergent or missing data is automatically repaired in the background. This can be incredibly effective at combating problems such as bitrot, since Active Anti-Entropy does not need to wait for a read operation.
The key difference in implementation is that Cassandra uses short-lived, in-memory hash trees that are built per Column Family and generated as snapshots during major data compactions. Riak’s trees are on-disk and persistent. Persistent trees are safer and more conducive to ensuring data integrity across much larger datasets (e.g. 1 billion keys could easily cost 8-16GB of RAM in Cassandra versus 8-16GB of disk in Riak).
Both Cassandra and Riak are eventually consistent, scalable databases that have strengths for specific use cases. Each has hundreds of thousands of hours of engineering invested and the commercial backing and support offered by their respective companies, Datastax and Basho. At Basho, we have labored to make Riak very robust and easy to operate at both large and small scale. For more information on how Riak is being used, visit our Riak Users page. For a look at what’s to come, download the Technical Preview of Riak 2.0.
April 17, 2013
This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.
What hardware should I use with Riak?
Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.
RAM is one of the most important factors – RAM availability directly affects what Riak backend you should use (see question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor, however, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.
What backend is best for my application?
Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.
Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must the change the storage backend variable in the app.config file. You can find more details on LevelDB here.
Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.
How do I model data using Riak’s key/value design?
Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.
|Session||User/Session ID||Session Data|
|Content||Title, Integer||Document, Image, Post, Video, Text, JSON/HTML, etc.|
|Advertising||Campaign ID||Ad Content|
|Sensor||Date, Date/Time||Sensor Updates|
|User Data||Login, Email, UUID||User Attributes|
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
What other options, besides strict key/value access, are there for querying Riak?
Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.
Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.
Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
How does Riak differ from other databases?
We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.
Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design make it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing means data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.
March 13, 2013
For a complete overview, download the whitepaper, “Gaming on Riak: A Technical Introduction.” To see how other gaming companies are using Riak, visit us at the Game Developers Conference at Booth #202!
As discussed in our previous post, “Gaming on Riak: A Brief Overview and User Case Studies,” Riak can provide a number of advantages for gaming platforms. Content agnostic, an HTTP API, many client libraries, and a simple key/value data model, Riak is a flexible data store that can be used for a variety of different use cases in the gaming industry. This post looks at some common examples and how to start building them in Riak.
Player and Session Data: Riak can serve and store key player and session data with predictable low latency, and ensures it is available even in the event of node failure and network partition. This data may include user and profile information, game performance, statistics and rankings, and more. In Riak, all objects are stored on disk as binaries, providing flexible storage for many content types. Since Riak is schema-less, applications can evolve without changing an underlying schema, providing agility with growth and change.
Social Information: Riak can be used for social content such as social graph information, player profiles and relationships, social authentication accounts, and other types of social gaming data.
Content: Riak is often used to store text, documents, images, videos and other assets that power gaming experiences. This data often needs to be highly available and able to scale quickly to attract and keep users.
Global Data Locality: Gaming requires a low-latency experience, no matter where the players are located. Riak Enterprise’s multi-datacenter replication feature means data can be served to global users quickly.
Below are some common approaches to structuring gaming data with Riak’s key/value model:
Riak offers robust additional functionality on top of the fundamental key/value model. For more information on these options as well as how to implement them, their architecture, and their limitations, check out the documentation on searching and accessing data in Riak.
Riak Search is a distributed, full-text search engine. It provides support for various MIME types & analyzers, and robust querying including exact matches, wildcards, range queries, proximity searches, and more.
Possible Use Cases: Searching player and game information.
Secondary Indexing (2i) gives developers the ability, at write time, to tag an object stored in Riak with one or more queryable values. Indexes can be either integers or strings and can be queried by either exact matches or ranges of an index.
Possible Use Cases: Tagging player information with user relationships, general attributes, or other metadata.
Possible Use Cases: Filtering game data by tag, counting items, and extracting links to related data.
To learn more about how your gaming platform can benefit from Riak, download “Gaming on Riak: A Technical Introduction.” For more information about Riak, sign up for our webcast on Thursday, March 14.
July 7, 2010
Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many, I’ve reviewed the questions below, in no particular order. If you want to review the slides from yesterday’s presentation, they’re on Slideshare.
Q: You say listing keys is expensive. How are Map phases affected? Does the number of keys in a bucket have an effect on the expense of the operation? (paraphrased)
Listing keys (for a single bucket, there is no analog for the entire system) requires traversing the entire keyspace, even examining keys that don’t belong to the requested bucket. If your Map/Reduce query uses a whole bucket as its inputs, it will be nearly as expensive as listing keys back to the client; however, Map phases are executed in parallel on the nodes where the data lives, so you get the full benefits of parallelism and data-locality when it executes. The expense of listing keys is taken before any Map phase begins.
It bears reiterating that the expense of listing keys is proportional to the total number of keys stored (regardless of bucket). If your bucket has only 10 keys and you know what they are, it will probably be more efficient to list them as the inputs to your Map/Reduce query than to use the whole bucket as an input.
Q: How do you recommend modeling relationships that require a large number of associations (thousands or millions)?
This is difficult to do, and I won’t say there’s an easy or best answer. One idea that came up in the IRC
room after the webinar was building a B-tree-like data-structure that could be grown to fit the number of associations. This solves the one-to-many relationship, but will require extra handling and care on the part of your application. In some cases, where you only need to know membership in the relationship, a bloom filter might be appropriate. If you must model lots of highly-connected data, consider throwing a graph database in the mix. Riak is not going to fit all use-cases, some models will be awkward.
Q: My company provides a Java web application and analytics solution that uses JDO to persist to and query from a relational database. Where would I start in integrating with Riak?
Since I haven’t done Java in a serious way for a long time, I can’t speak to the specifics of JDO, or how you might work on migrating away from it. However, I have found that most ORMs hide things from the
developer that he/she should really be aware of — how the mapping is performed, what queries are executed, etc. You’ll likely have to look into the guts of how JDO persists and retrieves objects from the database, then step back and reevaluate what your top queries are and how Riak can help improve or simplify those operations. This is all in the theme of the webinar: Know your data!
Q: Is the source code for the example application and schema design available? (paraphrased)
No, there isn’t any sample code yet. You can play with the existing application (Lowdown) at lowdownapp.com. The other authors and I are seeking a few people to take over its development, and the initial group we contacted have indicated it will be open-sourced.
Q: Is there an way to get notified on changes in a bucket?
That’s not built-in to Riak. However, you could write a post-commit hook in Erlang that pushes a notification to RabbitMQ, for example, then have the interested parties consume messages from that queue.
Q: What mechanism does Riak have to deal with the unique user issue?
Riak has neither write locks nor transactions. There is no way to absolutely guarantee uniqueness without introducing an intermediary that acts as a single-arbiter (and point-of-failure). However, in cases when you aren’t experiencing high write-concurrency on the data in question there are a few things you can do to simulate the uniqueness constraint:
- Check for existence of the key before writing. In HTTP, this is as simple as a HEAD request. If the response is
Found, the object probably doesn’t exist.
- Use a conditional PUT (in HTTP) when creating the object. The
If-None-Match: *header should prevent you from blindly overwriting an existing key.
Neither of these solutions are bullet-proof because all operations happen in Riak asynchronously. Remember that it’s eventually consistent, meaning that not all parts of the system may agree at all times, but they will converge on a single state over time. There will be corner-cases where a key doesn’t exist when you check for it, the write via the conditional request succeeds, and you still end up creating an object in conflict. Caveat emptor.
Q: Are the intermediate results of Link and Map phases cached?
Yes, the results of both map and link phases are cached in a pretty naive LRU. The development team has plans to improve its behavior in future versions of Riak.
Q: Could you comment on commit hooks and what place they have, if any, in riak schema design? Would it make sense to use hooks to build an index e.g. keys in a bucket?
Yes, commit hooks are very useful in schema design. For example, you could use a pre-commit hook to validate the format of data before it’s stored. You could use post-commit hooks to send the data to external services (see above) or, as you suggest, build an index in another bucket. Building a secondary index reliably is complicated though, and it’s something I want to work on over the next few months.
Q: So if you have allow_mult=false are there cases where riak will return a conflict 409? Is the default that last write wins?
Riak never returns a
409 Conflict status from the HTTP interface on writes. If you supply a conditional header (
If-Match, for example) you might get a
412 Precondition Failed response if the ETag of the object to be modified doesn’t match the header. In general, it is Riak’s policy to accept writes regardless of the internal state of the object.
The “last write wins” behavior comes in two flavors: “clobbering” writes, and softer “show me the latest one” reads. The latter is the default behavior, in which siblings might occur internally (and the vector clock grown) but not exposed to the client; instead it returns the sibling with the latest timestamp at read/GET time and “throws away” new writes that are based on older (ancestor) vclocks. The former actually ignores vector clocks for the specified bucket, providing no guarantees of causal ordering of writes. To turn this behavior on, set the
last_write_wins bucket property to
true. Except in the most extreme cases where you don’t mind clobbering things that were written since the last time you read, we recommend using the default behavior. If you set
allow_mult=true, conflicting writes (objects with divergent vector clocks, not traceable descendents) will be exposed to the client with a
300 Multiple response.
Again, thanks for attending! Look for our next webinar in about two weeks.
June 30, 2010
Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create map-reduce queries? Where will Riak be a natural fit and where will it be challenging?
We invite you to join us for a free webinar on Thursday, July 8 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:
- Freeing yourself of the architectural constraints of the “relational” mindset
- Gaining a fuller understanding of your existing schema and its queries
- Strategies and patterns for structuring your data in Riak
- Tradeoffs of various solutions
We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.
Fill in the form below if you want to get started building applications on top of Riak!
Sorry, registration is closed.