January 7, 2014
Writing an application that can take full advantage of Riak’s robust scaling properties requires a different way of looking at data storage and retrieval. Developers who bring a relational mindset to Riak may create applications that work well with a small data set but start to show strain in production, particularly as the cluster grows.
Thus, this looks at some of the common conceptual challenges.
Riak offers query features such as secondary indexes (2i), MapReduce, and full-text search, but throwing a large quantity of data into Riak and expecting those tools to find whatever you need is setting yourself (and Riak) up to fail. Performance will be poor, especially as you scale.
Reads and writes in Riak should be as fast with ten billion values in storage as with ten thousand.
Key/value operations seem primitive (and they are) but you’ll find they are flexible, scalable, and very fast (and predictably so).
Treat 2i and friends as tools to be applied judiciously, design the main functionality of your application as if they don’t exist, and your software will continue to work at blazing speeds when you have petabytes of data stored across dozens of servers.
Normalizing data is generally a useful approach in a relational database, but unlikely to lead to happy results with Riak.
Riak lacks foreign key constraints and join operations, two vital parts of the normalization story, so reconstructing a single record from multiple objects would involve multiple read requests; certainly possible and fast enough on a small scale, but not ideal for larger requests.
Instead, imagine the performance of your application if most of your requests were a single, trivial read. Preparing and storing the answers to queries you’re going to ask later is a best practice for Riak.
Ducking Conflict Resolution
One of the first hurdles Basho faced when releasing Riak was educating developers on the complexities of eventual consistency and the need to intelligently resolve data conflicts.
Because Riak is optimized for high availability, even when servers are offline or disconnected from the cluster due to network failures, it is not uncommon for two servers to have different versions of a piece of data.
The simplest approach to coping with this is to allow Riak to choose a winner based on timestamps. It can do this more effectively if developers follow Basho’s guidance on sending updates with vector clock metadata to help track causal history, but often concurrent updates cannot be automatically resolved via vector clocks, and trusting server clocks to determine which write was the last to arrive is a terrible conflict resolution method.
Even if your server clocks are magically always in sync, are your business needs well-served by blindly applying the most recent update? Some databases have no alternative but to handle it that way, but we think you deserve better.
Riak 2.0, when installed on new clusters, will default to retaining conflicts and requiring the application to resolve them, but we’re also providing replicated data types to automate conflict resolution on the servers.
If you want to minimize the need for conflict resolution, modeling with as much immutable data as possible is a big win.
For years, functional programmers have been singing the praises of immutable data, and it confers significant advantages when using a distributed data store like Riak.
Most obviously, conflict resolution is dramatically simplified when objects are never updated.
Even in the world of single-server database servers, updating records in place carries costs. Most databases lose all sense of history when data is updated, and it’s entirely possible for two different clients to overwrite the same field in rapid succession leading to unexpected results.
Some data is always going to be mutable, but thinking about the alternative can lead to better design.
SELECT * FROM <table>
A perfectly natural response when first encountering a populated database is to see what’s in it. In a relational database, you can easily retrieve a list of tables and start browsing their records.
As it turns out, this is a terrible idea in Riak.
Riak is optimized for unstructured, opaque data; however, it is not designed to allow for trivial retrieval of lists of buckets (very loosely analogous to tables) and keys.
Doing so can put a great deal of stress on a large cluster and can significantly impact performance.
It’s a rather unusual idea for someone coming from a relational mindset, but being able to algorithmically determine the key that you need for the data you want to retrieve is a major part of the Riak application story.
Because Riak sends multiple copies of your data around the network for every request, values that are too large can clog the pipes, so to speak, causing significant latency problems.
Basho generally recommends 1-4MB objects as a soft cap; larger sizes are possible with careful tuning, however.
For significantly larger objects, Riak CS offers an Amazon S3-compatible (and also OpenStack Swift-compatible) key/value object store that uses Riak under the hood.
Running a Single Server
This is more of an operations anti-pattern, but it is a common misunderstanding of Riak’s architecture.
It is quite common to install Riak in a development environment using its
devrel build target, which creates five full Riak stacks (including Erlang virtual machines) to run on one server to simulate a cluster.
However, running Riak on a single server for benchmarking or production use is counterproductive, regardless of whether you have one stack or five on the box.
It is possible to argue that Riak is more of a database coordination platform than a database itself. It uses Bitcask or LevelDB to persist data to disk, but more importantly, it commonly uses at least 64 such embedded databases in a cluster.
Needless to say, if you run 64 databases simultaneously on a single filesystem you are risking significant I/O and CPU contention unless the environment is carefully tuned (and has some pretty fast disks).
Perhaps more importantly, Riak’s core design goal, its raison d’être, is high availability via data redundancy and related mechanisms. Writing three copies of all your data to a single server is mostly pointless, both contributing to resource contention and throwing away Riak’s ability to survive server failure.
So, Now What?
As always, we recommend visiting Basho’s docs website for more details on how to build and run Riak, and many of our customers have given presentations on their use cases of Riak, including data modeling.
Also, keep an eye on the Basho blog where we provide high-level overviews like this of Riak and the larger non-relational database world.
For a detailed analysis of your needs and modeling options, contact Basho regarding our professional services team.
- Why Riak (docs.basho.com)
- Data Modeling (docs.basho.com)
- Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems (Basho blog)
- A Little Riak Book
December 26, 2013
This year, we added a wide variety of resources to help you better understand Riak and Riak CS for different use cases. These resources include whitepapers, webinars and videos, sample apps, and outside articles and reports. Here’s a look at some of what was added in 2013.
With multiple releases over the past year, all of the primary product whitepapers have been updated. Check out new versions of:
We also added a number of vertical-specific whitepapers to help companies in various industries better evaluate Riak and Riak CS.
Intro to Riak Webinar
In addition to hosting multiple introduction webinars throughout the year, we also created a standalone “Intro to Riak” webinar that can be watched and shared easily. To watch this webinar, simply fill out the webinar request form.
To showcase the power of indexing in Riak, we created a Zombie Sample App that’s run on Riak. This app has one million “Zombielepsy” victims loaded into Riak and lets the user locate them using zip code as the index value. It supports both Term-Based Inverted Indexes and Secondary Indexes. In addition to better understanding indexing in Riak, users can:
- Create a Zombie Sighting Report System so the concentration of live zombies in an area can quickly be determined based on the count and last report date.
- Add a crowd-sourced Inanimate Zombie Reporting System so that members of the non-zombie population can report inanimate zombies.
- Add a correlation feature, utilizing Graph CRDTs, so we can find our way back to Patient Zero.
More details about this app can be found here.
Articles and Reports
Outside of what has been created by Basho, we think that outside sources can also be a valuable tool when evaluating Riak or Riak CS. Our updated News Page helps to showcase much of this, but we thought we’d call out a few helpful articles from the past year.
Information Week – “Big Data Reshapes Weather Channel Predictions”
IDC – “IDC MarketScape: Worldwide Object-Based Storage 2013 Vendor Assessment”
The Register – “What Do We Want? Strong Consistency! When Do We…Oh It’s In Riak v2”
Programmable Web – “Seagate Releases Open Source API to Eliminate Data Storage Complexity”
The Register – “Distributed Systems Boffins Flock to RICON West”
Computer Weekly – “Computer Weekly European User Awards for Storage: Winners”
Gartner – “IT Market Clock for Database Management Systems, 2013”
Information Week – “Basho Embraces OpenStack with Riak Cloud Storage”
Flyclops Blog – “Taking the Riak Plunge”
Forrester Research – “To Get National Healthcare Right Requires Adaptive Intelligence”
451 Research – “451 Research Survey Highlights Growing Adoption of NoSQL Databases”
GigaOm – “Storage Player Basho Open Sources Riak CS”
October 15, 2013
Ripple has not been maintained because we’ve learned that it’s not the right way to think about Riak. Using the riak-client APIs directly leads to better applications. We’re moving Ripple to basho-labs to avoid confusion.
The Ripple State of Mind
The Ripple document-relational mapper tool for Riak allows you to treat Riak objects like Ruby objects, very similarly to how ActiveRecord lets you treat Postgres rows like Ruby objects. This neglects the fundamental differences between Postgres and Riak, and encourages developers to use Riak badly.
SQL is a nice fit for Rails-like object usage because adding indexes isn’t prohibitively expensive, querying with indexes is cheap, and there’s a query planner that can use or mix indexes when available and can resort to a table scan when they’re not. Ripple, while it does have secondary index (2i) support, doesn’t have a planner to do set math on multiple indexes, so you either get to implement that yourself or write composite indexes. Adding an index after you have a dataset in production is hard too; it either only applies to new data or requires an expensive migration step, in which you load and re-save old records.
Ripple doesn’t provide any way to use some Riak 1.4 and planned 2.0 features, such as streaming 2i, multi-get, 2i return terms, and CRDTs. Ripple also doesn’t make it easy to make your frontend vector clock aware, which limits its usefulness in scenarios that create siblings.
These are complex features, and trying to wrap them in Ripple won’t necessarily make them easier to use.
My experience with complex Rails-style applications is that models eventually grow a bunch of class and instance methods to handle cases that are awkward for the ORM layer. An ActiveRecord model might have a SQL-using method as an optimization for a specific use case, or instance methods that perform a write and return something sensible in the case of a Postgres constraint violation.
Rails applications that want to use SQL without using ActiveRecord’s ORM can do so: just useconnection.select_all and write some SQL. With Ripple, you can always drop down to riak-ruby-client and do work that way.
With that in mind, instead of the generic Riak 1.0 feature set in Ripple, we recommend wrapping Riak client methods in model objects. This exposes more complexity initially but, as your application grows and evolves, provides better opportunities to integrate new Riak features that help queryability, denormalize to reduce the number of Riak interactions that have to be done, automate certain data types, or provide consistency guarantees in appropriate situations.
We’re moving Ripple to the basho-labs organization on GitHub to accurately reflect its status as unmaintained and deprecated.
August 28, 2013
What is an Index?
In Riak, the fastest way to access your data is by its key.
However, it’s often useful to be able to locate objects by some other value, such as a named collection of users. Let’s say that we have a user object stored under its username as the key (e.g.,
thevegan3000) and that this particular user is in the
Administrators group. If you wanted to be able to find all users, such as
thevegan3000 who are in the Administrators group, then you would add an index (let’s say,
user_group) and set it to
administrator for those users. Riak has a super-easy-to-use option called Secondary Indexes that allows you to do exactly this and it’s available when you use either the LevelDB or Memory backends.
Using Secondary Indexes
Secondary Indexes are available in the Riak APIs and all of the official Riak clients. Note that
user_group_bin when accessing the API because we’re storing a binary value (in most cases, a string).
Add and retrieve an index in the Ruby Client:
In the Python Client:
In the Java Client:
More Example Use Cases
Not only are indexes easy to use, they’re extremely useful:
- Reference all orders belonging to a customer
- Save the users who liked something or the things that a user liked
- Tag content in a Content Management System (CMS)
- Store a GeoHash of a specific length for fast geographic lookup/filtering without expensive Geospatial operations
- Time-series data where all observations collected within a time-frame are referenced in a particular index
What If I Can’t Use Secondary Indexes?
Indexing is great, but if you want to use the Bitcask backend or if Secondary Indexes aren’t performant enough, there are alternatives.
A G-Set Term-Based Inverted Index has the following benefits over a Secondary Index:
- Better read performance at the sacrifice of some write performance
- Less resource intensive for the Riak cluster
- Excellent resistance to cluster partition since CRDTs have defined sibling merge behavior
- Can be implemented on any Riak backend including Bitcask, Memory, and of course LevelDB
- Tunable via read and write parameters to improve performance
- Ideal when the exact index term is known
Implementation of a G-Set Term-Based Inverted Index
A G-Set CRDT (Grow Only Set Convergent/Commutative Replicated Data Type) is a thin abstraction on the Set data type (available in most language standard libraries). It has a defined method for merging conflicting values (i.e. Riak siblings), namely a union of the two underlying Sets. In Riak, the G-Set becomes the value that we store in our Riak cluster in a bucket, and it holds a collection of keys to the objects we’re indexing (such as
thevegan3000). The key that references this G-Set is the term that we’re indexing,
administrator. The bucket containing the serialized G-Sets accepts Riak siblings (potentially conflicting values) which are resolved when the index is read. Resolving the indexes involves merging the sibling G-Sets which means that keys cannot be removed from this index, hence the name: “Grow Only”.
administrator G-Set Values prior to merging, represented by sibling values in Riak
administrator G-Set Value post merge, represented by a resolved value in Riak
Great! Show me the code!
As a demonstration, we integrated this logic into a branch of the Riak Ruby Client. As mentioned before, since a G-Set is actually a very simple construct and Riak siblings are perfect to support the convergent properties of CRDTs, the implementation of a G-Set Term-Based Inverted Index is nearly trivial.
There’s a basic interface that belongs to a Grow Only Set in addition to some basic JSON serialization facilities (not shown):
Next there’s the actual implementation of the Inverted Index. The index put operation simply creates a serialized G-Set with the single index value into Riak, likely creating a sibling in the process.
The index get operation retrieves the index value. If there are siblings, it resolves them by merging the underlying G-Sets, as described above, and writes the resolved record back into Riak.
With the modified Ruby client, adding a Term-Based Inverted Index is just as easy as a Secondary Index. Instead of using
_bin to indicate a string index and we’ll use
_inv for our Term-Based Inverted Index.
Binary Secondary Index:
zombie.indexes['zip_bin'] << data['ZipCode']
Term-Based Inverted Index:
zombie.indexes['zip_inv'] << data['ZipCode']
The downsides of G-Set Term-Based Inverted Indexes versus Secondary Indexes
- There is no way to remove keys from an index
- Storing a key/value pair with a Riak Secondary index takes about half the time as putting an object with a G-Set Term-Based Inverted Index because the G-Set index involves an additional Riak put operation for each index being added
- The Riak object which the index refers to has no knowledge of which indexes have been applied to it
- It is possible; however, to update the metadata for the Riak object when adding its key to the G-Set
- There is no option for searching on a range of values (e.g., all
See the Secondary Index documentation for more details.
The downsides of G-Set Term-Based Inverted Indexes versus Riak Search:
Riak Search is an alternative mechanism for searching for content when you don’t know which keys you want.
- No advanced searching: wildcards, boolean queries, range queries, grouping, etc
See the Riak Search documentation for more details.
Let’s see some graphs.
The graph below shows the average time to put an object with a single index and to retrieve a random index from the body of indexes that have already been written. The times include the client-side merging of index object siblings. It’s clear that although the put times for an object + G-Set Term-Based Inverted Index are roughly double than that of an object with a Secondary Index, the index retrieval times are less than half. This suggests that secondary indexes would be better for write-heavy loads but the G-Set Term-Based Inverted Indexes are much better where the ratio of reads is greater than the number of writes.
Over the length of the test, it is even clearer that G-Set Term-Based Inverted Indexes offer higher performance than Secondary Indexes when the workload of Riak skews toward reads. The use of G-Set Term-Based Inverted Indexes is very compelling even when you consider that the index merging is happening on the client-side and could be moved to the server for greater performance.
- Implement other CRDT Sets that support deletion
- Implement G-Set Term-Based Indexes as a Riak Core application so merges can run alongside the Riak cluster
- Implement strategies for handling large indexes such as term partitioning
July 11, 2013
To learn more about what’s new in Riak 1.4, sign up for our webcast on July 12th at 11am PT/2pm ET
With the introduction of Riak 1.4, Basho offers developers new ways to leverage secondary indexes (often referred to as 2i). This is a short review of what they are and what has been added.
Secondary indexes in Riak
Values in Riak are treated as opaque, although optional add-ons such as Riak Search and Yokozuna can index the contents.
With two of the supported storage backends (LevelDB and Memory), developers can add their own indexes for querying. These can be numeric or string values, matched as exact values or ranges, and can have as much or as little to do with the stored value as the developer wishes.
Primary key lookups will always be the fastest way to retrieve values from Riak, but 2i is a useful way to label and retrieve data.
What has changed with Riak 1.4?
Previously, results from 2i queries were presented as a comprehensive list of unordered keys. Depending on the size of the result set, this could be awkward (or impossible) for a client application to handle.
With 1.4, the following features have been added:
- Pagination and streaming are available on request.
- Results are now sorted: first by index value, then by keys.
- If requested as part of a range query, the matched index value will be returned alongside each key.
Here is an example of a range query via HTTP. Pagination is specified via
max_results=5 and the return of matched index values via
In this case we are querying a small Twitter firehose data set; each tweet was added to Riak with nested hashtag values as indexes. The query is designed to match hashtags in the range
ri (inclusive) to
continuation value is necessary to retrieve the next page of results, and as expected the results are sorted by index value and key.
Where to find more information?
Basho’s docs site has been updated for 1.4:
July 10, 2013
Today, Basho Technologies announced the public availability of Riak 1.4.
The release includes new features and updates in addition to a substantive set of addressed issues. These updates include improvements to Secondary Indexes, simplified cluster management through Riak Control, reduced object storage overhead, and progress reporting for Hinted Handoff. Riak 1.4 also sets the stage for Basho’s upcoming major release, Riak 2.0, planned for Fall 2013.
In addition to these features and capabilities, Riak 1.4 includes eventually consistent, distributed counter functionality. Riak’s first distributed data type provides conflict resolution after a network partition and continues to advance Basho’s position of leadership within the distributed systems space.
This release encompasses both Riak and Riak Enterprise, which includes the multi-datacenter replication capability used by an increasing number of enterprise customers to address their critical data needs.
A full list of the new features and updates available in the 1.4 release can be found on the Basho blog post, Basho Announces Availability of Riak 1.4.
July 10, 2013
We are excited to announce the launch of Riak 1.4. With this release, we have added in more functionality and addressed some common requests that we hear from customers. In addition, there are a few features available in technical preview that you can begin testing and will be fully rolled out in the 2.0 launch later this year.
The new features and updates in Riak 1.4 include:
- Secondary Indexing Improvements: Query results are now sorted and paginated, offering developers much richer semantics
- Introducing Counters in Riak: Counters, Riak’s first distributed data type, provide automatic conflict resolution after a network partition
- Simplified Cluster Management With Riak Control: New capabilities in Riak’s GUI-based administration tool improve the cluster management page for preparing and applying changes to the cluster
- Reduced Object Storage Overhead: Values and associated metadata are stored and transmitted using a more compact format, reducing disk and network overhead
- Handoff Progress Reporting: Makes operating the cluster, identifying and troubleshooting issues, and monitoring the cluster simpler
- Improved Backpressure: Riak responds with an overload message if a vnode has too many messages in queue
This 1.4 launch also adds quite a few performance enhancements to Riak Enterprise’s multi-datacenter replication that include:
- Replication in Riak 1.4 supports SSL, NAT, and full sync scheduling
- Availability of cascading real-time writes gives operators the choice as to whether or not all writes are replicated to all datacenters
- Optional use of Active Anti-Entropy during replication in Riak 1.4 to significantly decrease data transfer times is available in Technical Preview
These updates improve the performance of Riak and provide greater functionality and management for both clusters and multiple datacenters. You can download Riak 1.4 at docs.basho.com/riak/latest/downloads.
For a full list of what’s in Riak 1.4, check out our code at Github.com/basho or review the release notes. To learn even more, join our live webcast, “What’s New in Riak 1.4” on July 12th and look for a series of more detailed blog posts over the coming weeks.
We will also be launching Riak CS 1.4 shortly. Keep an eye on our blog for more information.
April 17, 2013
This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.
What hardware should I use with Riak?
Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.
RAM is one of the most important factors – RAM availability directly affects what Riak backend you should use (see question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor, however, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.
What backend is best for my application?
Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.
Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must the change the storage backend variable in the app.config file. You can find more details on LevelDB here.
Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.
How do I model data using Riak’s key/value design?
Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.
|Session||User/Session ID||Session Data|
|Content||Title, Integer||Document, Image, Post, Video, Text, JSON/HTML, etc.|
|Advertising||Campaign ID||Ad Content|
|Sensor||Date, Date/Time||Sensor Updates|
|User Data||Login, Email, UUID||User Attributes|
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
What other options, besides strict key/value access, are there for querying Riak?
Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.
Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.
Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
How does Riak differ from other databases?
We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.
Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design make it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing means data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.
September 14, 2011
Developers building an application on Riak typically have a love/hate relationship with Riak’s simple key/value-based approach to storing data. It’s great that anyone can grok the basics (3 simple operations, get/put/delete) quickly. It’s convenient that you can store anything imaginable as an object’s value: an integer, a blob of JSON data, an image, an MP3. And the distributed, scalable, failure-tolerant properties that a key/value storage model enables can be a lifesaver depending on your use case.
But things get much less rosy when faced with the challenge of representing alternate keys, one-to-many relationships, or many-to-many relationships in Riak. Historically, Riak has shifted these responsibilities to the application developer. The developer is forced to either find a way to fit their data into a key/value model, or to adopt a polyglot storage strategy, maintaining data in one system and relationships in another.
This adds complexity and technical risk, as the developer is burdened with writing additional bookkeeping code and/or learning and maintaining multiple systems.
That’s why we’re so happy about Secondary Indexes. Secondary Indexes are the first step toward solving these challenges, lifting the burden from the backs of developers, and enabling more complex data modeling in Riak. And the best part is that it ships in our 1.0 release, just a few weeks from now.
How Do Secondary Indexes Work?
Update: Secondary Indexes use the new style HTTP API. See the Riak Wiki for more details.
From an application developer’s perspective, Secondary Indexes allow you to tag a Riak object with some index metadata, and later retrieve the object by querying the index, rather than the object’s primary key.
For example, let’s say you want to store a user object, accessible by username, twitter handle, or email address. You might pick the username as the primary key, while indexing the twitter handle and email address. Below is a curl command to accomplish this through the HTTP interface of a local Riak node:
curl -X POST
-H 'x-riak-index-twitter_bin: rustyio'
-H 'x-riak-index-email_bin: email@example.com'
-d '...user data...'
Previously, there was no simple way to access an object by anything other than the primary key, the username. The developer would be forced to “roll their own indexes.” With Secondary Indexes enabled, however, you can easily retrieve the data by querying the user’s twitter handle:
Query the twitter handle…
Or the user’s email address:
Query the email address…
You can change an object’s indexes by simply writing the object again with the updated index information. For example, to add an index on Github handle:
curl -X POST
-H 'x-riak-index-twitter_bin: rustyio'
-H 'x-riak-index-email_bin: firstname.lastname@example.org'
-H 'x-riak-index-github_bin: rustyio'
-d '...user data...'
That’s all there is to it, but that’s enough to represent a variety of different relationships within Riak.
Above is an example of assigning an alternate key to an object. But imagine that instead of a
twitter_bin field, our object had an
employer_bin field that matched the primary key for an object in our
employers bucket. We can now look up users by their employer.
Or imagine a
role_bin field that matched the primary key for an object in our
security_roles bucket. This allows us to look up all users that are assigned to a specific security role in the system.
Secondary Indexes maintains Riak’s distributed, scalable, and failure tolerant nature by avoiding the need for a pre-defined schema, which would be shared state. Indexes are declared on a per-object basis, and the index type (binary or integer) is determined by the field’s suffix.
Indexing is real-time and atomic; the results show up in queries immediately after the write operation completes, and all indexing occurs on the partition where the object lives, so the object and its indexes stay in sync. Indexes can be stored and queried via the HTTP interface or the Protocol Buffers interface. Additionally, index results can feed directly into a Map/Reduce operation. And our Enterprise customers will be happy to know that Secondary Indexing plays well with multi data center replication.
Indexes are declared as metadata, rather than an object’s value, in order to preserve Riak’s view that the value of your object is as an opaque document. An object can have an unlimited number of index fields of any size (dependent upon system resources, of course.) We have stress tested with 1,000 index fields, though we expect most applications won’t need nearly that many. Indexes do contribute to the base size of the object, and they also take up their own disk space, but the overhead for each additional index entry is minimal: the vector clock information (and other metadata) is stored in the object, not in the index entry. Additionally, the LevelDB backend (and, likely, most index-capable backends) support prefix-compression, further shrinking ndex size.
This initial release does have some important limitations. Only single index queries are supported, and only for exact matches or range queries. The result order is undefined, and pagination is not supported. While this offers less in the way of ad-hoc querying than other datastores, it is a solid 80% solution that allows us to focus future energy where users and customers need it most. (Trust me, we have many plans and prototypes of potential features. Building something is easy, building the right thing is harder.)
Behind The Scenes
What is happening behind the scenes? A lot, actually.
At write time, the system pulls the index fields from the incoming object, parses and validates the fields, updates the object with the newly parsed fields, and then continues with the write operation. The replicas of the object are sent to virtual nodes where the object and its indexes are persisted to disk.
At query time, the system first calculates what we call a “covering” set of partitions. The system looks at how many replicas of our data are stored and determines the minimum number of partitions that it must examine to retrieve a full set of results, accounting for any offline nodes. By default, Riak is configured to store 3 replicas of all objects, so the system can generate a full result set if it reads from one-third of the system’s partitions, as long as it chooses the right set of partitions. The query is then broadcast to the selected partitions, which read the index data, generate a list of keys, and send them back to the coordinating node.
Storing index data is very different from storing key/value data: in general, any database that stores indexes on a disk would prefer to be able to store the index in a contiguous block and in the desired
order–basically getting as near to the final result set as possible. This minimizes disk movement and other work during a query, and provides faster read operations. The challenge is that index values rarely enter the system in the right order, so the database must do some shuffling at write time. Most databases delay this shuffling, they write to disk in a slightly sub-optimal format, then go back and “fix things up” at a later point in time.
None of Riak’s existing key/value-oriented backends were a good fit for index data; they all focused on fast key/value access. During the development of Secondary Indexes we explored other options. Coincidentally, the Basho team had already begun work to adapt LevelDB–a low-level storage library from Google–as a storage engine for Riak KV. LevelDB stores data in a defined order, exactly what Secondary Indexes needed, and it is actually versatile enough to manage both the index data AND the object’s value. Plus, it is very RAM friendly. You can learn more about LevelDB from this page on Google Code.
Want To Know More?
You can grab a pre-release version of Riak Version 1.0 on the Basho downloads site to try the examples above. Remember to change the storage backend to
Finally keep an eye out for documentation that will land on the newly re-organized Basho Wiki within the next two weeks.