Tag Archives: key/value

Rubicon.IO Uses Riak to Provide Real-Time Threat Analysis

April 9, 2014

Rubicon.IO is a threat intelligence start-up that has developed purpose-built technology delivering on the promise of scale, speed, and accuracy with Big Data. Rubicon offers real-time analytic capabilities by scouring metadata from various sources: threat feeds, social media, SIEM data, and PCAPs. It uses a purpose-built HPC engine that aggregates and humanizes geospatial, TECHINT, HUMINT, and OSINT data sources. Rubicon provides the context businesses need to respond to attacks appropriately in real-time – all delivered through advanced visualizations in a multi-dimensional user interface. To provide this intelligence, Rubicon needs to store large amounts of data and access it in near real-time. To do this, they use Riak.

Interface

(An example of the Rubicon User Interface)

When Rubicon was first starting, they built their original proof of concept on CouchDB. However, as testing progressed, they found that it couldn't handle the scale of data they needed to store and access. Its document-only model also meant they were constantly updating documents, rather than scaling out with immutable data. Wes Brown, Founder and CTO at Rubicon, knew they needed to find something else and saw this as the perfect opportunity to finally use Riak.

“I have tested all of the NoSQL database offerings in the past and Riak was the only one that lived up to its promise,” said Wes. “All of them fell apart at some point, except for Riak. Riak is a fantastic key/value store that provides the scale and low-latency Rubicon needs.”

As mentioned, Rubicon uses an immutable data model: once data is written, it does not change. This avoids the expensive read-modify-write cycle. In Riak, Rubicon stores a key for every atomic observation or “fact.” These facts have subfields with normalized names, which makes it simple for Rubicon to search and index facts as needed and return any that are related. For example, they might search for anything pertaining to a certain IP address to provide additional context about an attack. This context helps their clients better understand the attack, who's behind it, where it came from, and what the appropriate response is. All of this information is provided in real-time, and they use InfiniBand to deliver microsecond performance.
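
As an illustrative sketch only (Rubicon's actual schema is not public), the immutable-fact pattern might look like the following with Basho's Riak Python client; every bucket, key, and field name here is invented for the example:

import uuid
import riak

# A minimal sketch of the immutable-fact pattern described above;
# key scheme and field names are hypothetical.
client = riak.RiakClient()           # connects to a local Riak node
facts = client.bucket('facts')

# Each atomic observation gets its own key and is never rewritten.
fact_id = uuid.uuid4().hex
fact = facts.new(fact_id, data={
    'observed_at': '2014-04-09T12:00:00Z',
    'source': 'threat-feed',
    'src_ip': '198.51.100.7',
    'event': 'port-scan',
})
# Tag the fact with a secondary index so every observation about an
# IP address can be found later without knowing the individual keys.
fact.add_index('src_ip_bin', '198.51.100.7')
fact.store()

# Later: retrieve all facts pertaining to that IP address.
related_keys = facts.get_index('src_ip_bin', '198.51.100.7')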

Visualization

(A portion of the visualization created from data collected in Riak)

Rubicon is about six months out from being in production with Riak. They are currently using the Riak 2.0 Technical Preview and will launch with Riak 2.0 GA. They plan to launch with eight nodes and scale up to 100 nodes to store petabytes of data at low latency.

“Riak has been a vital toolkit that helps us solve multiple problems, rather than just addressing one isolated problem,” says Wes. “By using Riak, we are able to take advantage of all the benefits and performance of a reliable key/value store, while continuing to build out our own functionality on top of it. We never need to worry about Riak, which is invaluable for our business.”

For more information about Rubicon.IO, visit their site at www.rubicon.io

To see how other companies are using Riak, visit our Users Page.

Basho

“Throw Some Keys On It” Slides

March 10, 2014

Hector Castro is one of the Technical Evangelists here at Basho. Over the past few months, he has been presenting at various meetups and conferences on how to approach data modeling in Riak with his talk, “Throw Some Keys On It.” The talk sets the stage by discussing the more familiar relational database management systems (RDBMS) and their key features, including relationships, transactions, schemas, and Structured Query Language (SQL).

However, relational databases also come with tradeoffs, and most applications don't require all of their features. In addition, many applications now have availability, scalability, and latency requirements that negate many of the RDBMS benefits. That's where key/value stores (like Riak) come in. Key/value data stores have a number of benefits compared to relational databases – including schemaless design, single-access reads, the ability to handle write-heavy workloads, scalability, and a simple interface. While they are becoming increasingly popular, many developers still approach data modeling from a relational mindset.

Using the Uber mobile application as an example, this talk helps developers think about decomposing more complex problems into smaller, simpler key/value pairs.
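
As a rough sketch of that decomposition (illustrative only, not taken from the slides), a ride-hailing domain might break into several independent key/value pairs, each keyed by a natural identifier:

import riak

client = riak.RiakClient()

# Each concern becomes its own bucket, keyed by a natural identifier.
riders = client.bucket('riders')
locations = client.bucket('driver_locations')
trips = client.bucket('trips')

riders.new('rider:1001', data={'name': 'Ada', 'rating': 4.9}).store()

# Driver positions are small values that are simply overwritten.
locations.new('driver:42', data={'lat': 37.77, 'lon': -122.42}).store()

# A trip composes the two by reference, under its own key.
trips.new('trip:9001', data={
    'rider': 'rider:1001',
    'driver': 'driver:42',
    'state': 'en_route',
}).store()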

Videos of this talk will be available shortly but in the meantime, slides from this talk can be found below.

For more examples of key/value users, check out the Riak Users Page. For additional information about data modeling, visit our Docs or reach out to the Riak Mailing List.

Basho

Riak Data Migration

November 19, 2013

Implementing a database for a new project is a relatively straightforward process. However, when challenges of scalability are encountered in existing applications or workflows, it may be necessary to migrate data from an existing database solution to Riak. Our Professional Services team specializes in this type of engagement (Contact Us if you need help) and has put together a general set of considerations and guidelines when migrating to Riak.

When migrating data to Riak, we recommend a staged approach – migrating specific areas to Riak while continuing to run any existing data storage architecture. For each stage, pick a standalone logical unit of data, convert it to a storage format appropriate to Riak, consider how the data will be accessed, and write the migration scripts.

You should start with areas of data that have a one-to-one relationship, which makes them easier to model as a pair of keys and values (such as sessions, user preferences or profiles, logs, or straight content). This type of data can be easy to identify, as it usually will have a readily available key, such as a user id or session id.

Once you have isolated this data, you need to plan how it will be stored in Riak. In most cases, the keys will be dictated by the existing application data (the format of the session id or user id will already be defined), and these identifiers can be reused as Riak object keys. The format of your object payload will also help dictate how it's stored in Riak. Small binaries (PDFs or small images) can be stored as binary blobs, structured tables or other data can be stored as JSON or XML, and accompanying metadata can be stored as custom Riak headers.

Once the data model is defined, the act of migration is straightforward. Extract the relevant data from the existing system, create appropriate Riak objects, and upload the data. It’s hard to get much simpler than writing keys and values.
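
A minimal migration script might look like the following sketch, which copies one logical unit (user profiles, with hypothetical table and column names) out of a SQL database and into Riak, reusing the user id as the object key:

import json
import sqlite3
import riak

client = riak.RiakClient()
profiles = client.bucket('user_profiles')

conn = sqlite3.connect('legacy.db')
for user_id, email, prefs in conn.execute(
        'SELECT id, email, preferences FROM user_profiles'):
    profiles.new(str(user_id), data={
        'email': email,
        'preferences': json.loads(prefs or '{}'),
    }).store()          # stored as JSON under the existing user id
conn.close()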

As you continue to migrate more difficult relational data, or if you need help at any step along the way, we have extensive documentation at docs.basho.com and the Riak users mailing list, and the Professional Services team is always available to answer questions or even help manage your transition.

Basho

Relational to Riak – Tradeoffs

November 18, 2013

This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series discussed High Availability and Cost of Scale.


Eventual Consistency

In order to provide high availability, which is a cornerstone of Riak’s value proposition, the database stores several copies of each key/value pair.

This availability requirement leads to a fundamental tradeoff: in order to continue to serve requests in the presence of failure, we do not force all data in the cluster to stay in sync. Riak will allow writes and reads no matter how many servers (and their stored replicas) are offline or otherwise unreachable.
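
Riak's clients also expose per-request tunables that let an application choose, for each operation, how many replicas must respond before it succeeds. A sketch with the Riak Python client (bucket and key names invented):

import riak

client = riak.RiakClient()
sessions = client.bucket('sessions')

# n_val copies are written (3 by default); w and r control how many
# replicas must acknowledge before a request is considered successful.
obj = sessions.new('session-abc', data={'user': 'ada'})
obj.store(w=2)                               # two replica acks are enough

fetched = sessions.get('session-abc', r=1)   # any one replica may answer
print(fetched.data)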

(Incidentally, this lack of strong coordination has another consequence beyond high availability: Riak is a very, very fast database.)

Riak does provide both active and passive self-healing mechanisms to minimize the window of time during which two servers may have different versions of data.

The concept of eventual consistency may seem unfamiliar, but if you’ve ever implemented a cache or used DNS, those are common examples of the idea. In a large enough system, it’s effectively the default state of all data.

However, with the forthcoming release of Riak 2.0, operators will be able to designate selected pieces of data as requiring coordination, choosing strong consistency over high availability. Writing such data will be slower and subject to failure if too many servers are unreachable, but the overall robust architecture of Riak will still provide a fast, highly available solution.

Data Modeling

Riak stores data using a simple key/value model, which offers developers tremendous flexibility to define access models that suit their applications. It is also content-agnostic, so developers can store arbitrary data in any convenient format.

Instead of forcing application-specific data structures to be mapped into (and out of) a relational database, they can simply be serialized and dropped directly into Riak. For records that will be frequently updated, if some fields are immutable and some aren't, we recommend keeping the immutable data in one key/value pair and organizing the rest into one or more objects based on update patterns.
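
A sketch of that split, with hypothetical buckets and fields: the immutable identity data lives under one key and is written once, while the frequently-updated settings live under another and are rewritten freely:

import riak

client = riak.RiakClient()

# Written once at signup; never updated afterwards.
client.bucket('user_identity').new('user:1001', data={
    'created_at': '2013-11-18',
    'signup_email': 'ada@example.com',
}).store()

# Rewritten as often as the user changes preferences.
settings = client.bucket('user_settings')
settings.new('user:1001', data={'theme': 'dark'}).store()

obj = settings.get('user:1001')
obj.data['theme'] = 'light'
obj.store()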

Relational databases are ingrained habits for many of us, but moving beyond them can be liberating. Further information about data modeling, including sample configurations, is available in the Use Cases section of the documentation.

Tradeoffs

One tradeoff with this simpler data model is that there is no SQL or SQL-like language with which to query the data.

To achieve optimal performance, it is advisable to take advantage of the flexibility of the key/value model to define simple retrieval patterns. In other words, determine the most useful queries and write the results of those queries as the data is being processed.
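
For example (a simplified sketch with invented names; concurrent writers would additionally need sibling resolution or, in Riak 2.0, data types), an order-processing path can maintain a per-day rollup object under a predictable key, so the common question "what sold today?" becomes a single read:

import riak

client = riak.RiakClient()
orders = client.bucket('orders')
daily = client.bucket('orders_by_day')

def record_order(order_id, order, day):
    orders.new(order_id, data=order).store()
    # Update the precomputed answer while processing the write.
    rollup = daily.get(day)
    if rollup.data is None:
        rollup = daily.new(day, data={'count': 0, 'revenue': 0})
    rollup.data['count'] += 1
    rollup.data['revenue'] += order['amount']
    rollup.store()

record_order('order:555', {'sku': 'widget', 'amount': 19}, '2013-11-18')
print(daily.get('2013-11-18').data)   # one read answers the query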

Because it is not always possible to know in advance what questions will need to be asked of your data, Riak offers added functionality on top of the key/value model. Tools such as Riak Search (a distributed, full-text search engine), Secondary Indexing (ability to tag objects with queryable metadata), and MapReduce (leveraged for aggregation tasks) are available to perform ad hoc queries as needed.

For many users, the tradeoffs of moving to Riak are worthwhile due to the overall benefits; however, it can be a bit of an adjustment. To see why others have chosen to switch to Riak from both relational systems and other NoSQL databases, check out our Users Page.

Basho

Riak is a Core Part of JBA's Behavioral Analysis Tools

October 10, 2013

JBA, based in Melbourne, Australia, is a customer-centric digital consultancy that specializes in developing customer understanding, providing experience optimization, behavioral targeting, and multichannel conversion. Their main customers are multi-channel retailers with eCommerce operations that want to gain deeper insights into their customers (such as reasons for shopping cart abandonment) and to support retargeting. JBA uses Riak as a core part of this behavioral analysis and remarketing tool.

JBA started developing their behavioral analysis products 18 months ago, and Riak has been in production since the beginning. When they first developed their flagship tool, they needed a key/value database to easily store all the user behavior data. On top of that, they needed a system that would scale easily, had Python integration for data analysis, would work well with other systems already in their stack, and was operationally simple for their small team. They assessed Riak, Cassandra, DynamoDB, and MongoDB, but decided Riak was the best fit for their needs: Riak's Python client library made it easy to work with, it's built for scale, their operations team can easily manage the cluster using Riak's command line tools, and they could run it in AWS (which they were already using heavily).

JBA currently has ten nodes in their cluster, all running on smaller Amazon instances. The ability to run on low-powered instances and simply scale up as needed versus having fewer high-powered instances has been vital to them. Since they primarily deal with online retailers, JBA can scale up to account for holiday sales cycles or new product releases and then scale back down. This flexibility helps to manage their costs.

They store over 10 million objects in Riak, with each object representing a customer state or a shopping cart. “We never have to worry about how much we’re storing because we can just scale out to cope with capacity issues,” said Matt Black, Senior Developer at JBA. “Riak gives us the ability to both store a lot of data but also look at objects in isolation. This is perfect for us because we rarely look at the whole data set in aggregate; we’re more interested in the state of individual users.”

JBA is also evaluating where else they can use Riak within the company, especially as they expand their behavioral analysis tools. They are firm believers in using the right tool for the job and currently also use MySQL for structured data, Hadoop for large scale MapReduce, and RabbitMQ for messaging. “Riak has done the job we set out to do. We’ve been very happy with it and we’re looking for more ways to integrate Riak into our business,” said JBA CTO, Andrew Fisher.

For more information about JBA, visit their site at www.jbadigital.com/

Basho

Top Five Questions About Riak CS

May 1, 2013

This post looks at five commonly asked questions about Riak CS – simple, available, open source storage built on top of Riak. For more information, please review our full documentation, or sign up for an intro to Riak CS webcast on Friday, May 10.

What is the relationship between Riak and Riak CS?

Riak CS is built on top of Riak, exposing higher-level storage functions including large object support, an S3-compatible API, multi-tenancy, and per-user storage and access statistics. Riak itself provides the replication, availability, fault-tolerance, and underlying storage functions for the Riak CS implementation. Riak and Riak CS should both be installed on every node in your cluster. While Riak and Riak CS could be run on separate virtual or physical nodes, running them on the same machine minimizes intra-cluster bandwidth usage and is the recommended approach. As with Riak, we advise a minimum 5-node cluster.

When an object is uploaded to Riak CS, it is broken up into smaller chunks, which are then streamed, stored, and replicated in the underlying cluster. A manifest is maintained for each object that tracks which blocks comprise the object and is used to retrieve all blocks and present them to the client on read. In addition to running Riak and Riak CS on each node, Stanchion, a request serializer, must be installed on at least one node in the cluster. This ensures that global entities, such as users and buckets, are unique in the system.

What use cases does Riak CS support that Riak doesn’t?

Riak CS has several features that are not provided in the standalone Riak database. One of the most obvious differences is in the size of objects supported. Riak CS exposes large object support and includes multi-part upload, so you can upload objects as a series of parts; this allows single objects into the terabyte range. In Riak, the data model is simply key/value; in Riak CS, the key/value model provides the underlying structure for higher-level storage semantics – users, buckets, and objects. The Riak CS interface is an S3-compatible HTTP API, allowing you to use existing S3 libraries and tools. In contrast, Riak exposes an HTTP and protobufs API and offers many language-specific clients. Unlike Riak, Riak CS is multi-tenant, with the concept of “users” and per-user reporting on storage and access. This makes it a fit both for private cloud scenarios with multiple internal users and as a foundation for a public cloud storage offering.
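
Because the API is S3-compatible, existing libraries work largely unchanged. As a sketch, here is boto pointed at a local Riak CS node, assuming the default Riak CS listener on port 8080 and credentials already issued by the system:

from boto.s3.connection import S3Connection, OrdinaryCallingFormat
from boto.s3.key import Key

conn = S3Connection(
    aws_access_key_id='USER_KEY_ID',        # issued by Riak CS
    aws_secret_access_key='USER_SECRET',
    host='localhost',
    port=8080,                              # Riak CS default listener
    is_secure=False,
    calling_format=OrdinaryCallingFormat(), # path-style bucket URLs
)

bucket = conn.create_bucket('backups')
key = Key(bucket, 'db-dump.tar.gz')
key.set_contents_from_filename('db-dump.tar.gz')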

How do multi-tenancy, authentication, and reporting work?

Riak CS exposes an interface for user creation, disablement and credential management. Riak CS can be set so that only administrators can create new users. Administrators also have special privileges including being able to retrieve a list of all users in the system and query the user account information of any user. Once issued credentials, users are able to authenticate, create buckets, upload and download files, retrieve account information, obtain new credentials, or disable their account through the API. Riak CS supports the standard S3 authentication scheme, with support for header and query string authorization.

Riak CS exposes storage, usage, and network statistics that support use cases like accounting, subscription, billing, or multi-group utilization for public or private clouds. Riak CS will report how much storage a user is consuming and the network operations related to access. This data is exposed via an HTTP interface and can be queried over the default timespan “now” or as a range from start time through end time. Access statistics are reported as bytes in and bytes out for both object and bucket operations. Reporting of this information can be scheduled for a set interval or manually triggered.

What’s the difference between Riak CS and Riak CS Enterprise?

Riak CS Enterprise provides multi-datacenter replication on top of Riak CS. For multi-datacenter replication in Riak CS, global information for users, bucket information and manifests are streamed in real-time from a primary implementation to a secondary site so global state is maintained across locations. Objects can then be replicated in either full sync or real-time sync mode. The secondary site will replicate the object as in normal operations. Additional datacenters can be added in order to create availability zones or provide additional data redundancy and locality. Riak CS Enterprise can also be configured for bi-directional replication. Riak CS Enterprise also comes with 24/7, enterprise-level support. More information and pricing can be found here, and full technical information is available on our docs portal. Ready to get started? Sign up for a developer trial of Riak CS Enterprise.

What are your plans for integration of Riak CS with open source compute solutions?

Riak CS provides highly available, distributed storage, making it a natural fit for usage alongside compute solutions. We have partnered with Citrix to collaborate on the integration of Apache CloudStack and Riak CS to create a complete cloud software offering that combines compute and storage in an integrated platform. For more information on our partnership with CloudStack, check out this blog post with the latest update. API and authentication support for OpenStack is also in progress.

Ready to get started? You can download Riak CS here, and check out the Riak CS Fast Track for a hands-on getting started guide.

Top Five Questions About Riak

April 17, 2013

This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.

What hardware should I use with Riak?

Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.

RAM is one of the most important factors – RAM availability directly affects what Riak backend you should use (see question below) and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak's fault tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor; however, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect the choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
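
As a quick sketch of working with n_val from the Riak Python client (bucket name invented; note that n_val is best chosen before a bucket holds data, since existing objects are not automatically re-replicated when it changes):

import riak

client = riak.RiakClient()
bucket = client.bucket('important_data')

print(bucket.get_property('n_val'))    # 3 by default
bucket.set_property('n_val', 5)        # keep five copies of each object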

Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.

What backend is best for my application?

Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.

Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
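
For a rough feel of the math, the sketch below estimates the RAM Bitcask's in-memory key table needs; the per-entry overhead is an assumed ballpark figure, so use the Bitcask Capacity Calculator for real sizing:

# Back-of-the-envelope RAM estimate for Bitcask's in-memory key table.
PER_ENTRY_OVERHEAD = 40       # bytes of bookkeeping per key (assumption)
AVG_KEY_SIZE = 36             # e.g. a UUID string
NUM_KEYS = 100 * 10**6        # objects you plan to store
N_VAL = 3                     # every replica needs its own entry
NUM_NODES = 8

total = NUM_KEYS * N_VAL * (AVG_KEY_SIZE + PER_ENTRY_OVERHEAD)
print('cluster RAM for key table: %.1f GB (%.1f GB per node)'
      % (total / 1e9, total / 1e9 / NUM_NODES))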

LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn't have Bitcask's memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which is particularly efficient for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend variable in the app.config file. You can find more details on LevelDB here.

Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.

How do I model data using Riak’s key/value design?

Riak uses a key/value design to store data. Each key references an object, and objects are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.

Data Type   | Key                | Value
Session     | User/Session ID    | Session Data
Content     | Title, Integer     | Document, Image, Post, Video, Text, JSON/HTML, etc.
Advertising | Campaign ID        | Ad Content
Logs        | Date               | Log File
Sensor      | Date, Date/Time    | Sensor Updates
User Data   | Login, Email, UUID | User Attributes

For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
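
And because Riak is content-agnostic, values need not be JSON. A sketch of storing a binary value with an explicit content type via the Python client (bucket and file names invented):

import riak

client = riak.RiakClient()
images = client.bucket('images')

with open('logo.png', 'rb') as f:
    images.new('logo.png', encoded_data=f.read(),
               content_type='image/png').store()

fetched = images.get('logo.png')
print(fetched.content_type, len(fetched.encoded_data))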

What other options, besides strict key/value access, are there for querying Riak?

Most operations with Riak involve reading and writing key/value pairs. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.

MapReduce provides non-primary-key-based querying that divides work across the Riak distributed database. It is useful for tasks such as filtering by tags, counting words, extracting links, analyzing log files, and aggregation tasks. Riak provides both JavaScript and Erlang MapReduce support; jobs written in Erlang are generally more performant. You can find more details about Riak MapReduce here.
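As an illustrative sketch (bucket name invented; full-bucket inputs are expensive and best kept off production paths), here is an ad hoc MapReduce job that counts the objects in a bucket, submitted to the HTTP /mapred endpoint:

import json
import requests

# Each map phase emits a 1 per object; the built-in Riak.reduceSum
# JavaScript function adds them up.
job = {
    'inputs': 'logs',
    'query': [
        {'map': {'language': 'javascript',
                 'source': 'function(v) { return [1]; }'}},
        {'reduce': {'language': 'javascript',
                    'name': 'Riak.reduceSum'}},
    ],
}
resp = requests.post('http://localhost:8098/mapred',
                     data=json.dumps(job),
                     headers={'Content-Type': 'application/json'})
print(resp.json())   # e.g. [12345]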

Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.

Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
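A short sketch of tagging and querying with the Python client (names invented); the _int and _bin suffixes select integer and binary index types:

import riak

client = riak.RiakClient()
users = client.bucket('users')

obj = users.new('ada', data={'name': 'Ada'})
obj.add_index('age_int', 28)           # *_int suffix: integer index
obj.add_index('city_bin', 'london')    # *_bin suffix: binary index
obj.store()

print(users.get_index('city_bin', 'london'))   # exact match
print(users.get_index('age_int', 21, 35))      # range query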

How does Riak differ from other databases?

We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.

Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak's masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak's operational profile and use of consistent hashing mean data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases, and how to make the switch, can be found in this whitepaper, From Relational to Riak.

A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.

Ready to get started? You can download Riak here. For more in-depth information about Riak, we also offer Riak Workshops in New York and San Francisco. Learn more here.

Basho

Relational to Riak, Part 2 – Operational Cost of Scaling

January 14, 2013

This is the second in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on Thursday, January 24.

One critical factor in deciding which database to use is its operational profile. Many customers today are dealing with rapid data growth, intense peak loads and the imperative to maintain economies of scale across a large platform. For these customers, how the database scales up and what impact that has on operations is a huge factor in business and technical decisions around what technology to use.

The cost of scale is one reason why many of our users and customers have picked Riak over a traditional relational system. From experience, users have discovered that scaling a relational system can be expensive, error-prone and lead to significant and disruptive operations projects. In this blog, we’ll take a look at how a relational database’s sharding approach differs from Riak’s consistent hashing approach and what that means for you as an operator.

Historically, relational databases commonly ran in production on a single server. When capacity and availability needs require more than a single machine, relational databases address scale using a technique called sharding. Sharding breaks data into logical parts (such as alphabetically, numerically, or by geographic region) that can be distributed across multiple machines. A simplified example is below.

Sharding

This approach can be problematic for several reasons. First, writing and maintaining sharding logic increases the overhead of operating and developing an application on the database. Significant growth of data or traffic typically means significant, often manual, resharding projects. Determining how to intelligently split the dataset without negatively impacting performance, operations, and development presents a substantial challenge – especially when dealing with “big data”, rapid scale, or peak loads. Further, rapidly growing applications frequently outpace an existing sharding scheme. When the data in a shard grows too large, the shard must be split again. While several “auto”-sharding technologies have emerged in recent years, these methods are often imprecise, and manual intervention remains standard practice. Finally, sharding can often lead to “hot spots” in the database – physical machines responsible for storing and serving a disproportionately high amount of both data and requests – which can cause unpredictable latency and degraded performance.

To avoid sharding (and the associated expenses), data in Riak is distributed across nodes using consistent hashing. Consistent hashing ensures data is evenly distributed around the cluster and new nodes can be added with automatic, minimal reshuffling of data. This significantly decreases risky “hot spots” in the database and lowers the operational burden of scaling.

How does consistent hashing work? Riak stores data using a simple key/value scheme. These keys and values are stored in a namespace called a bucket. When you add new key/value pairs to a bucket in Riak, each object’s bucket and key combination is hashed. The resulting value maps onto a 160-bit integer space. You can think of this integer space as a ring used to figure out what data to put on which physical machines.

How? Riak divides the integer space into equally-sized partitions (64 by default). Each partition owns a given range of values on the ring and is responsible for all buckets and keys that, when hashed, fall into that range. Each partition is managed by a process called a virtual node (or “vnode”). Physical machines in the cluster evenly divide responsibility for vnodes, so each physical machine is responsible for all keys represented by its vnodes.
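
The arithmetic can be sketched in a few lines of Python. This toy model hashes a bucket/key pair with SHA-1 onto the 160-bit ring and maps the result to one of 64 partitions; Riak's real ring logic lives in riak_core, and the exact encoding of the bucket/key input is simplified here:

import hashlib

RING_TOP = 2 ** 160              # SHA-1 output space
NUM_PARTITIONS = 64              # Riak's default ring size
PARTITION_SPAN = RING_TOP // NUM_PARTITIONS

def partition_for(bucket, key):
    # Hash the bucket/key combination onto the ring.
    digest = hashlib.sha1(('%s/%s' % (bucket, key)).encode()).digest()
    point = int.from_bytes(digest, 'big')    # a spot on the ring
    return point // PARTITION_SPAN           # owning partition, 0..63

print(partition_for('users', 'rustyk'))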

Consistent Hashing

When nodes are added or removed, data is rebalanced automatically without any operator intervention. New machines assume ownership of some of the partitions, and existing machines hand off the relevant partitions and associated data until data ownership is equal amongst nodes. Riak also has an elegant approach to making cluster changes such as adding or removing nodes, allowing you to stage the changes, view their impact on the cluster, and then choose to commit or abort them. Developers and operators don't have to deal with the underlying complexity of what data lives where, as all nodes can serve and route requests. By eliminating the manual requirements of sharding and much of the potential for “hot spots,” Riak provides a much simpler operational scenario that lets users add and remove machines as needed, no matter how much they grow.

Want more info on relational vs Riak approaches? Sign up for the webcast here or read our whitepaper on moving from relational to Riak.

Basho

Building Apps on Riak – Content, Sessions, User Data, Ads and Other Use Cases

October 30, 2012

How do I build my application on Riak? This is one of the first questions we get from people new to Riak. Whether you are new to the key/value model, switching from a relational database, or building a new application on Riak, we’ve got a new section of the docs to help you get started.

Learn about data models for common application types – including high read/write use cases (session storage, ad platforms, log and sensor data) and apps that require relationship modeling (content-serving applications and user accounts, settings, and events). While not meant to be a prescriptive guide, this section walks through common simple and complex approaches to building apps using Riak’s key/value design and features such as MapReduce, search and secondary indexing. You can also check out the stories of users like Voxer, Clipboard and Yammer to learn more about how they built their apps on Riak.

This is just the start of a bigger effort to help users better understand how to build amazing applications on Riak. We will be adding code examples, other application types, additional considerations, and more stories from the user community. We'd love to add your contributions, so please submit a pull request or file an issue on our GitHub repo if you have anything to add or anything you'd like to know more about.

Basho Team

Secondary Indexes in Riak

September 14, 2011

Developers building an application on Riak typically have a love/hate relationship with Riak's simple key/value-based approach to storing data. It's great that anyone can quickly grok the basics (three simple operations: get/put/delete). It's convenient that you can store anything imaginable as an object's value: an integer, a blob of JSON data, an image, an MP3. And the distributed, scalable, failure-tolerant properties that a key/value storage model enables can be a lifesaver, depending on your use case.

But things get much less rosy when faced with the challenge of representing alternate keys, one-to-many relationships, or many-to-many relationships in Riak. Historically, Riak has shifted these responsibilities to the application developer. The developer is forced to either find a way to fit their data into a key/value model, or to adopt a polyglot storage strategy, maintaining data in one system and relationships in another.

This adds complexity and technical risk, as the developer is burdened with writing additional bookkeeping code and/or learning and maintaining multiple systems.

That’s why we’re so happy about Secondary Indexes. Secondary Indexes are the first step toward solving these challenges, lifting the burden from the backs of developers, and enabling more complex data modeling in Riak. And the best part is that it ships in our 1.0 release, just a few weeks from now.

How Do Secondary Indexes Work?

Update: Secondary Indexes use the new style HTTP API. See the Riak Wiki for more details.

From an application developer’s perspective, Secondary Indexes allow you to tag a Riak object with some index metadata, and later retrieve the object by querying the index, rather than the object’s primary key.

For example, let’s say you want to store a user object, accessible by username, twitter handle, or email address. You might pick the username as the primary key, while indexing the twitter handle and email address. Below is a curl command to accomplish this through the HTTP interface of a local Riak node:

curl -X POST \
  -H 'x-riak-index-twitter_bin: rustyio' \
  -H 'x-riak-index-email_bin: rusty@basho.com' \
  -d '...user data...' \
  http://localhost:8098/buckets/users/keys/rustyk

Previously, there was no simple way to access an object by anything other than the primary key, the username. The developer would be forced to “roll their own indexes.” With Secondary Indexes enabled, however, you can easily retrieve the data by querying the user’s twitter handle:

Query the twitter handle…

curl localhost:8098/buckets/users/index/twitter_bin/rustyio

Response…

{"keys":["rustyk"]}

Or the user’s email address:

Query the email address…

curl localhost:8098/buckets/users/index/email_bin/rusty@basho.com

Response…

{"keys":["rustyk"]}

You can change an object’s indexes by simply writing the object again with the updated index information. For example, to add an index on Github handle:

curl -X POST \
  -H 'x-riak-index-twitter_bin: rustyio' \
  -H 'x-riak-index-email_bin: rusty@basho.com' \
  -H 'x-riak-index-github_bin: rustyio' \
  -d '...user data...' \
  http://localhost:8098/buckets/users/keys/rustyk

That’s all there is to it, but that’s enough to represent a variety of different relationships within Riak.

Above is an example of assigning an alternate key to an object. But imagine that instead of a twitter_bin field, our object had an employer_bin field that matched the primary key for an object in our employers bucket. We can now look up users by their employer.

Or imagine a role_bin field that matched the primary key for an object in our security_roles bucket. This allows us to look up all users that are assigned to a specific security role in the system.
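
In client code, that one-to-many lookup is a single index query. A sketch using a 2i-aware client such as Basho's Python client (names invented):

import riak

client = riak.RiakClient()
users = client.bucket('users')

obj = users.new('rustyk', data={'name': 'Rusty'})
obj.add_index('employer_bin', 'basho')
obj.store()

# One-to-many: fetch every user tagged with this employer.
for key in users.get_index('employer_bin', 'basho'):
    print(users.get(key).data)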

Design Decisions

Secondary Indexes maintain Riak's distributed, scalable, and failure-tolerant nature by avoiding the need for a pre-defined schema, which would be shared state. Indexes are declared on a per-object basis, and the index type (binary or integer) is determined by the field's suffix.

Indexing is real-time and atomic; the results show up in queries immediately after the write operation completes, and all indexing occurs on the partition where the object lives, so the object and its indexes stay in sync. Indexes can be stored and queried via the HTTP interface or the Protocol Buffers interface. Additionally, index results can feed directly into a Map/Reduce operation. And our Enterprise customers will be happy to know that Secondary Indexing plays well with multi-datacenter replication.

Indexes are declared as metadata, rather than in an object's value, in order to preserve Riak's view of the value as an opaque document. An object can have an unlimited number of index fields of any size (dependent upon system resources, of course). We have stress-tested with 1,000 index fields, though we expect most applications won't need nearly that many. Indexes do contribute to the base size of the object, and they also take up their own disk space, but the overhead for each additional index entry is minimal: the vector clock information (and other metadata) is stored in the object, not in the index entry. Additionally, the LevelDB backend (and, likely, most index-capable backends) supports prefix compression, further shrinking index size.

This initial release does have some important limitations. Only single index queries are supported, and only for exact matches or range queries. The result order is undefined, and pagination is not supported. While this offers less in the way of ad-hoc querying than other datastores, it is a solid 80% solution that allows us to focus future energy where users and customers need it most. (Trust me, we have many plans and prototypes of potential features. Building something is easy, building the right thing is harder.)

Behind The Scenes

What is happening behind the scenes? A lot, actually.

At write time, the system pulls the index fields from the incoming object, parses and validates the fields, updates the object with the newly parsed fields, and then continues with the write operation. The replicas of the object are sent to virtual nodes where the object and its indexes are persisted to disk.

At query time, the system first calculates what we call a “covering” set of partitions. The system looks at how many replicas of our data are stored and determines the minimum number of partitions that it must examine to retrieve a full set of results, accounting for any offline nodes. By default, Riak is configured to store 3 replicas of all objects, so the system can generate a full result set if it reads from one-third of the system’s partitions, as long as it chooses the right set of partitions. The query is then broadcast to the selected partitions, which read the index data, generate a list of keys, and send them back to the coordinating node.

Storing index data is very different from storing key/value data: in general, any database that stores indexes on disk would prefer to store the index in a contiguous block and in the desired order – basically getting as near to the final result set as possible. This minimizes disk movement and other work during a query, and provides faster read operations. The challenge is that index values rarely enter the system in the right order, so the database must do some shuffling at write time. Most databases delay this shuffling: they write to disk in a slightly sub-optimal format, then go back and “fix things up” at a later point in time.

None of Riak’s existing key/value-oriented backends were a good fit for index data; they all focused on fast key/value access. During the development of Secondary Indexes we explored other options. Coincidentally, the Basho team had already begun work to adapt LevelDB, a low-level storage library from Google, as a storage engine for Riak KV. LevelDB stores data in a defined order, which is exactly what Secondary Indexes needed, and it is actually versatile enough to manage both the index data and the object's value. Plus, it is very RAM friendly. You can learn more about LevelDB from this page on Google Code.

Want To Know More?

If you want to learn more about Secondary Indexes, you can read the slides from my talk at OSCON Data 2011: Querying Riak Just Got Easier. Alternatively, you can watch the video.

You can grab a pre-release version of Riak Version 1.0 on the Basho downloads site to try the examples above. Remember to change the storage backend to riak_kv_eleveldb_backend!

Finally, keep an eye out for documentation that will land on the newly re-organized Basho Wiki within the next two weeks.

Rusty