April 23, 2010

give the people what they want

We’ve received a lot of feedback in the past few months about the ways that Riak already serves people well, and the ways that they wish it could do more for them. Our latest release is an example of our response to that feedback.

Protocol Buffers

Riak has always been accessible via a clean and easy-to-use HTTP interface. We made that choice because HTTP is unquestionably the most well-understood and well-deployed protocol for data transfer. This has paid off well by making it simple for people to use many languages to interact with Riak, to get good caching behavior, and so on. However, that interface is not optimized for maximum throughput. Each request needs to parse several unknown-length headers, for example, which imposes a bit of load when you’re pointing a firehose of data into your cluster.

For those who would rather give up some of the niceties of HTTP to get a bit more speed, we have added a new client-facing interface to Riak. That interface uses the ”protocol buffers“ encoding scheme originally created by Google. We are beginning to roll out some client libraries with support for that new interface, starting with Python and Erlang but soon to encompass several other languages. You can expect them to trickle out over the next couple of weeks. Initial tests show a nice multiple of increased throughput on some workloads when switching to the new interface. We are likely to release some benchmarks to demonstrate this sometime soon. Give it a spin and let us know what you think.

Commit Hooks

A number of users (and a few key potential customers) have asked us how to either verify some aspects of their data (schemas, etc) on the way in to Riak, or else how to take some action (on a secondary object or otherwise) as a result of having stored it. Basically, people seem to want stored procedures.

Okay, you can have them.

Much like with our map/reduce functionality, your own functions can be expressed in either Erlang or JavaScript. As with any database’s stored procedures you should make sure to make them as simple as possible or else you might place an undue load on the cluster when trying to perform a lot of writes.

Faster Key Listings

Listing of all of the keys in a Riak bucket is fundamentally a bit more of a pain than any of the per document operations as it has to deal with and coordinate responses from many nodes. However, it doesn’t need to be all that bad.

The behavior of list_keys in Riak 0.10 is much faster than in previous releases, due both to more efficient tracking of vnode coverage and also to a much faster bloom filter. The vnode coverage aspect also makes it much more tolerant of node outages than before.

If you do use bucket key listings in your application, you should always do so in streaming mode (“keys=stream” query param if via HTTP) as doing otherwise necessitates building the entire list in memory before sending it to the client.

Cleanliness and Modularity

A lot of other work went into this release as well. The internal tracking of dependencies is much cleaner, for those of you building from source (instead of just grabbing a pre-built release). We have also broken apart the core Erlang application into two pieces. There will be more written on the reasons and benefits of that later, but for now the impact is that you probably need to make some minor changes to your configuration files when you upgrade.

All in all, we’re excited about this release and hope that you enjoy using it.

- Justin