October 10, 2010

Dedicated Riak users who were who hanging out in the IRC channel late Friday may have seen the following message from our bot that reports changes to the canonical Riak Repo on Bitbucket:

dizzyd Added tag riak-0.13.0 for changeset 957b3d601bde

That’s right! We tagged the 0.13 release of Riak this past Friday and, after of a weekend away from our laptops, are ready to announce it officially. As you’re about to learn, this was a banner release for a whole host of reasons. Here is a rundown of the noteworthy changes and enhancements in this release:

Riak Search

As most people know, we have been hard at work on Riak Search for several months and, after releasing it to a handful of users as part of a limited beta, we are finally ready to release it into the wild!

Riak Search is a distributed, easily-scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak’s key/value layer.

At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.

There is obviously much more to be written about Riak Search, and a simple blurb in this blog post won’t do it justice. It’s for this reason that we are dedicating an entire blog post to it. You can expect to see this next week. In the mean time, you can go download the code and get more usage information on the Riak Search Section.

riak_kv and riak_core

In addition to focusing on previously-unreleased bits like Search, we also strengthened the code and functionality of riak_kv, the layer that handles the key/value operations, and riak_core, the component of Riak that provides all the services necessary to write a modern, well-behaved distributed application.

Firstly, the structure of the Riak source repos has been changed such that there is a single Erlang app per repo. This permits third parties to use riak_core (the abstracted Dynamo logic) in other applications. As you can imagine, this opens up a world of possibilities and we are looking forward to seeing where this code ends up being implemented. (Check out this blog post for a primer on Riak Core.)

The consistent hash ring balancing logic has been improved to enable large clusters to converge more quickly on a common ring state. Related to this, further work has been done to improve Riak’s performance and robustness when dealing with large data sets and failure scenarios.

Also, the Riak Command Line Tools were expanded with two new commands: “ringready” and “wait-for-service.” These were put in place to help anyone administering a Riak installation to script detection of cluster convergence on a given ring and wait for components like riak_kv to become available, respectively. You can read up on all your favorite Riak Command Line Tools here.

MapReduce

Enhancing Riak’s MapReduce functionality is something on which we’ve been focusing intensely for the past several releases, and this one was no different.

We reworked how Riak’s MapReduce handled overload scenarios, especially in the event that the number of available JavaScript VMs were low. As a result of these changes, Riak now does a much better job of tracking which Javascript VMs are busy and attempts to spread the load more evenly over all VMs.

In addition to this, the caching layer for JavaScript MapReduce has been completely re-implemented. This results in performance gains when repeating the same MapReduce jobs. Specifically, this work includes a new in-memory vnode LRU cache solely for map operations. The size of the cache is now configurable (via the ‘vnode_cache_entries’ entry in the riak_kv section of app.config) and defaults to 1000 objects.

And, we improved MapReduce job input handling. In short what this means is that you will see lower resource consumption and more stable run times when the input producer exceeds the cluster’s ability to process new MapReduce inputs.

It’s also worth mentioning that there is more MapReduce goodness in the pipeline, both code and non-code. Stay tuned, because it’s only getting better!

Bitcask

Bitcask, the default storage backed for Riak, also got some huge enhancements in the 0.13 release. The most noticeable and significant improvement is that Bitcask now uses 40% less memory per key entry. The net effect is that you are required to have less RAM available in your cluster (as Bitcask stores all key data in memory). And, for those of you working with large data sets, we did some work to make starting up Bitcask significantly faster.

Listing keys, something the community has been asking for us to make more efficient, got a nice speedup when using Bitcask (which should make some of your map/reduce queries a lot snappier). This work is ongoing, too, so look for this process to get even faster as we make additional incremental improvements. Also of note, Bitcask will now reclaim memory of expired key entries.

Conclusion

Riak 0.13 is our best release yet and we are more confident than ever that it’s the best database out there for your production needs (assuming it suits your use case, of course). In addition to the changes we highlighted above, you should also take a moment to check out the complete release notes.

After you’re done with those, drop everything and do the following:

As always, let us know if you have any questions and issue, and thanks for being a part of Riak!

The Basho Team