Tag Archives: Riak Data Types

Basho Events in April

April 7, 2014

This month, it’s all about developer conferences – both local and international. If you’re in the area, come and say hi. We always love to chat Riak and can answer any questions you may have. If you aren’t going to be at any of these events, check out the Riak Mailing List for questions or Contact Us to help get started with Riak.

Here is a look at where we’ll be this month.

PyCon 2014: On the Wednesday (April 9th) of PyCon 2014, Basho Technical Evangelist, Tom Santero, will host a free workshop on “Building Applications on Riak,” starting at 1:30pm. PyCon takes place April 9-17 in Montreal, Canada.

ChefConf 2014: Basho will be attending ChefConf and our technical evangelists will be available to answer any Riak questions you may have. ChefConf 2014 takes place April 15-17 in San Francisco, CA.

CRAFT Conference: Basho is a proud sponsor of CRAFT Conference, which takes place April 23-25 in Budapest, Hungary. We will have a booth set up, so stop by to grab some great swag and chat Riak.

NoSQL Matters: At NoSQL Matters: Cologne, Basho Technical Evangelist, Joel Jacobson, will present on “CRDTs in Riak,” one of the new features available with Riak 2.0. His talk will take place at 12:30pm on April 30th. NoSQL Matters: Cologne takes place April 29-30 in Cologne, Germany.

For a full list of where we’ll be, both in April and beyond, visit our Events Page.

Basho

Entropy in Riak

March 26, 2014

Riak’s overarching design goal is simple: be maximally available. If your data center is on fire, Riak will be the last part of your stack to fail (and hopefully, you’ve purchased an enterprise license, so there’s another cluster in another data center ready to go at all times).

In order to make sure your data can survive server failures, Riak retains multiple copies (replicas) and allows lock-free, uncoordinated updates.

This opens up the possibility that data will be out of sync across a cluster. Riak manages this issue in three distinct stages: entropy detection, correction, and conflict resolution.

Entropy Detection

Among the oldest and simplest tools in Riak is Read Repair, which, as its name implies, is triggered when a read request is received. If the server coordinating the operation notices that the servers responsible for the key do not agree on its value, correction is required.

A more recent feature in Riak is Active Anti-Entropy (often shortened to AAE). Effectively, this is the proactive version of read repair and runs in the background. Riak maintains hash trees to monitor for inconsistent data between servers; when divergent values are detected, correction is mandated.
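The hash comparison behind this kind of background check can be sketched roughly as follows. This is a minimal illustration of the idea, not Riak's implementation: it uses a single level of segment hashes rather than a full tree, and the names `SEGMENTS`, `segment_hashes`, and `divergent_keys` are hypothetical.

```python
import hashlib

SEGMENTS = 8  # illustrative segment count (an assumption, not a Riak setting)


def _segment_of(key: str) -> int:
    """Assign a key to a segment based on the first byte of its SHA-1 digest."""
    return hashlib.sha1(key.encode()).digest()[0] % SEGMENTS


def segment_hashes(replica: dict) -> list:
    """Return one hash per segment, summarising the key/value pairs in it."""
    buckets = [[] for _ in range(SEGMENTS)]
    for key in sorted(replica):
        buckets[_segment_of(key)].append(f"{key}={replica[key]}")
    return [hashlib.sha1("\n".join(items).encode()).hexdigest() for items in buckets]


def divergent_keys(a: dict, b: dict) -> set:
    """Compare segment hashes first, then inspect only segments that differ."""
    hashes_a, hashes_b = segment_hashes(a), segment_hashes(b)
    suspects = {i for i in range(SEGMENTS) if hashes_a[i] != hashes_b[i]}
    return {
        key
        for key in set(a) | set(b)
        if _segment_of(key) in suspects and a.get(key) != b.get(key)
    }
```

The point of the design is that two servers only need to exchange a handful of hashes to learn which portion of the keyspace disagrees, instead of comparing every object.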

Correction

As discussed in the blog post, Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems, automatically determining the “correct” value in the event of a conflict is not simple, and often not possible at the database layer.

Using contextual metadata called vector clocks, Riak will attempt to determine whether one of the discovered values is derived from the other. In that case, it can safely choose the most recent value. This value is written to all servers that have a copy of the data and conflict resolution is not required.

If Riak can’t verify such a causal relationship, things get more difficult.

Riak’s default behavior is to fall back to server clocks to determine a winner. This can lead to unexpected results if concurrent updates are received but, on the positive side, conflict resolution will not be required.

If Riak is instead configured with allow_mult=true to protect data integrity, even when independent writes are received, Riak will write both values to the servers as siblings for later conflict resolution.
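The decision described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Riak's code: a vector clock is represented as a dict of per-actor counters, and `descends` and `reconcile` are hypothetical names.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def reconcile(x, y, allow_mult=False):
    """Return the surviving values for two replicas of one key.

    Each argument is a (value, vector_clock, timestamp) tuple.
    """
    x_value, x_clock, x_time = x
    y_value, y_clock, y_time = y
    if descends(x_clock, y_clock):
        return [x_value]            # x is derived from y: x is newer
    if descends(y_clock, x_clock):
        return [y_value]            # y is derived from x: y is newer
    if allow_mult:
        return [x_value, y_value]   # concurrent: keep both as siblings
    # Concurrent and allow_mult is false: fall back to server timestamps.
    return [x_value if x_time >= y_time else y_value]
```

Note that when one value causally descends from the other, the timestamp is never consulted, so a skewed clock cannot change the outcome in that case.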

Conflict Resolution

Conflict resolution means that Riak detects a conflict, can’t resolve it intelligently, and isn’t instructed to resolve it otherwise.

Next time the application attempts to read such a value, instead of receiving one answer, it’s going to receive (at least) two. It is now the application’s responsibility to deal with the conflict and provide a new value back to Riak for future reads.

This can be trivial (pick one value), obvious (merge all values), or tricky (apply business logic and come back with something different).
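As a rough illustration of those three flavors, here is a sketch of what application-side resolvers might look like once the siblings have been read. The function names and the shopping-cart rule are hypothetical; a real application would use its client library's sibling API and its own business rules.

```python
def pick_one(siblings):
    """Trivial: keep one sibling, chosen deterministically by sort order."""
    return max(siblings)


def merge_all(siblings):
    """Obvious: treat each sibling as a collection and take the union."""
    merged = set()
    for sibling in siblings:
        merged |= set(sibling)
    return merged


def resolve_cart(siblings):
    """Tricky: a hypothetical business rule for shopping carts.

    If any sibling was checked out, only checked-out siblings count;
    otherwise every item from every sibling is kept.
    """
    checked_out = [s for s in siblings if s["checked_out"]]
    chosen = checked_out or siblings
    return {
        "items": sorted(merge_all(s["items"] for s in chosen)),
        "checked_out": bool(checked_out),
    }
```

Whichever strategy is used, the resolved value is then written back to Riak so that future reads see a single answer.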

With Riak 2.0, Basho is introducing Riak Data Types, which are designed to handle conflict resolution automatically. If your data can be formulated as a set or a map (not terribly dissimilar from JSON), Riak can process and resolve the siblings for you when a read request is received.

Why?

Many databases, particularly distributed ones, are effectively non-deterministic in the presence of concurrent writes. If they try to enforce consistency on writes, they sacrifice availability and often data integrity. If they don’t enforce consistency, they may rely on server (or worse, client) clocks to pick a winner, if they even have a strategy at all.

Riak is unique in encouraging developers to think about conflict resolution. Why? Because, for large distributed systems, network and server failures are a fact of life. For very large distributed systems, data duplication and inconsistency are unavoidable. Since Riak is designed for scale, it’s better to provide a structure for proper resolution than to pretend conflicts don’t exist.

John Daily

Basho at NYLUG

February 5, 2014

At the recent meetup for the New York Linux Users Group (NYLUG), Basho Technical Evangelist, Tom Santero, presented “An Introduction to Basho’s Riak.” In this talk, Tom explains how Riak addresses the challenges of concurrent data storage at scale. He discusses the various design decisions, tradeoffs made, and theories at work within Riak. He also provides guidance as to how you might deploy Riak in production and why.

In addition to introducing the basics of Riak and its key/value data model, Tom presents some of the exciting features being introduced with Riak 2.0. Riak Data Types add counters, sets, and maps to Riak – allowing for better conflict resolution. They enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focus on using familiar, distributed data types to support their applications’ data access patterns.

You can watch Tom’s full talk below:

For more information about Riak and how it differs from traditional databases, check out the whitepaper, “From Relational to Riak.”

To see where Basho will be presenting next, visit the Events Page.

Basho

Hangouts with Basho

January 29, 2014

On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Factual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).

If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies Youtube Channel. You can also watch each below.

Data Types and Search in Riak 2.0

Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)

Bucket Types and Configuration

Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)

Riak 2.0: Security and Conflict Resolution

Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)

Fun with Java and C Clients

Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)

Property Based Testing

Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)

Datomic and Riak

Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)

CorrugatedIron

Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)

A Look Back

Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)

Hangouts take place on Fridays at 11am PT/2pm ET. If you have any topics you’d like to see featured, let us know on the Riak Mailing List.

Basho

RICON West Videos: Riak Data Types

December 12, 2013

At RICON West this year, we announced the Technical Preview of Riak 2.0. Before the full release (which will be available early next year), we are encouraging users to download the preview and start testing some of the exciting new features.

At RICON, we had many of the engineers who worked on these new features present their work. One feature that we’re particularly excited about is the addition of Riak Data Types. Riak 2.0 builds on eventually consistent counters (added with Riak 1.4) with the addition of maps and sets. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.

In “CRDTs: An Update (or Maybe Just a PUT),” Basho engineer, Sam Elliott, presents on the work being done with Riak Data Types. Sam and a few other engineers at Basho have been integrating cutting-edge research on data types (known as CRDTs), pioneered by INRIA, to create Riak Data Types. Sam talks about the latest developments on CRDTs and walks developers through how to use them in their own applications.

In addition to Sam’s talk, we also had a talk from Jeremy Ong on “CRDTs in Production.” His talk provides real world solutions to leveraging CRDT concepts for an industrial application via case study. He also offers some suggestions on how to tackle data operations that can’t always commute. You can watch his full talk below.

For more information about Riak Data Types, check out this overview on Github.

To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.

Basho

Clocks Are Bad, Or, Welcome to the Wonderful World of Distributed Systems

November 12, 2013

A recent email thread on the Riak users mailing list highlighted one of the key weaknesses of distributed systems: clock consistency.

The first email:

Occasionally, riak seems to not store an object I try to save. I have run tcpdump on the node receiving the request to ensure it is receiving the http packets with the correct JSON from the client. When the issue occurs the node is in fact receiving the request with the correct JSON.

Riak is designed to accommodate server and network failures without ever losing committed writes, so this led to a quick response from Basho’s engineers.

After some discussion, a vital piece of information was revealed:

One other thing that might be worth mentioning here is the writes I’m mentioning are actually updates to existing objects. The object exists, an attempt to write an update for the object appears to be received by a node, but the object maintains it’s original value.

Riak was dropping updates rather than writes, which is a horse of a different color. To see why updates are much more problematic for any distributed database, read on.

Concurrent Updates

In a database that runs on a single server, setting aside any complications introduced by transactions or locks, the second of two updates to the same record will overwrite the first. Last write wins.

With Riak’s simplest conflict resolution behavior, the second of two updates to the same object may or may not overwrite the first, even if those two updates are spaced far apart. Last write wins, except when it doesn’t, but even then it does.

Confused yet?

The problem is simple: there is no reliable definition of “last write” because system clocks across multiple servers are going to drift.

On a single server, there’s one canonical clock, regardless of accuracy. The system can always tell which write occurred in which order (assuming that the clock is always increasing; setting a clock backwards can cause all sorts of bad behavior).

So, back to our original problem with lost updates:

The nodes were a bit out of synch (up to 30 seconds… looking into why ntp wasn’t working!). So far it appears this was the issue.

If two updates to the same object occur within 30 seconds in such an environment, the end result is unpredictable.
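To make the failure concrete, here is a small simulation of timestamp-based resolution with skewed clocks. It is only a model of the scenario from the mailing list thread; the `Write` class and `last_write_wins` function are hypothetical names, not part of Riak.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    true_time: float   # seconds since some reference point, in reality
    node_skew: float   # how far the receiving node's clock is off, in seconds

    @property
    def stamped(self) -> float:
        """The timestamp the node actually records for this write."""
        return self.true_time + self.node_skew


def last_write_wins(writes):
    """Pick the value with the highest recorded timestamp."""
    return max(writes, key=lambda w: w.stamped).value


# The first write lands on a node whose clock runs 30 seconds fast.
first = Write("first", true_time=0.0, node_skew=30.0)
# The update arrives 10 seconds later on a node with an accurate clock.
second = Write("second", true_time=10.0, node_skew=0.0)

winner = last_write_wins([first, second])
```

Here `winner` is the older value: the update really did happen later, but its recorded timestamp (10) is lower than the one stamped by the fast clock (30), so the update is silently discarded.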

Taming the Beast

The conclusion drawn from the discussion was to implement (and, hopefully, to monitor) time synchronization. This is a step in the right direction, and one that every distributed system should implement, but there are more powerful and instructive lessons to impart.

Background Reading

Some of this discussion requires awareness of siblings, vector clocks, and related arcana. If you wish to read more about these topics, Basho’s earlier blog post Understanding Riak’s Configurable Behaviors: Part 1 provides sufficient context. (You can find links in the epilogue to the full series, but part 1 covers the necessary background for this post.)

If instead you decide you’d like to avoid reading about and dealing with such complexities entirely, skip over the Nitty Gritty section to The Land of Milk and Honey.

Nitty Gritty

Vector Clocks

One approach that should generally be employed when writing Riak applications is to supply vector clocks with each update. It’s not clear in this particular scenario that it would have helped, but it certainly can’t hurt. Giving Riak more information to track causal history is never a bad thing.

See our documentation on vector clocks for more information. And although the details are a bit dated, our blog post Why Vector Clocks are Easy makes for a nice overview of the concept.

Forcing the Last Write to Win

A rather non-obvious approach is to take the default last write wins conflict resolution one step further.

As discussed in part 1 of the configurable behaviors blog series, there are two closely-related configuration parameters that determine how Riak approaches conflict resolution: allow_mult and last_write_wins. The former indicates whether Riak should keep all conflicts for the client to resolve; the latter is our concern at the moment.

If allow_mult is set to false, setting last_write_wins to true will instruct Riak to always overwrite existing objects, ignoring the timestamps stored with them.

So, nominally, this achieves what we earlier implied to be impossible: the last write truly does win, regardless of clock consistency.

The problem is that we’ve just punted the problem down the road a bit. Yes, all servers that receive an object will blindly write it, but any servers that don’t receive it due to network partition or server failure will still retain an older value, and depending on clock consistency the older value may still win once the network or server failure is corrected.

Broadly speaking, if you’re going to have data consistency problems, it’s best for that to be obvious and easily detectable during testing stages. This “solution” would have made the situation much harder to recognize before production.

Stopping Last Write Wins

At least in part to limit the complexity of developing applications, Basho decided to specify Riak’s default configuration as allow_mult=false, which requires the database to resolve conflicting writes internally.

As we’ve seen, Riak isn’t exactly a genius at resolving conflicting writes. Beyond the challenges of clock consistency, Riak treats objects as opaque and has no awareness of business logic.

It’s almost always better to bite the bullet: instruct Riak to retain all conflicting updates as siblings (via allow_mult=true) and write your application to deal with them appropriately.

Note: We are planning to change the default setting for allow_mult to true in Riak 2.0, but please check the documentation and your configuration before assuming either behavior.
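As a sketch of what enabling siblings on a single bucket might look like over Riak's HTTP interface, the function below builds (but does not send) a bucket-properties request. The host, port, and bucket name are assumptions for illustration, and the `allow_mult_request` helper is hypothetical; check the HTTP API documentation for your Riak version before relying on the exact URL.

```python
import json
import urllib.parse
import urllib.request


def allow_mult_request(bucket, host="localhost", port=8098):
    """Build a PUT request that sets allow_mult=true on one bucket.

    The /buckets/<bucket>/props path and the {"props": {...}} body are
    assumed to match the HTTP API of the Riak version in use.
    """
    body = json.dumps({"props": {"allow_mult": True}}).encode()
    quoted = urllib.parse.quote(bucket, safe="")
    return urllib.request.Request(
        url=f"http://{host}:{port}/buckets/{quoted}/props",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = allow_mult_request("carts")
```

Passing `request` to `urllib.request.urlopen` would send it to a running node. Once siblings are enabled, the application must be prepared to receive more than one value on a read.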

The Land of Milk and Honey

Distributed data types

Creating data types that can survive network partitions and self-heal has long been a goal for our engineers. With Riak 1.4, Basho introduced distributed counters; with 2.0, Riak will have a larger suite of distributed data types that can resolve conflicts internally, notably including sets and maps.
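To show why a counter can resolve its own conflicts, here is a minimal sketch of a textbook convergent counter (often called a PN-Counter). It illustrates the general technique only and is not a description of how Riak implements its counters; the class and method names are hypothetical.

```python
class PNCounter:
    """Each actor tracks its own increments and decrements separately."""

    def __init__(self, actor):
        self.actor = actor
        self.increments = {}  # actor -> total amount added by that actor
        self.decrements = {}  # actor -> total amount subtracted by that actor

    def increment(self, by=1):
        """Apply a local change; a negative `by` counts as a decrement."""
        side = self.increments if by >= 0 else self.decrements
        side[self.actor] = side.get(self.actor, 0) + abs(by)

    def value(self):
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other):
        """Take the per-actor maximum, so merging is order-independent and idempotent."""
        for mine, theirs in ((self.increments, other.increments),
                             (self.decrements, other.decrements)):
            for actor, count in theirs.items():
                mine[actor] = max(mine.get(actor, 0), count)
```

Because each actor only ever raises its own entries, two replicas that have exchanged state in any order, any number of times, arrive at the same value without consulting a clock.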

Although 2.0 is not yet released, a technical preview is available.

It is also possible to define such Riak Data Types (known formally as CRDTs) at the application layer. See the two-part blog series Index for Fun and for Profit and Indexing the Zombie Apocalypse With Riak for more information.

Strong Consistency

Also with 2.0, Riak will include the option of designating certain data as strongly consistent, meaning that the servers that hold a piece of data will have to agree on any updates to that data.

As appealing as that may sound, it is impossible to guarantee strong consistency without introducing coordination overhead and constraining Riak’s ability to continue to allow for requests when servers or networks have failed.

And aren’t low latency and high availability the reasons you’re using Riak?

The Silver(*) Bullet: Immutability

(* or at least stainless steel)

The rise of “big data” is linked to a resurgence of interest in functional programming, which is particularly well-suited for processing large data sets. (See Dean Wampler’s Lambda Jam talk Copious Data for an interesting exposition of this idea.)

One of the key tenets of functional programming is that data is immutable, meaning that destructive updates are not (typically) allowed.

The relational data model does not offer much (any?) support for immutable data, but it is a powerful concept. At Basho’s inaugural RICON conference Pat Helland gave a talk entitled Immutability Changes Everything which goes into more detail.

While it isn’t necessarily true that immutability solves everything with distributed systems, it’s a great start. Without data updates, there are no conflicts.

See the configurable behaviors epilogue (specifically, the discussion of Datomic) for a discussion of configuration tweaks to Riak to take better advantage of immutable data for low latency.

TL;DR

If your distributed system isn’t explicitly dealing with data conflicts, any correct behavior it exhibits is more a matter of good luck than of good design.

If your distributed database relies on clocks to pick a winner, you’d better have rock-solid time synchronization, and even then, it’s unlikely your business needs are served well by blindly selecting the last write that happens to arrive.

Riak provides powerful tools for helping address the inherent challenges of distributed data, but they have to be used to be useful.

John Daily

A Weekly Hangout With Basho

November 11, 2013

Last Friday, the Basho team held our inaugural Riak Community Hangout.

This 30-minute session is a development-focused conversation with topics changing weekly. The Hangout is planned for most Fridays at 11am Pacific/2pm Eastern/7pm GMT, with the URL published shortly before it begins. All Hangouts will be archived and hosted on the Basho Technologies Youtube channel. You should follow @basho for all updates about future Hangouts.

Over the next few weeks, these Hangouts will focus on the new features planned for Riak 2.0.

The first session was hosted by Basho’s Director of Community, Mark Phillips, who discussed Riak Data Types and Riak Search 2.0 with Basho engineers Sean Cribbs, Brett Hazen, and Luke Bakken.

The Hangout began with an overview of Riak Data Types, available with the 2.0 Technical Preview, and examined their use cases and implementation considerations. Following this (at 18 minutes, 35 seconds), Brett Hazen provided an overview of Riak Search 2.0 (codenamed Yokozuna) and Luke Bakken queried a portion of the Twitter stream on a cluster running the newest Riak Search 2.0 code.

Upcoming sessions will focus on Riak/Riak CS internals, application building, data modeling, and community requested topics. We are also looking for community members to join in and highlight what they are building with Riak and Riak CS.

If you have questions or topics you would like to hear discussed, reach out on the message list, in IRC (#riak on irc.freenode.net), or contact us.

Basho

Basho Announces Technical Preview of Riak 2.0 and Riak Enterprise 2.0

San Francisco, CA – October 29, 2013 – Today at RICON West, Basho, the worldwide leader in distributed systems and cloud storage software, announced that the Technical Preview of Riak 2.0, Basho’s distributed NoSQL database, is now publicly available. This major release introduces new features that improve developer ease-of-use, increase flexibility around consistency, boost search and analytics capabilities, simplify operations at scale, and provide enterprise-class data security.

Riak continues to gain adoption worldwide supporting critical applications that require high-availability, predictable scalability, and performance. Riak’s unique ability to distribute data, both to ensure availability and provide data locality, provides enterprises a proven database technology for powering critical web, mobile and social applications, cloud computing platforms, and to store and serve machine-to-machine and sensor data. Riak is used by thousands of companies, including over 30% of the Fortune 50.

New Features in Riak 2.0

  • Riak Data Types. Riak 2.0 includes a range of flexible, distributed data types that greatly simplify application development without sacrificing Riak’s availability and partition tolerance characteristics. Available Riak data types include distributed counters, sets, maps, registers, and flags.
  • Strong Consistency. Developers now have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
  • Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 fully supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
  • Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and authorization are provided via client APIs.
  • Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
  • Reduced Replicas for Secondary Sites. Exclusive to Riak Enterprise 2.0, users can now optionally store fewer copies of replicated data across multiple datacenters to better maintain a balance between storage overhead and availability.

Technical Preview Availability

Download the Riak 2.0 Technical Preview here. All code for Riak 2.0 is also available on Github. For more details on the technical preview for Riak, visit our blog.

About RICON West
RICON West 2013 is part of the RICON conference series. RICON is Basho’s distributed systems conference for developers and academics. RICON West will take place in San Francisco, CA on October 29-30. More than 25 speakers will discuss applications, use cases, and the future of distributed systems – including NoSQL solutions and cloud storage. RICON West 2013 speakers include Basho, Google, Microsoft Research, Netflix, salesforce.com, Seagate, The Weather Company, and Twitter. RICON West 2013 is sold-out; however, Basho will offer a live stream.

About Basho
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.

Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.

Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search, and Much More

October 29, 2013

Today at RICON West in San Francisco, we announced that the Technical Preview of Riak 2.0 is now available. This major release adds a number of new features that many of you have been waiting for.

Throughout RICON West, we will be discussing many of the Riak 2.0 features (both in track sessions and during lightning talks), so keep your eyes on the live stream over the next two days. Videos of all sessions will also be made available after the conference.

Here is a look at some of the major enhancements available in Riak 2.0:

  • Riak Data Types. Building on the eventually consistent counters introduced in Riak 1.4, Riak 2.0 adds sets and maps as new distributed data types. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
  • Strong Consistency. Developers have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
  • Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
  • Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and authorization are provided via client APIs.
  • Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
  • Reduced Replicas for Multiple Data Centers. Riak Enterprise 2.0 can optionally store fewer copies of replicated data across multiple data centers to better maintain a balance between storage overhead and availability.

Ready to get started? Download the Technical Preview.

Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure it’s production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment.

As we are finalizing Riak 2.0, we welcome your feedback for our Technical Preview. We are always available to discuss via the Riak Users mailing list, IRC (#riak on freenode), or contact us.

Riak 2.0 Technical Preview: Deep Dive

Riak Data Types
In distributed systems, we are forced to trade consistency for availability (see: CAP Theorem) and this can complicate some aspects of application design. In Riak 2.0, we have integrated cutting-edge research on data types known as CRDTs (Conflict-Free Replicated Data Types), pioneered by INRIA, to create Riak Data Types. By adding counters, sets, maps, registers, and flags, these Riak Data Types enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focus on using familiar, distributed data types to support their applications’ data access patterns.

A more detailed overview of Riak Data Types is available that examines implementation considerations and the basics of usage.
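For readers new to the idea, here is a minimal sketch of one textbook CRDT, an observed-remove set in which a concurrent add wins over a remove. It illustrates the general technique from the research literature and should not be read as Riak's implementation; the `ORSet` class and its methods are hypothetical names.

```python
import itertools


class ORSet:
    """Observed-remove set: a remove only cancels the adds it has seen."""

    def __init__(self, actor):
        self.actor = actor
        self._counter = itertools.count(1)
        self.adds = set()     # (element, unique_tag) pairs
        self.removes = set()  # (element, unique_tag) pairs observed at removal

    def add(self, element):
        """Record the element under a fresh tag unique to this actor."""
        self.adds.add((element, (self.actor, next(self._counter))))

    def remove(self, element):
        """Cancel every add of this element that this replica has observed."""
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def merge(self, other):
        """Union both histories; unions commute, so merge order does not matter."""
        self.adds |= other.adds
        self.removes |= other.removes

    def value(self):
        return {element for element, _tag in self.adds - self.removes}
```

If one replica removes an element while another concurrently re-adds it, the re-add carries a tag the remover never saw, so the element survives the merge on both sides. No vector clock handling or sibling resolution is needed in application code.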

Strong Consistency
In all prior versions, Riak was classified as an eventually consistent system. With the 2.0 release, Riak now lets developers choose when operations should be strongly or eventually consistent. This gives developers a choice between these semantics for different types of data. At the same time, operators can continue to enjoy the operational simplicity of Riak. Consistency preferences are defined on a per-bucket-type basis within the same cluster.

A RICON West 2012 talk entitled, Bringing Consistency to Riak, shares much of the initial thinking behind this effort. In addition, the pull request that adds consistency to riak_kv provides detailed information about related repositories and the implementation approach.

Redesigned Full-Text Search
Riak is a key/value store and the values are simply stored on disk as binary. With previous versions of Riak Search, Riak developers have long been able to index the content of these stored values. In Riak 2.0, Riak Search (code-named Yokozuna) has been completely redesigned and now uses the Apache Solr full-text document indexing engine directly. Together, Riak and Solr provide a reliable full-text context indexing solution that is highly available and built for scale. In addition, Riak Search 2.0 fully supports the Solr client query APIs, which enables integration with existing software solutions (either homegrown or commercial).

The Basho engineers responsible for Yokozuna have created a resources page that includes recorded talks, Solr documentation links, and books on the topic.

Security
Basho designed Riak with critical data in mind. Whether it’s data that affects revenue, user experience, or even a patient’s health (as is the case with the NHS), Riak ensures that this critical data is always available. However, often this critical data is also sensitive data. Riak 2.0 adds security to this data through the ability to administer access rights and plug-in various secure authentication models commonly used today.

The initial RFC that describes the security effort, including related Pull Requests, is available at github.com/basho/riak/issues/355.

Simplified Configuration Management
At Basho, we pride ourselves on providing operationally friendly software that functions smoothly when dealing with the challenges of a distributed system. In the past, configuration of Riak occurred in two files: app.config and vm.args. Riak 2.0 changes how and where configuration information is stored. It no longer uses Erlang-specific syntax but, rather, provides a layout more suited for all operators and automated deployment tools. This layout is easy to parse and transparent for Riak administrators.

More information on the vision and specific implementation considerations is contained in the repository at github.com/basho/cuttlefish.
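To give a feel for why a flat layout is friendlier to tooling than nested Erlang terms, here is a sketch of a parser for `key = value` lines with `#` comments. The sample keys and values are illustrative assumptions and should be checked against the configuration documentation for your release; `parse_flat_conf` is a hypothetical helper, not part of Riak or cuttlefish.

```python
SAMPLE = """
## Illustrative settings only; verify names against the documentation.
ring_size = 64
storage_backend = leveldb
listener.http.internal = 127.0.0.1:8098
"""


def parse_flat_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


settings = parse_flat_conf(SAMPLE)
```

A deployment tool can read or generate a file like this with a few lines of code in almost any language, which is much harder with a file of Erlang terms.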

Bucket Types
In versions of Riak prior to 2.0, keys were made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow for similar keys to be grouped. In addition, they provide a means of configuring how previous versions of Riak treated that data.

In Riak 2.0, several new features (security and strong consistency in particular) need to interact with groups of buckets. To this end, Riak 2.0 includes the concept of a Bucket Type. In addition to allowing new features without special prefixes in Bucket names, Riak developers and operators are able to define a group of buckets that share the same properties and only store information about each Bucket Type, rather than individual buckets.

More information about Bucket Types can be found in the Github Issue at github.com/basho/riak/issues/362. This issue describes the planned functionality, discussions about implementation, and includes related pull requests.
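As a rough mental model, an object's address gains a third component, and shared properties are stored once per type. The sketch below is only an illustration under stated assumptions: the type names, the property values, and the `ObjectAddress` and `props_for` names are hypothetical, and the property keys should be checked against the Riak 2.0 documentation.

```python
from typing import NamedTuple


class ObjectAddress(NamedTuple):
    bucket_type: str  # new in 2.0: the group the bucket belongs to
    bucket: str       # namespace for keys, as before
    key: str          # unique identifier within the bucket


# Properties are recorded once per bucket type, not once per bucket.
# The type names and property values here are made up for illustration.
TYPE_PROPS = {
    "default": {"allow_mult": False},
    "strong": {"consistent": True},
    "maps": {"datatype": "map"},
}


def props_for(address: ObjectAddress) -> dict:
    """Look up the properties shared by every bucket of the address's type."""
    return TYPE_PROPS[address.bucket_type]


profile = ObjectAddress("maps", "users", "alice")
ledger = ObjectAddress("strong", "accounts", "alice")
```

Two buckets of the same type share one stored set of properties, however many buckets exist, and features such as strong consistency can be attached to a type instead of being encoded in bucket names.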

Change in Defaults for Sibling Resolution
Riak has always supported both application-side resolution and server-side Last Write Wins resolution based on timestamps and vector clocks. Prior to Riak 2.0, vector clock-based Last Write Wins has been the default. Moving forward, new clusters will hand off siblings to applications by default. This is the safest way to work with Riak, but requires developers to be aware of sibling resolution.

In a blog series entitled, Understanding Riak’s Configurable Behaviours, Basho Evangelist John Daily discusses the configuration of Last Write Wins, and many other options, in great detail.

More Efficient Use of Physical Memory
Riak nodes are designed to manage the changing demands of a cluster as it experiences network, hardware, and other failures. To do this, Riak balances each node’s resources accordingly. Riak 2.0 has vastly improved LevelDB’s use of available physical memory (RAM) by allowing local databases to dynamically change their cache sizes as the cluster fluctuates under load.

In the past, it was necessary to specify RAM allocation for different LevelDB caches independently. This is no longer the case. In Riak 2.0, LevelDB databases that manage key/value or active anti-entropy data share a single pool of memory, and administrators are free to allocate as much of the available RAM to LevelDB as they feel is appropriate in their deployment. Detailed implementation documentation can be found in the basho/leveldb wiki.

Riak Ruby Vagrant Project
If you are interested in testing Riak 2.0 in a contained environment with the Riak Ruby Client, Basho engineer Bryce Kerley has put together the Riak-Ruby-Vagrant repository. In addition, this environment can be easily adapted to usage with other clients for testing the new features of Riak 2.0.

Basho