Tag Archives: cluster management

Cluster Management Improvements in Riak 1.2

September 13, 2012

Last month we released Riak 1.2 , with a number of improvements in Riak stats, the protobufs API, LevelDB backend and repair/recovery capabilities. Riak 1.2 also features a new strategy for making cluster changes like adding and removing nodes. With the new approach, Riak allows you to stage changes, view the impact on the cluster, and then commit or abort changes. The increased visibility lets Riak operators make more informed decisions about when and how to scale up, scale down and upgrade or replace nodes. Additionally, you can now make multiple changes, like adding a number of nodes, at the same time – critical for large-scale clusters.

Pre 1.2 Cluster Management

In prior versions of Riak, users made changes to the cluster using commands under the “riak-admin” syntax. To add or remove a node to the cluster, you would simply call “riak-admin join” or “riak-admin leave,” and the Riak cluster would immediately begin to handoff data and ownership as appropriate. While this approach was simple, it did raise two issues we’ve tried to address with the new cluster management capabilities:

  • Coordinating cluster changes: Prior to Riak 1.2, there was no way to group changes together. Changes were entered sequentially, and if there was more than one change (e.g. joining multiple nodes to a cluster), the first change would happen in a single transition and the remaining changes (e.g. the rest of the joins) would occur together in a second transition. In the case of multiple joins, in the first transition, data is transferred from the cluster to the new node. Then, in the second transition, some of the data transferred to the first new node is then transferred to the other new nodes, wasting network bandwidth and disk space. This proved particularly problematic for production deployments in which nodes were frequently added or removed.
  • Planning: The pre-1.2 approach to cluster management didn’t give you visibility into how your changes would affect the cluster before you made them. For instance, the only way to know how many transfers a join would take would be to start the join and then run “riak-admin ring-status”. Likewise, you couldn’t know what ownership would look like until after the join.

Staged Clustering

We addressed both of the above issues with a new approach we’re calling ‘Staged Clustering’.

In Riak 1.2, instead of joins, leaves, etc. taking place immediately, they’re first staged. After staging cluster changes, you can view how the changes will affect the cluster, seeing how ring ownership will change and how many transfers between nodes will need to occur to complete the transition. After looking at the plan, you can then add or remove changes staged to be committed, scrap the plan, or execute it as is.

Staged Clustering Riak

Staged Clustering High Level Process

The ‘Staged Clustering’ interface is implemented in Riak’s command line tool, riak-admin, under the ‘cluster’ command. Underneath the ‘cluster’ command are subcommands used to stage, view, and commit cluster changes (e.g. to join the current cluster to node dev1, you’d use: ‘riak-admin cluster join dev1’ ). You can read more about the new syntax in the Riak Wiki. Currently, the new approach to cluster management is not implemented in Riak Control, our open-source management and monitoring GUI, but is planned for a later release.

Example

Let’s take a look at how the new cluster management strategy would work in a scenario where we wanted to add three nodes to an existing node (dev1) to form a four-node cluster.

1. View the Current Member Status

First, we call ‘riak-admin member_status’ to get a view of the current cluster, the nodes in it and their current ring ownership:

Member Status Riak

2. Stage Joining New Nodes

Next, we’ll join three nodes (dev2, dev3, dev4) to the cluster using the cluster command.

bash
dev2: riak-admin cluster join dev1
dev3: riak-admin cluster join dev1
dev4: riak-admin cluster join dev1

The joins are now staged for commit.

3. View How Staged Changes Will Affect the Cluster

Now we can use the new ‘riak-admin cluster plan’ command to see the impact of the joins on the cluster, viewing changes to ring ownership and transfers that need to occur.

bash
riak-admin cluster plan

Cluster Management Plan Riak

In the output, we see: the changes staged for commit, the number of resulting cluster transitions (1), how the data will be distributed around the ring after transition (25% on each node), and the number of transfers the transition will take (48 total).

4. Commit Changes

If we want to commit these changes, we use the commit command:

bash
riak-admin cluster commit

These changes start taking place immediately. If we run ‘riak-admin member_status’, we can see the status of the transition. Additionally, we’ve fleshed out the ‘riak-admin transfers’ command to give you much more visibility into active transfers in Riak 1.2.

Other Resources

For more in-depth information on the new cluster management stuff in Riak 1.2, check out this recorded webinar with Basho engineer Joseph Blomstedt and the updated docs.

Riak Control

February 22, 2012

Riak Control is Basho’s new OSS, REST-driven, user-interface for Riak. The code has been available for a few months now, but it’s officially supported in Riak 1.1, so we wanted share some details on what it’s about and why you should be excited about it.

Lowering the Barrier for Entry

Once a Riak cluster is up and running we want it to be as hands-free to administer as possible. Things should “just work,” like plumbing. But we’ll be the first to admit that a new user’s initial welcoming with Riak isn’t always as pleasant as it should be.

Some steps are unavoidable: downloading, installing, and/or building from source, etc. But once the initial work is done, the experience should be as inviting as possible. Riak is a very powerful database with numerous options and commands. Riak Control allows you to easily manage/inspect your cluster while ignoring many of these until needed.

Empowering Riak Administrators

When we first sat down to decide what the more important features for any Riak interface should be, one theme stood out above all the others: cluster management. We wanted to give developers and administrators the ability to quickly build a cluster, inspect nodes, and diagnose the health of their cluster. And we wanted it to happen fast.

Riak is about large datasets and clusters replicating that dataset for maximum availability and persistence. We’re working hard to help companies that write many, many GBs per day to clusters containing 50+ nodes. Riak Control is a tool that brings issues and risks front-and-center. And it gives customers the ability to take action in real-time.

The Two-Minute Tour

Riak Control is currently broken up into nested levels of detail. Each page in Riak Control is designed to give you just as much information as you need, nothing more. As you navigate the UI, you’ll gradually be taken deeper into the rabbit hole.

The Overview/Snapshot

Snapshot

The Snapshot is what you’ll see when you first fire up Riak Control. It should give you a warm-fuzzy feeling when everything is A-okay: an unmistakable, beautiful green check mark.

For times when things aren’t perfect, you will be presented with a list of concern areas. Each will have links to other pages of Riak Control where you can take a closer look at the problem.

The Cluster

Custer

The cluster page is where you can get a quick look at all of the nodes in your cluster and manage membership.

With a glance you can see which nodes are partitioned from the rest of the cluster or offline, which are leaving or joining the cluster, view partition ownership, monitor memory, and more. And with a click you can add nodes to the cluster, take nodes offline for maintenance, and leave the cluster.

The Ring

Ring

One level deeper than the cluster view is the ring page. This is where you can see the health of each partition. Most of the time, your ring will be too large to really manage from the ring view. But with the filters you can immediately find which partitions are owned by which nodes, partitions whose primary nodes are unreachable, current handoffs, and more.

What’s Next?

Riak Control is not standing still. Riak 1.1 includes Riak Control in its early stages so we can begin to gather feedback. We want to know what it does right and what it does wrong. Your feedback and ideas are encouraged. Additionally, we have a list of features and functionality slated for future releases. None of these are set in stone, but here is a list of what we have planned…

Pluggable

While Riak Control is – at its heart – a simple REST API, we’re working to modularize it in a way that allows you to write your own modules/plugins. We want to see Riak Control become a collection of pieces that all snap and work together, empowering you to manage your cluster in the way that best fits your needs.

Event streaming

Currently Riak Control uses a pull model to gather information about the cluster. While this isn’t a performance issue, we very much want to make it a push-system. As things happen to the cluster, the cluster should notify Riak Control of the changes, which in-turn will notify the user.

Node Statistics

Clicking on a node name from anywhere should take you to a page giving details specifically about that node, similar to the data you would get from a riak-admin status command.

Bucket & Object Inspection

While low-level object manipulation isn’t designed to be a primary feature of Riak Control, it is a very handy tool to have, and extremely valuable when initially setting up Riak for the first time. More importantly, Riak buckets will be available to create and inspect.

MapReduce Queries

Riak Control will feature a powerful interface for creating MapReduce queries. You will be able to debug, save, load, and execute previously saved queries with ease.

Customer Support Tools

In addition to the general tools provided for manipulation of the cluster and data, we also are planning for improve monitoring tools.

  1. View the log files of individual nodes
  2. See graphs of load, memory latency, disk usage, etc.
  3. Coalesce and bundle data for support tickets
  4. File support tickets

Any Comments, Questions, or Feature Requests?

Anything you’d like to share or ask? Join the Riak-Users Mailing List and tell us what you think. The other option is to fork the code and make your opinions known with a pull request or by filing an issue. You can also find some formal documentation on the Riak Wiki.

Thanks for being a part of Riak.

Jeff Massung