February 12, 2014
Datomic is a distributed database system that supports queries, joins, and ACID transactions. Through its pluggable persistence layer, you can wire Datomic up to a horizontally scalable key/value store that strives for operational simplicity, like Riak.
Below, we’ll explore the specifics around getting Riak enabled as a storage service for Datomic. We will also provide you with a Vagrant project that automates many of these steps, so you can have a local development environment with a Riak-backed Datomic running within minutes.
Datomic stores indexes and a log of known transactions in its storage backend. You can think of the indexes as sorted sets of datoms, and the data log as a recording of all transaction data in historic order.
Both of these pieces of data are stored as trees with blocks that are roughly
64K in size. The blocks themselves are immutable and cater very well to the strengths of eventual consistency. Other bits of data, like the root pointers (for the trees) for indexes and the data log, require the ability to compare-and-swap (CAS). They need to be stored in a strongly consistent backend.
We won’t go through the details of standing up a ZooKeeper ensemble here, but once you have, make sure you have a list of
IP:PORT pairs for each instance (at least three recommended for production usage).
Note: Strong consistency is coming in Riak 2.0 and will make ZooKeeper unnecessary for this use case.
Riak is a distributed key/value store with an emphasis on high availability. To learn more, download the free eBook, A Little Riak Book.
To get started with Riak, head over to the Quick Start Guide and walk through the setup of a five-node cluster.
In Datomic, the Transactor component is responsible for coordinating write requests and is a critical single point of failure. Think of the Transactor the same way you think about a relational database. You need one, but you may also want another ready to go if the primary fails.
The Transactor needs to know a few things about Riak:
riak-interface(valid options are
riak-bucket(can just set this to
Note: The Transactor passes the Riak host and port to the riak-java-client. You’ll want to round-robin requests against all of the nodes in your cluster evenly (usually accomplished with a load balancer). If you setup a load balancer to front your Riak cluster, provide its host and port to the Transactor via
The Forbidden Dance
At this point it’s assumed that you have a ZooKeeper ensemble, Transactor instance, and Riak cluster ready to go. Now, fetch your list of ZooKeeper nodes and supply it (comma delimited) as the payload of an HTTP
PUT request to Riak like so:
Now all of the components can talk to each other!
For those who aren’t familiar, Vagrant simplifies the process of creating and configuring virtual development environments. By combining it with a few Chef cookbooks for Datomic, ZooKeeper, and Riak, we can automate all of the steps described above (for a local development environment).
Simply clone the vagrant-datomic-riak repository and execute the following:
February 3, 2014
2014 is an exciting year for Basho and, as usual, we are traveling the world to let you know what we’re up to. Here’s a look at where we’ll be this February.
LA Ruby Conf 2014: Basho is a proud sponsor and we will be in LA to chat Riak and answer any questions you may have. LA Ruby Conf takes place February 6-8.
New York Meetup: On February 10th, Basho Technical Evangelist, Hector Castro, will present on “Supporting Riak and Riak CS Deployments with Chef” at DigitalOcean. He will explore the history of maintaining the Riak/Riak CS cookbooks and discuss how people are using them. He will also discuss how Basho is planning to revamp the cookbooks to take advantage of features coming in Riak 2.0.
O’Reilly Strata: Basho is a proud sponsor of O’Reilly Strata in Santa Clara (February 11-13). Be sure to stop by our booth to learn more about Riak and grab some swag.
Big Ruby Conf: Basho Technical Evangelist, Hector Castro, will be presenting “Throw Some Keys On It: Data Modeling for Key/Value Data Stores by Example” on February 21st. Big Ruby Conf takes place in Dallas, TX from February 20-21.
Code PaLOUsa: Code PaLOUsa takes place in Louisville, KY from February 24-26. On February 24th at 9am, we will be hosting a Riak workshop. In addition, Alex Moore (Basho Client Services Engineer) will speak on “Scaling Your Data Safely for Fun and Profit with Riak,” Sean Cribbs (Basho Software Engineer) will speak on “In Search of the Software Ursatz,” and John Daily (Basho Technical Evangelist) will speak on “Erlang, or How I Learned to Stop Worrying and Let Things Fail.”
Open Source Conference: Open Source Conference takes place from February 28-March 1 in Tokyo, Japan. Kaz Suzuki from Basho will be presenting an introduction to Riak and Riak CS. He will also be demoing Riak CS at the Basho booth.
For a full list of where we’ll be, check out the Events Page.
December 4, 2013
Way back on March 19th, 2013, we published our first Riak and Riak CS quarterly community survey. Our last survey wrapped up on October 25th, 2013, so here are some anonymized results we wanted to share with you.
Google Chrome has a great reputation for keeping user installations current. For Riak, it’s great to see that a large number of our community users are running Riak 1.4 and above.
My operating system is better than yours.
LevelDB usage continues to grow. For those of you who are running it in production, do yourself a favor and watch Matthew Von-Maszewski’s talk on optimizing LevelDB from RICON East 2013.
Riak’s default configuration ships with
64. The highest our supported customers use is
1024. Here are ring sizes running in the wild.
Average Object Size
Large objects (upwards of
4MB) can make Riak clusters unhappy. Luckily, all users who completed the Q3 survey are storing objects that average under
(If you’re looking for something to store objects that are hundreds of megabytes to terabytes in size, have a look at Riak CS.)
Configuration management tools make life simpler for our users, enabling automated creation and management of Riak clusters.
Note: Results are based on the Q3 community survey results.
The goal of the community survey is to help us better understand how people are using Riak and Riak CS. Results from past surveys have been distributed internally, and in several cases have guided our decision-making process around Riak.
Our next community survey is coming up soon, so be on the lookout!
September 9, 2013
Chef is a configuration management system that is widely deployed by Operations teams around the world. Tools like Chef can bring sanity and uniformity when deploying a massive Riak cluster; however, as with any tool, it needs to be reliably tested, as any misconfiguration could bring down systems. Here is the story of Chef and Riak.
The first public Chef Cookbook was pushed to Github on July 18, 2011, back when Riak 0.14.2 was the latest and greatest. We started by making the basic updates for releases but, as both the Riak and Chef user base grew, so did the number of issues and pull requests. Even with some automation, testing was still time consuming, error prone, and problematic. Too much time was being spent catching bugs manually, a familiar story for anyone who has had to test any software.
Our initial reaction was to only keep what we knew users (primarily customers) were using. As testing the build from source was so time consuming, it was removed until we could later ensure that it be properly tested. We knew that we had to start automating this testing pipeline to not only maintain quality, but sanity. Fortunately, Chef’s testing frameworks were beginning to come into their own and a free continuous integration service for GitHub repositories called TravisCI was starting to take off. However, before talking about the testing frameworks, we need to cover two tools that help make this robust testing possible.
Vagrant is a tool that leverages virtualization and cloud providers, so users don’t have to maintain custom static virtual machines. Vagrant was on the rise when we started the cookbooks and was indispensable for early testing. While it didn’t offer us a completely automated solution, it was far ahead of anything else at the time and serves as a great building block for our testing today.
There are also a variety of useful plugins that we use in conjunction with it, including vagrant-berkshelf and vagrant-omnibus. The former integrates Vagrant and Berkshelf so each Vagrant “box” has its own self-contained set of cookbooks that it uses and the latter allows for easy testing of any version of Chef.
Berkshelf manages dependencies for Chef cookbooks – like a Bundler Gemfile for cookbooks. It allows users to identify and pin a known good version, bypassing the potential headaches of trying to keep multiple cookbooks in sync.
Now, back to testing frameworks.
Foodcritic is a Ruby gem used to lint cookbooks. This not only checks for basic syntax errors that would prevent a recipe from converging, but also for style inconsistencies and best practices. Foodcritic has a set of rules that it checks cookbooks against. While most of them are highly recommended, there may be a few that don’t apply to all cookbooks and can be ignored on execution. Combine this with TravisCI, and each commit or pull request to GitHub is automatically tested.
While this is helpful, it still didn’t actually help us test that the cookbooks worked. Luckily, we weren’t the only ones with this issue, which is why Fletcher Nichol wrote test-kitchen.
Test-kitchen is another Ruby gem that helps to integrate test cookbooks using a variety of drivers (we use the Vagrant driver). For products like Riak and Riak CS, there are a number of supported platforms that we need to run the cookbook against, and that’s exactly what this tool accomplishes.
In the configuration file for test-kitchen, we define the permutation of Vagrant box, run list, and attributes for testing as many cases for the cookbook as needed. With this, we are able to execute simple Minitests against multiple platforms and we can also test both our enterprise and open source builds at any version by configuring attributes appropriately.
Granted, if you need to spin up a ton of virtual machines in parallel, you’ll want a beefy machine, but the upside is that you’ll have a nice status report to know which permutation of platform/build/version failed.
Why is This Important?
With these tools, we are able to make sure our builds pass all tests across platforms. Since we have many customers deploying the latest Riak and Riak CS version with Chef, we need to ensure that everything works as expected. These tools allowed us to move from testing every cookbook change manually to automatically testing the permutations of operating system, Chef, and Riak versions.
Now everyone gets a higher quality cookbook and there are fewer surprises for those maintaining it. Testing has shifted from a chore to a breeze. This benefits not only our users, but ourselves included as these cookbooks are used to maintain our Riak CS demo service.
Check out our Docs Site for more information about installing Riak with Chef.
Special thanks to Joshua Timberman (advice), Fletcher Nichol (test-kitchen), Hector Castro (reviews and PRs), Mitchell Hashimoto (Vagrant), and Jamie Winsor (Berkshelf).
February 4, 2011
The “Riak Fast Track” has been around for at least nine months now, and lots of developers have gotten to know Riak that way, building their own local clusters from the Riak source. But there’s always been something that has bothered me about that process, namely, that the developer has to build Riak herself. Basho provides pre-built packages on downloads.basho.com for several Linux distributions, Solaris, and Mac OS/X, but these have the limitation of only letting you run one node on a machine.
I’ve been a long-time fan of Chef the systems and configuration management tool by Opscode, especially for the wealth of community recipes and vibrant participation. It’s also incredibly easy to get started with small Chef deployments with Opscode’s Platform, which is free for up to 5 managed machines.
Anyway, as part of updating Riak’s Chef recipe last month to work with the 0.14.0 release, I discovered the easiest way to test the recipe — without incurring the costs of Amazon EC2 — was to deploy local virtual machines with Vagrant. So this blog post will be a tutorial on how to create your own local 3-node Riak cluster with Chef and Vagrant, suitable for doing the rest of the Fast Track.
Step 1: Install VirtualBox
Under the covers, Vagrant uses VirtualBox, which is a free virtualization product, originally created at Sun. Go ahead and download and install the version appropriate for your platform:
Step 2: Install Vagrant and Chef
Now that we have VirtualBox installed, let’s get Vagrant and Chef. You’ll need Ruby and Rubygems installed for this. Mac OS/X comes with these pre-installed, but they’re easy to get on most platforms.
Now that you’ve got them both installed, you need to get a virtual machine image to run Riak from. Luckily, Opscode “has provided some images for us that have the 0.9.12 Chef gems preinstalled. Download the Ubuntu 10.04 image and add it to your local collection:
Step 3: Configure Local Chef
Head on over to Opscode and sign up for a free Platform account if you haven’t already. This gives you access to the cookbooks site as well as the Chef admin UI. Make sure to collect your “knife config” and “validation key” from the “Organizations” page of the admin UI, and your personal “private key” from your profile page. These help you connect your local working space to the server.
Now let’s get our Chef workspace set up. You need a directory that has specific files and subdirectories in it, also known as a “Chef repository”. Again Opscode has made this easy on us, we can just clone their skeleton repository:
Now let’s put the canonical Opscode cookbooks (including the Riak one) in our repository:
Finally, put the Platform credentials we downloaded above inside the repository (the .pem files will be named differently for you):
Step 4: Configure Chef Server
Now we’re going to prep the Chef Server (provided by Opscode Platform) to serve out the recipes needed by our local cluster nodes. The first step is to upload the two cookbooks we need using the *knife* command-line tool, shown in the snippet below the next paragraph. I’ve left out the output since it can get long.
Then we’ll create a “role” — essentially a collection of recipes and attributes — that will represent our local cluster nodes, and call it “riak-vagrant”. Using knife role create will open your configured EDITOR (mine happens to be emacs) with the JSON representation of the role. The role will be posted to the Chef server when you save and close your editor.
The key things to note about what we’re editing in the role below are the “run list” and the “override attributes” sections. The “run list” tells what recipes to execute on a machine that receives the role. We configure iptables to run with Riak, and of course the relevant Riak recipes. The “override attributes” change default settings that come with the cookbooks. I’ve put explanations inline, but to summarize, we want to bind Riak to all network interfaces, and put it in a cluster named “vagrant” which will be used by the “riak::autoconf” recipe to automatically join our nodes together.
Step 5: Setup Vagrant VM
Now that we’re ready on the Chef side of things, let’s get Vagrant going. Make three directories inside your Chef repository called dev1, dev2, and dev3, just like from the Fast Track. Change directory inside dev and run vagrant init. This will create a Vagrantfile which you should edit to look like this one (explanations inline again):
Remember: change any place where it says ORGNAME to match your Opscode Platform organization.
Step 6: Start up dev1 Now we’re ready to see if all our preparation has paid off:
If you see lines at the end of the output like the ones above, it worked! If it doesn’t work the first time, try running vagrant provision from the command line to invoke Chef again. Let’s see if our Riak node is functional:
Step 7: Repeat with dev2, dev3
Now let’s get the other nodes set up. Since we’ve done the hard parts already, we just need to copy the Vagrantfile from dev1/ into the other two directories and modify them slightly.
The easiest way to describe the modifications is in a table:
| Line | dev2 | dev3 | Explanation |
| 7 | “18.104.22.168” | “22.214.171.124” | Unique IP addresses |
| 11 (last number) | 8092 | 8093 | HTTP port forwarding |
| 12 (last number) | 8082 | 8083 | PBC port forwarding |
| 40 | “riak-fast-track-2″ | “riak-fast-track-3″ | Unique chef node name |
| 48 | “email@example.com″ | “firstname.lastname@example.org″ | Unique Riak node name |
With those modified, start up dev2 (run vagrant up inside dev2/) and watch it connect to the cluster automatically. Then repeat with dev3 and enjoy your local Riak cluster!
Beyond just being a demonstration of cool technology like Chef and Vagrant, you’ve now got a developer setup that is isolated and reproducible. If one of the VMs gets too messed up, you can easily recreate the whole cluster. It’s also easy to get new developers in your organization started using Riak since all they have to do is boot up some virtual machines that automatically configure themselves. This Chef configuration, slightly modified, could later be used to launch staging and production clusters on other hardware (including cloud providers). All in all, it’s a great tool to have in your toolbelt.