Tag Archives: Riak

Where To Start With Riak Core

April 12, 2011

There has been a lot of buzz as of late around “riak_core” in various venues, so much so that we are having trouble producing enough resources and content to keep up with the community’s appetite (though we most certainly have plans to). While we hustle to catch up, here is the rundown on what is currently available for those of you who want to learn about, look at, and play with riak_core.

(TL;DR – riak_core is the distributed systems framework that underpins Riak and is, in our opinion, what makes Riak the best and most robust distributed datastore available today. If you want to see it in action, go download Riak and put it through its paces.)

Blogs

If you know nothing about riak_core (or are in the mood for a refresher), start with the Introducing Riak Core blog post that appeared on the Basho Blog a while back. This will introduce you, at a very high-level, to what riak_core is and how it works.

Slides and Videos

There are varying degrees of overlap in each of these slides and videos, but they all address riak_core primarily.

Code

  • riak_core repo on GitHub
  • Basho Banjo – Sample application that uses Riak Core to play distributed music
  • Try Try Try – Ryan Zezeski’s working blog taking an in-depth look at various aspects of riak_core
  • rebar_riak_core – Rebar templates for riak_core apps from the awesome team at Webster/Clay

Getting Involved With Riak and Riak Core

We are very much at the beginning of what Riak Core can be as a standalone platform for distributed applications, so if you want to get in at the ground floor of something that we feel is truly innovative and unparalleled, now is the time. The best way to join the conversation and to help with the development of Riak Core is to join the Riak Mailing List, where you can start asking questions and sharing code.

If you want to see riak_core in action, look no further than Riak, Riak Search, and Luwak. The distribution and scaling components of all of these projects are handled by riak_core.

Also, make sure to follow the Basho Team on Twitter as we spend way too much time talking about this stuff.

Mark

Welcome, New Ripple Committers

April 12, 2011

Two of the best things about working with open-source software are being part of a community and collaborating with other skilled people. When your project becomes more than just a toy, you need others to help keep it going and to infuse fresh ideas into it. You need additional maintainers. Ripple (the Ruby client driver and tools) has been no exception to this rule, so this past weekend I was thrilled to welcome two new committers to the project.

Myron Marston

Myron Marston (@myronmarston, github/myronmarston) is porting a large MySQL-backed application over to Riak at SEOmoz. Myron’s project uses the Ripple ORM fairly heavily, so he’s already fixed a bunch of things related to associations and validations.

Kyle Kingsbury

Kyle Kingsbury (@aphyr, github/aphyr) has been working on some awesome Riak infrastructure at his day job at Vodpod. In the course of his work, he removed the ActiveSupport dependency from riak-client and created another Riak ORM called Risky.

Thank you Myron and Kyle for your continued contributions!

Sean

Free Webinar – Riak Operations – April 14 @ 2PM Eastern

April 7, 2011

Riak, like any other database, is not a piece of your infrastructure to be taken lightly. There are a variety of operational concerns that must be taken into account to ensure both reliability and performance of your Riak cluster. Deploying to different types of environments means knowing the nuances of each environment and being able to adapt to its performance characteristics. This webinar will cover many of the best practices for maintaining your Riak cluster in a variety of environments, as well as monitoring your cluster and understanding how to diagnose performance issues.

We invite you to join us for a free webinar on Thursday, April 14 at 2:00PM Eastern Time (UTC-4) to talk about Riak Operations. In this webinar, we’ll discuss:

  • Basic Riak Operations (config files, logs, etc)
  • Monitoring Riak
  • Backups
  • Understanding Riak Performance
  • Riak in Virtualized Environments

We’ll address the above topics as well as many others. The presentation will last 30 to 45 minutes, with time for questions at the end.

Registration is now closed. 

Ripple 0.9 Release

April 4, 2011

I’m proud to announce that version 0.9.0 of the Ripple family of gems (riak-client, ripple, riak-sessions) was released yesterday. This is a huge leap forward from the 0.8 series, the last release of which was in December. I’m going to highlight some of the best new features in this blog post.

HTTP+SSL Support

Adam Hunter did an awesome job implementing support for HTTPS, including navigating the various idiosyncrasies of HTTP client libraries. If you’ve got HTTPS turned on in Riak, or a reverse proxy in front that provides SSL, it’s easy to set up.

```ruby
# Turn on SSL
client = Riak::Client.new(:ssl => true)

# Alternatively, provide the protocol
client = Riak::Client.new(:protocol => "https")

# Want to be a good SSL citizen? Use a client certificate.
# This can be used to authenticate clients automatically on the server-side.
client.ssl = {:pem_file => "/path/to/pemfile"}

# Use the CA chain for server verification
client.ssl = {:ca_path => "/path/to/ca_cert/dir"}

# All three of the above options will invoke "peer" verification.
# Use "none" verification only if you're lazy. This is the default
# if you don't specify a client certificate or CA.
client.ssl = {:verify_mode => "none"}
```

Adam also added HTTP Basic authentication for those who use it on their reverse-proxy servers. It can be set with the :basic_auth option/accessor as a string of “user:password”.
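For instance, a minimal sketch based on that description (the credentials here are placeholders, not real defaults):

```ruby
# Hypothetical credentials -- :basic_auth takes a single "user:password" string
client = Riak::Client.new(:basic_auth => "deploy:s3cret")

# Or via the accessor on an existing client
client.basic_auth = "deploy:s3cret"
```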

Protocol Buffers

Riak has had a Protocol Buffers-based client API for a long time, but the state of Protocol Buffers support in Ruby was very bad until recently. Thanks to Blake Mizerany’s “Beefcake” library, it was really simple to add support in a cross-platform way. While it’s not insanely faster, the decreased overhead for many operations can make a big difference in the long run. Check out these benchmarks (run on MRI 1.9.2, comparing against the Excon backend):

```
                               user     system      total        real
http  ping                 0.020000   0.010000   0.030000 (  0.084994)
pbc   ping                 0.000000   0.000000   0.000000 (  0.007313)
http  buckets              0.010000   0.000000   0.010000 (  0.894827)
pbc   buckets              0.000000   0.000000   0.000000 (  0.864926)
http  get_bucket           0.480000   0.020000   0.500000 (  1.075365)
pbc   get_bucket           0.170000   0.030000   0.200000 (  0.271493)
http  set_bucket           0.060000   0.000000   0.060000 (  0.660926)
pbc   set_bucket           0.030000   0.000000   0.030000 (  0.579500)
http  store_new            0.710000   0.040000   0.750000 (  2.443635)
pbc   store_new            0.630000   0.030000   0.660000 (  1.382278)
http  store_key            0.730000   0.040000   0.770000 (  2.779741)
pbc   store_key            0.580000   0.020000   0.600000 (  1.539332)
http  fetch_key            0.690000   0.030000   0.720000 (  2.014679)
pbc   fetch_key            0.410000   0.030000   0.440000 (  0.948865)
http  keys                 0.300000   0.090000   0.390000 ( 78.455719)
pbc   keys                 0.530000   0.020000   0.550000 (  0.828484)
http  key_stream           0.200000   0.010000   0.210000 (  0.689116)
pbc   key_stream           0.530000   0.010000   0.540000 (  0.833347)
```

Adding Protocol Buffers required a breaking change in the Riak::Client class, namely that the port setting/accessor was split into http_port and pb_port. If you still have this setting in a configuration file or your code, you will receive a deprecation warning.

```ruby
# Use Protocol Buffers! (default port is 8087)
client = Riak::Client.new(:protocol => "pbc", :pb_port => 8087)

# Use HTTP and Protocol Buffers in parallel, too! (Luwak and Search require HTTP)
client.store_file("bigpic.jpg", "image/jpeg", File.open("images/bigpic.jpg", 'rb'))
```
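To illustrate the deprecation mentioned above, here is a rough sketch of old versus new configuration (the exact warning text may differ):

```ruby
# Old style: a single :port option. Per the note above, this still
# works in 0.9 but emits a deprecation warning when used.
client = Riak::Client.new(:port => 8098)

# New style: one port setting per protocol.
client = Riak::Client.new(:http_port => 8098, :pb_port => 8087)
```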

**Warning**: Because some operations (namely get_bucket_props and set_bucket_props) are not semantically equivalent on both interfaces, you might run into some unexpected problems. I have been assured that these differences will be fixed soon.

MapReduce Improvements

Streaming MapReduce is now supported on both protocols. This lets you handle results as they are produced by MapReduce rather than waiting for all the results to be accumulated. Unlike the traditional mode, you will be passed a phase number in addition to the data from the phase. Like key-streaming, just give a block to the run method.

```ruby
# Make a MapReduce job like usual.
Riak::MapReduce.new(client).
  add("people", "sean").
  link(:tag => "friend").
  map("Riak.mapValuesJson", :keep => true).
  run do |phase, data| # Streaming!
    puts data.inspect
  end
```

MapReduce key-filters, which were available in beta releases of 0.9, are a new feature in Riak 0.14 that lets you reduce the number of keys fed to your MapReduce query based on criteria applied to the key names. Here’s an example:

```ruby
# Blockish builder syntax for key-filters
Riak::MapReduce.new(client).
  filter("posts") do
    tokenize "-", 1 # Split the key on dashes, take the first token
    string_to_int   # Convert the token to an integer
    eq 2011         # Only pass the ones from 2011
  end

# Same as above, without builder syntax
Riak::MapReduce.new(client).
  add("posts", [["tokenize", "-", 1],
                ["string_to_int"],
                ["eq", 2011]])
```

Ripple::Document Improvements

Ripple::Document models got a lot of small improvements, including:

  • Callback ordering was fixed.
  • Documents can be serialized to JSON, e.g. for API responses (see the sketch after this list).
  • Client errors bubble up when saving a Document.
  • Several association proxy bugs were fixed.
  • The datetime serialization format defaults to ISO8601 but is also configurable.
  • Mass-attribute-assignment protection was added, including protecting :key by default.
  • Embedded documents can be compared for equality, which amounts to attribute equality when under the same parent document.
  • Documents can now have observer classes which can also be generated by the ripple:observer generator.
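As a quick taste of the JSON serialization mentioned above, here’s a minimal sketch (the Invoice model and its property are made up for illustration):

```ruby
# Hypothetical model, for illustration only
class Invoice
  include Ripple::Document
  property :total, Integer
end

invoice = Invoice.new(:total => 100)
invoice.to_json # => a JSON string, handy for API responses
```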

Testing Improvements

In order to make sure that the client layer is sufficiently independent of transport semantics and that the lower layers comply with the “unified” backend API, there is a new suite of integration tests for riak-client that covers operations that are supported by both transport mechanisms. This should make it much easier to implement new client backends in the future.

The Riak::TestServer was made faster and more reliable by a few changes to the Erlang bits that power it.

Onward to 1.0

Recently the committers and some members of the community joined me in discussing some key features that need to be in Ripple before it reaches “1.0” status. Some of them will be really incredible, and I’m anxious to get started on them:

  • Enhanced Document querying (scopes, indexing, lazy loading, etc)
  • User-defined sibling-resolution policies, with automatic retries
  • Enhanced Riak Search features
  • Platform-specific Protocol Buffers drivers (MRI C, JRuby, Rubinius C++)
  • Server templates for creating development clusters (extracted from Riak::TestServer)

To that end, I’ve created a 0.9-stable branch which will only receive bugfixes going forward. All new development for 1.0 will be done on master. We’re likely to break some legacy APIs, so we will try to add deprecation notices to the 0.9 series where possible.

Enjoy this latest release of Ripple!

Sean and contributors

Why MapReduce is Easy

March 30, 2011

There’s something about MapReduce that makes it seem rather scary. It almost has this Big Data aura surrounding it, making it seem like it should only be used to analyze a large amount of data in a distributed fashion. It’s one of the pieces that makes Riak a pretty versatile key-value store: feed a bunch of keys into it and do some analytics on the objects. Quite handy.

But when you narrow it down to just the basics, MapReduce is pretty simple. I’m almost 100% certain you’ve used it in one way or another in an application you’ve written. So before we go all distributed, let’s break MapReduce down into something small that you can use every day. That certainly has helped me understand it much better.

For our webinar on Riak and Node.js we built a little application with Node.js and Riak Search to store and search syslog messages. It’s called Riaktant and handily converts and stores syslog messages in a way that’s friendlier for both Riak Search and MapReduce. We’ll base this on examples we used in building the application.

MapReduce is easy because it works on simple data

MapReduce loves simple data structures. Why? Because when there are no deep, nested relationships between, say, objects, distributing data for parallel processing is a breeze. But I’m getting a little ahead of myself.

Let’s take the data Riaktant stores in Riak and see how easy it is to sift through it without even having to go distributed. It uses a JavaScript library called glossy to parse a syslog message and turn it into this nice JSON data structure.

```javascript
message = {
  "originalMessage": "<35>1 2011-02-14T11:10:25.137+01:00 lb1.basho.com ftpd 7003 - Client disconnected",
  "time": "2011-02-14T10:10:25.137Z",
  "severityID": 3,
  "facility": "auth",
  "version": 1,
  "prival": 35,
  "host": "lb1.basho.com",
  "facilityID": 4,
  "message": "7003 - Client disconnected",
  "severity": "err"
}
```

MapReduce is easy because you use it every day

I’m almost 100% certain you use MapReduce every day. If not daily, then at least once a week. Whenever you have a list of items that you loop or iterate over and transform into something else one by one, if only to extract a single attribute, there’s your map function.

Keeping with JavaScript, here’s how you’d extract the host from the above JSON, for a whole list:

```javascript
messages = [message];

messages.map(function(message) {
  return message.host
})
```

Or, if you insist, here’s the Ruby equivalent:

```ruby
messages.map do |message|
  message[:host]
end
```

If you must ask, here’s Python, using a list comprehension, for added functional programming sugar:

```python
[message['host'] for message in messages]
```

There, so simple, right? Halfway there to some full-fledged MapReduce action.

MapReduce is easy because it’s just code

Before we continue, let’s add another syslog message.

```javascript
message2 = {
  "originalMessage": "<35>1 2011-02-14T11:10:25.137+01:00 web2.basho.com ftpd 7003 - Client disconnected",
  "time": "2011-02-14T10:12:37.137Z",
  "severityID": 3,
  "facility": "http",
  "version": 1,
  "prival": 35,
  "host": "web2.basho.com",
  "facilityID": 4,
  "message": "7003 - Client disconnected",
  "severity": "warn"
}
messages.push(message2)
```

We can take the above example even further (still using JavaScript) and perform additional operations, like sorting the results.

```javascript
messages.map(function(message) {
  return message.host
}).sort()
```

This gives us a nice sorted list of hosts. Coincidentally, sorting happens to be the second step in traditional MapReduce. Isn’t it nice how easily this is coming together?

The third and last step involves, you guessed it, more code. I don’t know about you, but I love things that involve code. Let’s reduce the list of hosts and count the occurrences of each host (if this reminds you of an SQL query that involves GROUP BY, you’re right on track).

```javascript
var reduce = function(total, host) {
  if (host in total) {
    total[host] += 1
  } else {
    total[host] = 1
  }
  return total
}

messages.map(function(message) {
  return message.host
}).sort().reduce(reduce, {})
```

There’s one tiny bit missing for this to be as close to MapReduce as we can get without going distributed. We need to slice up the list before we hand it to the map function. As JavaScript doesn’t have a built-in function to partition a list, we’ll whip up our own real quick. After all, we’ve come this far.

```javascript
function chunk(list, chunkSize) {
  for (var position, i = 0, chunk = -1, chunks = []; i < list.length; i++) {
    // Note: single = is intentional; position is 0 at every chunk boundary
    if (position = i % chunkSize) {
      chunks[chunk][position] = list[i]
    } else {
      // Start a new chunk
      chunk++;
      chunks[chunk] = [list[i]]
    }
  }
  return chunks;
}
```

It loops through the list, splitting it up into equally sized chunks, returning them neatly wrapped in a list.

Now we can chunk the initial list of messages, and boom, we have our own little MapReduce going, without magic, just code. Let’s put the new chunk function to good use.

```javascript
var mapResults = [];
chunk(messages, 2).forEach(function(chunk) {
  var messages = chunk.map(function(message) {
    return message.host
  })
  mapResults = mapResults.concat(messages)
})
mapResults.sort().reduce(reduce, {})
```

We split the messages into chunks of two, run the map function for each chunk, and collect the results as we go. Then we sort the results and feed them into the reduce function. That’s MapReduce in eight lines of JavaScript code. Easy, right?

That’s all there is to MapReduce. You use it every day, whether you’re aware of it or not. It works nicely with simple data structures, and it’s just code.

Unfortunately, things get complicated as soon as you go distributed, for example in a Riak cluster. But we’ll save that for the next post, where we’ll examine why MapReduce is hard.

Mathias

Riak and Scala at Yammer

March 28, 2011

What’s the best way to start off the week? With an awesome presentation from some very talented engineers about building a Riak-backed service.

This video, which runs about 40 minutes, was recorded last week at the San Francisco Riak Meetup and is worth every minute of your time. Coda Hale and Ryan Kennedy of Yammer give an excellent, in-depth look into how they built “Streamie”, why Riak was the right choice, and the lessons learned in the process.

Also:

* The PDF of the slide presentation can be viewed here
* Around the five-minute mark, Coda references a paper called “The Declarative Imperative: Experiences and Conjectures in Distributed Logic.”
* If you are interested in talking about your Riak usage, get in touch with mark@basho.com and we’ll get the word out.

Enjoy.

Mark

There weren’t too many questions asked at the end of the presentation so we decided to cut them out of the recording in the interest of time. Apologies for this. Here they are:

  • What local storage backend are you using? Bitcask.
  • How many keys are you currently storing? Around 5 million.
  • What is the average value size? Under 10K.
  • Can you share your hardware specs? Bare-metal, standard servers: 8-core, 16GB RAM, SATA drives.

Riak and Scala at Yammer from Basho Technologies on Vimeo.

Follow Up To Riak and Node.js Webinar

March 18, 2011

Thanks to all who attended Wednesday’s webinar on Riak (Search) and Node.js. If you couldn’t make it you can find a screencast of the webinar below. You can also check out the slides directly.

We hope we gave you a good idea of what you can use the winning combination of Riak and Node.js for by showing you our little syslog-emulating sample application, Riaktant. We made the source code available, so if you feel like running your own syslog replacement, go right ahead and let us know how things go. Of course, you can also just dig into the code and see how nicely Node.js and Riak play together.

If you want some practical ideas on how we utilized Riak’s MapReduce to analyze the log data, have a look at the functions used by the web interface. You can throw these right into the Node.js console and try them out yourself, since riak-js, the Node.js client for Riak, accepts JavaScript functions directly, so you don’t have to serialize them into strings.

Thanks to Joyent for providing us with SmartMachines running Riak, and for offering No.de, their great hosting service for Node.js applications, where we deployed our little app with great ease.

Sean and Mathias

Free Webinar – Riak with Node.js – March 15 @ 2PM Eastern

March 8, 2011

JavaScript is the lingua franca of the web, and many developers are starting to use node.js to power their server-side applications. Riak is a flexible, scalable database that has a JavaScript-friendly interface, including MapReduce in JavaScript and an awesome client library called riak-js. Put the two together and you have lots of possibilities!

We invite you to join us for a free webinar on Tuesday, March 15 at 2:00PM Eastern Time (UTC-4) to talk about Riak with node.js. In this webinar, we’ll discuss:

  • Getting riak-js, the Riak client for node.js, into your application
  • Storing, retrieving, manipulating key-value data
  • Issuing MapReduce queries
  • Finding data with Riak Search
  • Testing your code with the TestServer

We’ll address the above topics in addition to looking at a sample application. The presentation will last 30 to 45 minutes, with time for questions at the end.

KillDashNine March Happening on Wednesday

March 5, 2011

In February we kicked off the KillDashNine drinkup. It was a huge success (turns out we aren’t the only ones who care about durability) and, as promised, we’ll be having another drinkup this month. On Wednesday, 3/9, we will be clinking glasses and sharing data loss horror stories at Bloodhound, located at 1145 Folsom Street here in San Francisco.

This month’s chosen cocktail is the *Data Eraser*, and it’s simple to make: 2 oz vodka, 2 oz coffee liqueur, 2 oz tonic, and a dash of bitter frustration, anguish, and confusion (which is more or less how one feels when their data just disappears). And if you can’t make it, be sure to pour yourself a Data Eraser on 3/9 to take part in the festivities from wherever you happen to find yourself (or run your own local KillDashNine like Marten Gustafson did in Stockholm last month).

Registration details for the event are here, so be sure to RSVP if you’re planning to join us. In the meantime, spin up a few nodes of your favorite database and try your hand at terminating some processes with the help of our favorite command: kill -9.

Long Live Durability!

Basho


Announcing KevBurnsJr as a PHP Client Committer

February 28, 2011

We just added Kevin Burns, who goes by KevBurnsJr on GitHub, Twitter, and the #riak IRC room on irc.freenode.net, as a committer to the Basho-supported Riak PHP Client.

Kevin has been hard at work over the past few weeks adding some great functionality to the PHP client and has even kicked off porting Ripple, Basho’s Ruby Client ODM, to PHP. Suffice it to say that we at Basho are excited about Kevin’s participation and involvement with Riak and our PHP code.

Some relevant code:

* Riak’s PHP Client
* Port of Ripple to PHP

Thanks, Kev! We are looking forward to your contributions.

Mark