July 14, 2011
Hi. My name is Russell Brown and since March, I’ve been working on the Riak Java Client (making me the lone Java developer in an Erlang shop). This past week I merged a large, backwards-compatible branch with some enhancements and long-awaited fixes and refinements. In this post I want to introduce the changes I’ve made and the motivations behind them. At Basho we firmly believe that Riak’s Java interface is on track to be among the best there is for Java developers who need a rock solid, production-ready database, so it’s time you get to know it if you don’t already.
First, Some History
When Riak was first released, it was only equipped with an HTTP API, so it followed that the Java client was a REST client. Later a Protocol Buffers Interface was added to Riak and Kresten Krab-Thorup and the team at Trifork contributed a Protocol Buffers interface for the Java library. Later still, around version 0.14, the Trifork PB Client was merged into the official Basho Riak Java Client. With this added interface, however, came a problem: both clients work well but they don’t share any interfaces or types. I started working for Basho in March 2011, and my first task was to fix any issues with the existing clients and refactor them to a common, idiomatic interface. Some way into that task I was exposed to the rather brilliant Riak and Scala at Yammer talk given by Coda Hale and Ryan Kennedy at a Riak Meetup in San Francisco. This opened my eyes, and I’m very thankful to Coda and Ryan for sharing their expert understandings so freely. If you meet either of these two gentlemen, I urge you to buy them drinks.
A Common Interface
Having a common interface should be a no-brainer. Developers shouldn’t have to choose upfront about a low-level transport and then have all their subsequent code shaped by that choice. To that end, I added a RawClient interface to the library that describes the set of operations you can perform with Riak. I also adapted each of the original clients to this interface. If all you want to do is pump data in, or pull raw data out of Riak, the PB RawClient adapter is for you. There are some figures on the Riak Wiki that show it’s quite snappy. If you need to write a non-blocking client, or simply have to use the Jetty HTTP library, implementing this interface is the way to go.
There is some bad news here: I had to deprecate a lot of the original client and move that code to new packages. This will look a tad ugly in your IDE for a release or two, but it is better to make the changes than be stuck with odd packages for ever. There will be a code cull of the deprecated classes before the client goes v1.0.
The next task on the list for this raw package is to move the interfaces into a separate core project/package to avoid any circular dependency issues that will arise if you create your own RawClient implementation.
The RawClient solves the common/idiomatic interface problem, but it doesn’t solve the main new challenge that an eventually consistent, fault-tolerant datastore brings to the client: siblings.
Before we move on, if you have the time please take a moment to read the excellent Vector Clocks page on the Riak wiki (but make sure you come back). Thanks to Vector Clocks, Riak does all that it can to save you from dealing with conflicting values, but this doesn’t guarantee they won’t occur. The RawClient presents you with a Vector Clock and an array of sibling values, and you need to create a single, correct value to work with (and even write back to Riak as the one true value). The new, higher-level client API in the Java Client makes this easier.
Conflict resolution is going to depend on your domain. Your data is opaque to Riak, which is why conflict resolution is a read time problem for the client. The canonical example (from the Dynamo Paper) is a shopping cart. If you have sibling shopping carts you can merge them (with a set-union operation, for example) to get a single cart with the values from all carts present. (Yes, you can re-instate a removed item, but that is far better than losing items. Ask Amazon.) Keep the idea of a shopping cart fresh in your mind for the remainder of this post as it figures in some of the examples I’ve used.
A Few Words On Domain Conversion
You use a Bucket to get key/value pairs from Riak.
Bucket b = client.createBucket(bucketName)
    .nVal(1)
    .allowSiblings(true)
    .execute();

IRiakObject fetched = b.fetch("k").execute();
b.store("k", "my new value").execute();
b.delete("k").execute();
The Bucket is a factory for RiakOperations, and a Riak Operation is a fluent builder that, when executed, calls out to Riak. “Fetch” and “Store” Riak Operations accept a Converter and ConflictResolution implementation from you so that the data Riak returns can be deserialised into a domain object and any siblings can be resolved. The library provides a Jackson-based JSONConverter that will convert the JSON payload of a Riak data item into an instance of some domain class; think of it as a bit like an ORM (but maybe without the “R”).
final Bucket carts = client.createBucket(bucketName).allowSiblings(true).execute();

final ShoppingCart cart = new ShoppingCart(userId);
cart.addItem("fixie");
cart.addItem("moleskine");

carts.store(cart)
    .returnBody(false)
    .retrier(DefaultRetrier.attempts(2))
    .execute();
Adding your own converters is trivial and I plan to provide a Jackson XML based one soon. Look at this test for a complete example.
Once the data is marshalled into domain instances, your logic is run to resolve any conflicts. A trivial shopping cart example is provided in the tests here. The ConflictResolver interface has a single method that takes a collection of domain instances and returns a single, resolved value.
T resolve(final Collection<T> siblings) throws UnresolvedConflictException;
It throws the checked UnresolvedConflictException if you need to bail out. Your code can catch this and make the siblings available to a user (for example) for resolution as a last resort. I am considering making this a runtime exception, and would like to hear what you think about that.
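To make the idea concrete, here is a minimal sketch of a resolver for the shopping cart case. So that it compiles on its own, it declares its own stand-in ConflictResolver interface and UnresolvedConflictException (hypothetical copies modelled on the signature above, not the library's classes), and the ShoppingCart and CartUnionResolver classes are likewise invented for illustration. It merges sibling carts with a set union and bails out if the siblings belong to different users.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-ins so the sketch compiles alone; not the library's classes.
class UnresolvedConflictException extends Exception {
    UnresolvedConflictException(String message) { super(message); }
}

interface ConflictResolver<T> {
    T resolve(final Collection<T> siblings) throws UnresolvedConflictException;
}

// Hypothetical domain class: a cart is a user id plus a set of item names.
class ShoppingCart {
    final String userId;
    final Set<String> items = new TreeSet<String>();

    ShoppingCart(String userId) { this.userId = userId; }

    ShoppingCart addItem(String item) { items.add(item); return this; }
}

// Resolves siblings by taking the union of all their items.
class CartUnionResolver implements ConflictResolver<ShoppingCart> {
    public ShoppingCart resolve(final Collection<ShoppingCart> siblings)
            throws UnresolvedConflictException {
        ShoppingCart merged = null; // stays null when there are no siblings at all
        for (ShoppingCart sibling : siblings) {
            if (merged == null) {
                merged = new ShoppingCart(sibling.userId);
            } else if (!merged.userId.equals(sibling.userId)) {
                // Carts for different users cannot be merged safely: bail out.
                throw new UnresolvedConflictException("siblings belong to different users");
            }
            merged.items.addAll(sibling.items);
        }
        return merged;
    }
}

class CartResolverExample {
    public static void main(String[] args) throws Exception {
        ShoppingCart a = new ShoppingCart("u1").addItem("toaster");
        ShoppingCart b = new ShoppingCart("u1").addItem("steak knives").addItem("toaster");
        ShoppingCart merged = new CartUnionResolver().resolve(Arrays.asList(a, b));
        System.out.println(merged.items); // one cart holding every item from both siblings
    }
}
```

A union can bring back an item the shopper removed in one sibling, which is the trade-off described earlier: a re-instated item is preferable to a lost one.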
To talk about mutation I’m going to stick with the shopping cart example. Imagine you’re creating a new cart for a visiting shopper. You create a ShoppingCart instance, add the toaster, add the flambe set, and persist it. Meanwhile a network partition occurred and your user already added a steak knife set to a different cart. You’re not really creating a new value, but you weren’t to know. If you save this value you have a conflict to be resolved at a later date. Instead, the high level client executes a store operation as a fetch, convert, resolve siblings, apply a mutation and then store. In the case of the shopping cart that mutation would again be to merge the values of your new ShoppingCart with the resolved value fetched from Riak.
You provide an implementation of Mutation to any store operation. You never really know if you are creating a new value or updating an old one, so it is safer to model your write as a mutation to an existing value that results in a new value. This can be as simple as incrementing a number or adding the items in your Cart to the fetched Cart.
By default the library provides a ClobberMutator (it ignores the old value and overwrites it with a new one) but this is simply a default behaviour and not the best in most situations. It is better to provide your own Mutation implementation on a store operation. If you can model your values as logically monotonic or as transformations to existing values, then creating mutation implementations is a lot simpler.
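The store-as-mutation flow can be sketched without Riak at all. The example below is a toy, in-memory illustration under stated assumptions: the Mutation interface is a hypothetical stand-in for the library's type, ToyBucket is an invented class that keeps a list of siblings per key, and a cart is reduced to a plain set of item names. Its store method performs the fetch, resolve, mutate, and write steps in order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the library's Mutation type.
interface Mutation<T> {
    T apply(T original); // original is null when no value exists yet
}

// Merges the items of a locally built cart into whatever was fetched.
class MergeCartMutation implements Mutation<Set<String>> {
    private final Set<String> localItems;

    MergeCartMutation(Set<String> localItems) { this.localItems = localItems; }

    public Set<String> apply(Set<String> original) {
        Set<String> result = new TreeSet<String>(localItems);
        if (original != null) {
            result.addAll(original);
        }
        return result;
    }
}

// Toy in-memory stand-in for a bucket that may hold several siblings per key.
class ToyBucket {
    private final Map<String, List<Set<String>>> data =
            new HashMap<String, List<Set<String>>>();

    // Simulates a write that landed on the other side of a partition.
    void addSibling(String key, Set<String> value) {
        List<Set<String>> siblings = data.get(key);
        if (siblings == null) {
            siblings = new ArrayList<Set<String>>();
            data.put(key, siblings);
        }
        siblings.add(value);
    }

    List<Set<String>> fetchSiblings(String key) {
        List<Set<String>> siblings = data.get(key);
        return siblings == null
                ? new ArrayList<Set<String>>()
                : new ArrayList<Set<String>>(siblings);
    }

    // Store executed as: fetch, resolve siblings (set union), apply mutation, write.
    Set<String> store(String key, Mutation<Set<String>> mutation) {
        List<Set<String>> siblings = fetchSiblings(key);
        Set<String> resolved = null;
        if (!siblings.isEmpty()) {
            resolved = new TreeSet<String>();
            for (Set<String> sibling : siblings) {
                resolved.addAll(sibling);
            }
        }
        Set<String> updated = mutation.apply(resolved);
        List<Set<String>> single = new ArrayList<Set<String>>();
        single.add(updated);
        data.put(key, single); // the key now holds one value, no siblings
        return updated;
    }
}

class StoreAsMutationExample {
    public static void main(String[] args) {
        ToyBucket carts = new ToyBucket();
        // The steak knives were added elsewhere while we were partitioned.
        carts.addSibling("cart-u1", new TreeSet<String>(Arrays.asList("steak knives")));

        Set<String> local = new TreeSet<String>(Arrays.asList("toaster", "flambe set"));
        Set<String> stored = carts.store("cart-u1", new MergeCartMutation(local));
        System.out.println(stored); // all three items end up in the stored cart
    }
}
```

A clobbering mutation in this sketch would simply return the local items and ignore the argument, which is why the steak knives would be lost with that default.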
As your project matures, you will firm up your ConflictResolvers, Mutations, and Converters into concrete classes, and at this point adding them for each operation is a lot more typing and code noise than you need (especially if you were using anonymous classes for your Mutation/ConflictResolver/Converter).
bucket.store(o)
    .withConverter(converter)
    .withMutator(mutation)
    .withResolver(resolver)
    .r(r)
    .w(w)
    .dw(dw)
    .retrier(retrier)
    .returnBody(false)
    .execute();
The library provides the DomainBucket class as a wrapper around the Bucket. DomainBuckets are constructed with a ConflictResolver, Mutation, and Converter and thereafter use those implementations for each operation. DomainBuckets are a convenient way to get a strongly typed view of a Bucket and only store/fetch values of that type. They are a touch of sugar that reduce noise and I advise you use them once your domain is established. This test illustrates the usage.
The Next Steps
That’s about it. There is a Retrier interface and a default try-3-times-with-a-short-wait implementation (if the database is fault-tolerant, the client should be too, right?) but I’m going to push that down the stack to the RawClient layer so we can add cluster awareness to the client (with load balancing and all that good stuff).
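For readers wondering what a try-N-times-with-a-short-wait policy amounts to, here is a rough sketch. SimpleRetrier is a hypothetical class written for this post, not the library's Retrier or DefaultRetrier, and its constructor arguments and the attempt method are assumptions. It runs a java.util.concurrent.Callable up to a fixed number of times, sleeping briefly between failures, and rethrows the last failure if every attempt fails.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; the names and signature are assumptions for illustration.
class SimpleRetrier {
    private final int attempts;
    private final long waitMillis;

    SimpleRetrier(int attempts, long waitMillis) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        this.attempts = attempts;
        this.waitMillis = waitMillis;
    }

    // Runs the command until it succeeds or the attempts are used up.
    <T> T attempt(Callable<T> command) throws Exception {
        Exception last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return command.call();
            } catch (Exception e) {
                last = e;
                if (i < attempts) {
                    Thread.sleep(waitMillis); // short pause before the next try
                }
            }
        }
        throw last; // every attempt failed; surface the final failure
    }
}

class RetrierExample {
    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = new SimpleRetrier(3, 50).attempt(new Callable<String>() {
            public String call() throws Exception {
                // Simulated flaky operation: fails twice, then succeeds.
                if (++calls[0] < 3) {
                    throw new IOException("node unavailable");
                }
                return "stored";
            }
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```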
I haven’t covered querying (MapReduce and Link Walking) but I plan to in the next post (“Why Map/Reduce is easy with Java”, maybe?). I can say that is one aspect that has hardly changed from the original client. The original versions used a fluent builder and so does this client. The main difference is the common API and the ability to convert M/R results into Java Collections or domain specific objects (again, thanks to Jackson). Please read the README on the repo for details and the integration tests for examples.
At the moment the code is in the master branch on GitHub. If you get the chance to work with it I’d love to hear your feedback. The Riak Mailing List is the best place to make your feelings and opinions known. There are a few wrinkles to iron out before the next release of the Java Client, and your input will shape the future direction of this code so please, don’t be shy. We are on the lookout for contributors…
And go download Riak if you haven’t already.
June 27, 2011
This was originally posted on themarkphillips.com. Please use the original post for all comments.
When someone asks me, “Where is Basho located?”, I usually respond with something along the lines of: “Much like Riak, we are completely distributed.” Some three years ago our team was all working out of Cambridge, MA (which is still our headquarters). Slowly but surely the team grew in size, but it quickly became apparent that requiring all employees to work in the same geographic location would result in us missing out on some talented and downright brilliant people. So we resolved to “hire where the talent is.”
As it stands right now we have physical offices in Cambridge, MA and San Francisco. The team, however, is now completely distributed; in addition to Cambridge and San Francisco (and several other CA cities), we have people in Oregon, Oklahoma (various locations), Florida, Colorado (various locations), New Jersey, North Carolina, Minnesota, Virginia (various locations), Maryland (various locations), Idaho, New York, Germany, and the UK. The latest tally put our entire team at just over thirty people.
Hiring where the talent is means we don’t sacrifice great hires for location, but it also presents various hurdles when attempting to build culture and community. Anyone who works at a startup or as part of a small team can speak to the importance of culture. It’s crucial that distributed employees feel as though they are part of a tight-knit crew. If you show up every day and your engagement with your coworkers doesn’t go much beyond a few passing phrases in a chat client, you should be doing more. The leadership at Basho made it clear many moons ago that we were going to work hard to build culture and community. Just because you’re committing code 1000 miles from your nearest colleague doesn’t mean you need to feel like they are 1000 miles away.
I spend most of my time pursuing ways to strengthen and extend the various external communities that are growing out of Basho’s open source software, but I thought it might be useful to examine what we do internally to build community and culture. As should be apparent, we’re not doing anything too crazy or innovative with the ways we connect and collaborate across states and countries. But it’s the little things that matter when culture is concerned at a distributed company, and I think we do a lot of the little things well.
For as long as I can remember, Basho has used Jabber for real-time chat collaboration. This is where we spend most of our time conversing, and the entire company idles in one room we call “bashochat.” At any given time you can find any number of conversations happening concurrently; several developers might be chasing down a finicky bug while several others are discussing the merits of the latest cat meme. Hundreds (if not thousands) of messages fly through here daily. At times it can get a bit distracting, so signing off to focus is encouraged and done often. We also just started logging bashochat to make sure that those who are out for the day or signed off to chase a corner case can stay in the loop.
In addition to Jabber, the Client Services Team also uses Campfire as their chat software of choice (for various reasons). There’s certainly no reason why multiple chat programs can’t co-exist under the same corporate umbrella. Basho is flexible, and if it works for your team, go with it.
Interacting via Skype serves as a great complement to what happens in Jabber (even if Skype itself offers less than five nines of uptime). Everyone uses Skype at least once daily for our morning status call. We are still small enough that getting the majority of the company on the phone for a 10 minute status call is feasible, so we do it. Topics range from “What’s the current status of bug fix X for customer Y?” to “Did you get any questions at yesterday’s meetup talk that you couldn’t answer?” Video chats are also invaluable, and jumping on Skype to speak “face-to-face” for even five minutes is incredibly worthwhile and serves to reinforce the team feel (especially when a new hire is coming aboard).
Yammer is a great piece of software, and it recently worked its way into our suite of collaboration tools. When it was first introduced to our team (around the beginning of this year) I was a bit skeptical of how well it was going to work for us. We already use Jabber quite heavily. How would the two co-exist? Since then Yammer has become the home for low-volume, high quality messages that deserve more than just a passing glance or ephemeral response. In other words, the signal to noise ratio in Yammer is much higher; links to blog posts about Riak (or our competition), results of a long running benchmark (complete with graphs), or links to a new GitHub repo are all typical of what appears on Yammer. That said, the message volume has been growing steadily over the past months, and I’m curious and interested to see how this tool evolves for us.
At some point you have to actually meet and physically interact with your colleagues. To this end, we’ve been doing quarterly developer meetups for about six quarters now. These are 3-5 day gatherings of the entire team where it’s business as usual, with the exception of some team building activities scattered throughout the week. Lots of amazing ideas and moments are born at these meetups, and we all look forward to them.
Basho is firing on all cylinders right now (fixing more bugs, writing more features, closing more deals, resolving more tickets, etc.), and I believe that our dedication to building a distributed culture and community internally has had a lot to do with it. Though Basho’s “system” is still a work in progress, in my opinion we’ve managed to build a strong internal community and culture that lends itself to heightened levels of productivity and overall happiness. We are still relatively small (right around 30, as I stated above) and making this scale will surely be a challenge. And I’m sure that the tools we use will change, too, to accommodate our needs (speaking of which, where is the Skype replacement already?).
You can’t force community and culture. It starts with how you hire and is tested every day (whether you’re working in the same physical location or not). Build (or seek out) a team that is committed to making something special across the board. Collaboration tools and processes will follow accordingly, and they should complement and enhance the way you work, not dictate it.
June 10, 2011
More awesome news coming out of BashoHQ: Joseph Blomstedt has joined the Basho Team! (Well, to be clear, he’s been working on Riak full-time for about three weeks now; this blog post is a bit overdue.)
Joe initially caught our eye after releasing riak_zab, an Erlang port of the Zookeeper atomic broadcast protocol that was designed to integrate with riak_core. Joe built riak_zab in order to support a strong consistency layer on top of Riak, allowing a single Riak cluster to be used both for eventually consistent and strongly consistent operations. We got in touch with him soon after this was released, and the rest is history.
Joe is currently finishing up a PhD at the University of Colorado with a focus on compilers, parallel code scheduling, and heterogeneous CPU/GPU systems. During his time in graduate school, he has also been a frequent intern at Intel — twice in the research division, and twice on product teams.
Outside of making Riak better by day, Joe also has a strong interest in expanding riak_core both in capability and popularity. In particular, he intends to work towards increasing the visibility of riak_core/Erlang in the academic community, where most distributed systems research builds on Hadoop/Java. He is also of the belief that there is considerable research still to be done in the area of eventually consistent distributed systems, and that Basho has a role to play in producing novel research.
Joe currently resides in Boulder while he finishes up his PhD, and takes turns working from home and the university campus. This fall he plans to move back to Seattle, where he previously lived during his undergraduate years (UW CSE, 2005). In the meantime, Joe is enjoying his remaining months with Boulder’s great food, beer, and open spaces. If you’re in the area, feel free to contact Joe if you want to talk Riak over lunch or beers.
May 26, 2011
Eric has been active in the Riak community for some time now, and, in addition to the numerous patches and bug fixes he has contributed to the Riak Python client, he’s also gone out of his way to help educate new and existing users about all things Riak on the Mailing List and in #riak on Freenode.
Make sure to keep an eye on the Riak Wiki Repo for his commits.
April 15, 2011
We’ve been expanding at an impressive rate as of late (we’re trying to keep up with GitHub), and today we’re thrilled to announce that another amazing developer has joined the Basho Team. Join us in welcoming Jared Morrow!
Jared started his career working on autonomous aircraft for Northrop Grumman, then moved to Qualcomm to work on various government products. His most recent position was at Schneider Electric, focusing first on embedded applications and then moving to developer tools. It was through these tools that he first dabbled in Erlang and began deploying Riak internally.
And it’s on tools where he’ll be spending a lot of his time (at least initially) at Basho. Jared will be taking on the role of making every piece of software we release more stable and robust with a suite of build and release tools he’s working to deploy. (In other words, Riak, Webmachine, Riak Search, and everything else we develop is about to get even better.)
Jared lives with his family in Fort Collins, Colorado, and in the winter, after a big snowfall, you can find him snowboarding in Summit County. (Please don’t talk to him though, because as you know, “there are no friends on a powder day”.)
April 13, 2011
We are pleased to announce that Ryan Zezeski has joined the Basho Team!
Ryan has been coding since 14 and was hooked after writing his first program in Visual Basic 3.0. (It was an add-on for AOL that would automatically block a user that spammed the chat room.) He wrote his first line of Erlang in late June of 2010 on a plane ride to Mountain View, CA, and his fingers haven’t stopped since.
In the last six months Ryan has been hard at work putting various pieces of Basho software into production at AOL, including Rebar, Riak, Riak Search, Riak Core, and Webmachine, and has sent us numerous high quality patches for Luwak and Riak in the process. (It was his work on Luwak that initially caught our eye.) He’s planning to spend as much time as possible with Riak Core and expose it to the greater public through his working blog “try try try.”
On a personal note, Ryan resides in downtown Baltimore. If you happen to see a guy in Federal Hill sporting a black T-shirt that says “Riak”, don’t be scared to say hello.
March 24, 2011
Adam Hunter is the newest Basho Developer Advocate!
Adam first got involved with Riak when he used it in conjunction with Ripple, Riak’s Ruby library, to build several applications at his previous position. (In addition to being a deeply-skilled Ruby developer, Adam has also spent some happy years writing PHP for fun and profit.) In the process, he started contributing patches and features to Ripple, and we liked his code and enthusiasm for the project so much that we extended him committer rights. Since then he has become an active and visible member of the Riak community, so we were quite pleased when he accepted the offer to come aboard.
Home base for Adam is Charlotte, North Carolina, so be sure to look him up if you’re in the area and are interested in getting an earful about Riak and distributed systems. You can also find him on Twitter and on GitHub as adamhunter.
March 5, 2011
In February we kicked off the KillDashNine drinkup. It was a huge success (turns out we aren’t the only ones who care about durability) and, as promised, we’ll be having another drinkup this month. On Wednesday, 3/9, we will be clinking glasses and sharing data loss horror stories at Bloodhound, located at 1145 Folsom Street here in San Francisco.
This month’s chosen cocktail is the *Data Eraser*, and it’s simple to make: 2 oz Vodka, 2 oz Coffee Liqueur, 2 oz Tonic, and a dash of bitter frustration, anguish, and confusion (which is more or less how one feels when their data just disappears). And if you can’t make it, be sure to pour yourself a Data Eraser on 3/9 to take part in the festivities from wherever you happen to find yourself (or you can run your own local KillDashNine like Marten Gustafson did in Stockholm last month).
Registration details for the event are here, so be sure to RSVP if you’re planning to join us. In the meantime, spin up a few nodes of your favorite database and try your hand at terminating some processes with the help of our favorite command: _kill -9_.
Long Live Durability!
March 4, 2011
Anyone can contribute to the Riak Wiki: it’s maintained and deployed from a public GitHub repository, so everyone is free to fork and send us a pull request to make changes. There is, however, a group of community members who are given commit access to this repo, and I’m pleased to announce that Ryan Zezeski is now part of this group.
Ryan first became involved with Riak several months ago when he selected it as the production data store for a component of the ad-serving platform he works on during the daytime hours. Since then he has become an active and visible member of our community, contributing numerous patches to Luwak and providing guidance to new and existing users on the Riak Mailing list and in the Riak IRC Channel. In short, he knows his Riak and we are thrilled to have him on board as a Community Committer.
Welcome, Ryan! We are looking forward to your contributions.
March 2, 2011
We are absolutely thrilled to announce that Mathias Meyer, known to some of you as Roidrage, has joined the team here at Basho as a Developer Advocate.
Mathias has dabbled with databases of many sorts over the years, and spent the last two years automating the heck out of cloud infrastructure at Scalarium, a company he co-founded where he will continue to play an advisory role. Along the way he developed a certain fascination towards distributed databases and a secret crush on Riak.
His spare time is currently devoted to writing the NoSQL Handbook, a project into which he is pouring his brains, soul, and an abundance of coffee. (On a related note, he has also agreed to take on the role of Coffee Advocate at Basho. Expect a webcast real soon.)
Mathias is based in Berlin and, as such, you can expect to see a lot of him at various events and conferences across Europe flying the Basho flag. His first stateside appearance as a member of the Basho team will be at JSConf, where he will be serving as the official conference photographer. (Basho also happens to be sponsoring both JSConf and NodeConf, by the way.)