December 21, 2011
The inaugural BashoChats was held just under a week ago at BashoWest in San Francisco. About 30 local developers came out to have a few beers on Basho’s tab and discuss distributed systems and databases. If you’re local to the Bay Area and/or want to keep an eye on what we have planned, join the group. There are some great talks in the pipeline…
Most importantly, I’m happy to report that both talks from the evening are now online for your viewing pleasure.
Enjoy. Hope to see you next month.
DTrace and the Erlang VM
Andy Gross opened up the evening with just under 30 minutes on the current work happening at Basho and a few other companies to bring DTrace to the Erlang VM. He starts off with some general information on both components and then goes in-depth on how they can be used to profile a running Riak installation.
Repo here on GitHub with the code he used for the examples in his presentation.
Computing Reach Using Storm Distributed RPC
After Andy concluded, Nathan Marz gave an overview of Storm, a framework he and his team at BackType built for distributed and fault-tolerant realtime computation. He takes us through some Storm basics and then demonstrates how it is used to compute reach using distributed RPC.
November 17, 2011
The Riak 1.0 Release Party happened just over a week ago in San Francisco. It was an exceptional evening, and we were able to bring together the Basho Team and a huge number of local Riak community members to celebrate the release.
In addition to the excellent food, drinks, company, and conversation, we had two great talks. The first was delivered by Basho’s CTO Justin Sheehy and he did about 20 minutes on the origins of Basho and Riak, and precisely how and why we got to Riak 1.0. After Justin concluded, Dave “Dizzy” Smith, Basho’s Director of Engineering, closed things up with some passionate words about where Riak and Basho are going and why he’s excited to be a part of it.
Most importantly, if you weren’t able to attend, we recorded the talks so no one would miss out on the action. They are well worth the 30 minutes and at the end of it you can call yourself a “Riak Historian”. You can find the video below. We also took some photos of the event. Those are below, too.
Enjoy, and thanks for being a part of Riak.
September 30, 2011
We are absolutely thrilled to report that as of today, Riak 1.0 is officially released and ready for your production applications!
Riak 1.0 packages are available. Go download one. And then go read the release notes because they are extensive and full of useful information highlighting the great work Basho has done since the last release.
There is already a lot of literature out there on the release, so here are the essentials to get you started.
The High Level Awesome
For those of you who need a refresher on the release, this 1.0 Slide Deck will give you a quick overview of why you should be excited about it. The big-ticket features are as follows:
In 1.0 we added the ability to build secondary indexes on your data stored in Riak. We developed this functionality because, quite frankly, people needed a more powerful way to query their data.
- High-level Slide Deck
- Official Documentation for Secondary Indexes
Riak Pipe And Revamped MapReduce
Riak’s MapReduce functionality isn’t anything new, but we did a lot of work in this release to make the system more robust, performant, and resistant to failures. Riak Pipe is the new underlying processing layer that powers MapReduce, and you’ll be seeing a lot of cool features and functionality made possible as a result of it in the near future.
- Riak Pipe Code on GitHub (complete with a beautiful README)
- MapReduce Documentation on the Riak Wiki
Usability is a huge focus for us right now, and logging is something that’s less-than-simple to understand in Erlang applications. To that end, we wrote a new logging framework for Erlang/OTP called Lager that is shipping with 1.0 and drastically reduces the headaches traditionally associated with Erlang logging and debugging.
Riak Search has been a supported Basho product for several releases now, but until 1.0 you were required to build it as a separate package. In 1.0 we’ve merged the search functionality into Riak proper. Enabling it is a simple one line change in a configuration file. Do this and you’ve got distributed, full text search capabilities on top of Riak.
Support for LevelDB
Riak provides for pluggable storage backends, and we are constantly trying to improve the options we offer to our users. Google released LevelDB some months back, and we started to investigate it as a possible addition to our suite of supported backends. After some rigorous testing, what we found is that LevelDB had some attractive functionality and performance characteristics compared to our existing offerings (mainly Innostore), and it will be shipping in 1.0. Bitcask is still the default storage engine, but LevelDB, aside from being an alternative for key/value storage, is being used as the backend behind the new Secondary Indexing functionality.
- Leveling The Field (from the Basho Blog)
One of the most powerful components of Riak is riak_core, the distributed systems framework that, among many other things, enables Riak to scale horizontally. Riak’s scalability and operational simplicity are of paramount importance to us, and we are constantly looking to make this code and system even better. With that in mind, we did some major work in 1.0 to improve upon our cluster membership system and are happy to report that it’s now more stable and scalable than ever.
And So Much More …
Riak 1.0 is a massive accomplishment, and the features and code listed above are just the beginning of what this release has to offer. Take some time to read the lengthy release notes and you’ll see what we mean.
These improvements are many months in the making, and the bug fixes, new features, and added functionality make Riak (in our humble opinion) the best open source database available today.
Thank You, Community!
We did our best to ensure that the community was as big a part of this release as possible, and there’s no way the code and features would be this rock-solid without your help. Thanks for your usage, support, testing, debugging, and help with spreading the word about Riak and 1.0.
And 1.0 is just the beginning. We’ll continue to refine and build Riak over the coming months, and we would love for you to be a part of it if you’re not already. Some ways to get involved:
Thanks for being a part of Riak!
Additions to Leading Open Source Distributed Database Extend Ability to Process and Analyze Information Stored in Riak
CAMBRIDGE, MA – September 27, 2011 – Basho Technologies, the provider of distributed database and storage management solutions, today announced the upcoming release of the much-anticipated Riak 1.0 open source database platform, as well as its Riak Enterprise commercially licensed offering. Riak 1.0 includes features and tools aimed at helping organizations solve the difficult problems involved with managing and interpreting data in a distributed fashion, such as in cloud-computing environments.
The release of Riak 1.0 follows a period of tremendous growth in the Riak open source community, with hundreds of organizations benefitting from Riak’s distributed database functionality in production environments. Additionally, Basho Technologies has seen accelerated demand for solutions based on Riak, seeing record results in terms of customer and revenue growth in the first three quarters of 2011. New customers and users in recent quarters include Bump Technologies, Clipboard.com, the Health System of Denmark, DotCloud, Formspring, Ideel, i-Velocity (marking Basho’s first customer engagement in India), Mezeo, SEOmoz, Social Genius, Swipely, Voxer, Yammer, and many more enterprises and agencies.
“We are excited about the new features in Riak 1.0, as they give us a step up in terms of taking advantage of the data we’re already capturing with Riak,” said Ty Amell, co-founder and CEO at Stackmob, a mobile applications platform provider. “New features like secondary indices will enable us to build smarter capabilities that add greater value to users of our existing mobile platform.”
“Riak 1.0 sets a new bar for managing data in a distributed environment,” said Don Rippert, president and CEO of Basho Technologies. “Riak has already proven its stability, ability to scale and provide absolute fault-tolerance in a highly distributed deployment; with 1.0, users of Riak can now more easily build and maintain powerful business applications on top of our platform.”
Features of Riak 1.0 Include:
- Secondary Indices – allows a developer to retrieve Riak objects using a simple query language that matches compound criteria against an object’s properties
- Riak Pipe – a new feature for higher-latency data processing; a new take on Map/Reduce style data processing
- Integration of Riak Search – the powerful search engine built for Riak is now tightly integrated with the core 1.0 package
- Lager – a new, simple and effective logging framework for Riak 1.0
- LevelDB Support – Riak 1.0 includes available support for the LevelDB storage engine, further increasing user choice in deploying Riak
- Administration Improvements – new tools make it easier to scale, manage and access a Riak cluster for developers and administrators
“Riak has already proven to be a great tool for capturing data, regardless of type, volume or environment, in a highly available and fault-tolerant manner,” said Eric Brewer, creator of the CAP Theorem, vice president of infrastructure at Google and member of Basho’s Board of Directors. “With the features in Riak 1.0, users will have not just a battle-proven database but better tools for analyzing and processing the data they capture and store with Riak.”
Riak 1.0 will be available later this month. To preview some of the new features, download Riak, or to inquire about a commercial deployment, please visit www.basho.com.
About Basho Technologies
Basho Technologies, Inc., founded in 2008 by a core group of software architects, engineers, and executive leadership from Akamai Technologies, Inc. (Nasdaq: AKAM), has offices in San Francisco, California, Cambridge, Massachusetts and Reston, Virginia. Basho’s flagship solution, Riak, is a distributed data store that combines extreme fault tolerance, rapid scalability, and ease of use to meet the needs of the rapidly expanding Big Data management and storage software market. Designed from the ground up to work with applications that run on the Internet and mobile networks, Riak is particularly well-suited for users of cloud infrastructure such as Amazon’s AWS and Joyent’s Smart platform and is available in both an open source and a paid commercial version. For more information about Basho or Riak visit www.basho.com.
September 9, 2011
Being a distributed company, we make a lot of videos at Basho that are intended for internal consumption and used to educate everyone on new features, functionality, etc. Every once in a while someone makes a video that’s so valuable it’s hard not to share it with the greater community. This is one of those.
This screencast is a bit on the long side, but it’s entirely worth it. Basho Software Engineer Joe Blomstedt put it together to educate all of Basho on the new cluster membership code, features, and functionality coming in the Riak 1.0 release (due out at the end of the month). We aim to make Riak as simple as possible to operate at scale, and the choices we make and code we write around cluster membership form the crux of this simplicity.
At the end of this you’ll have a better idea of what Riak’s cluster membership is all about, its major components, how it works in production, new commands that are present in Riak 1.0, and much, much more.
And, if you want to dig deeper into what Riak and cluster membership are all about, start here:
It should be noted again that this was intended for internal consumption at Basho, so Joe’s tone and language reflect that in a few sections.
Enjoy, and thanks for being a part of Riak.
September 2, 2011
We are thrilled to announce that John Newman has joined the Basho Team. John comes on as our newest Developer Advocate, and will be focusing on User Interface and Web Design for Basho’s various products and web properties.
A bit about John (in his words):
If you want to keep an eye on John, you can follow him on GitHub. Other than that, expect to see his design and UI/UX work interwoven throughout the suite of Basho and Riak web properties and software offerings.
July 14, 2011
Hi. My name is Russell Brown and since March, I’ve been working on the Riak Java Client (making me the lone Java developer in an Erlang shop). This past week I merged a large, backwards-compatible branch with some enhancements and long-awaited fixes and refinements. In this post I want to introduce the changes I’ve made and the motivations behind them. At Basho we firmly believe that Riak’s Java interface is on track to be among the best there is for Java developers who need a rock solid, production-ready database, so it’s time you get to know it if you don’t already.
First, Some History
When Riak was first released, it was only equipped with an HTTP API, so it followed that the Java client was a REST client. Later a Protocol Buffers Interface was added to Riak and Kresten Krab-Thorup and the team at Trifork contributed a Protocol Buffers interface for the Java library. Later still, around version 0.14, the Trifork PB Client was merged into the official Basho Riak Java Client. With this added interface, however, came a problem: both clients work well but they don’t share any interfaces or types. I started working for Basho in March 2011, and my first task was to fix any issues with the existing clients and refactor them to a common, idiomatic interface. Some way into that task I was exposed to the rather brilliant Riak and Scala at Yammer talk given by Coda Hale and Ryan Kennedy at a Riak Meetup in San Francisco. This opened my eyes, and I’m very thankful to Coda and Ryan for sharing their expert understandings so freely. If you meet either of these two gentlemen, I urge you to buy them drinks.
A Common Interface
Having a common interface should be a no-brainer. Developers shouldn’t have to choose a low-level transport up front and then have all their subsequent code shaped by that choice. To that end, I added a RawClient interface to the library that describes the set of operations you can perform with Riak. I also adapted each of the original clients to this interface. If all you want to do is pump data in, or pull raw data out of Riak, the PB RawClient adapter is for you. There are some figures on the Riak Wiki that show it’s quite snappy. If you need to write a non-blocking client, or simply have to use the Jetty HTTP library, implementing this interface is the way to go.
There is some bad news here: I had to deprecate a lot of the original client and move that code to new packages. This will look a tad ugly in your IDE for a release or two, but it is better to make the changes than be stuck with odd packages forever. There will be a code cull of the deprecated classes before the client goes v1.0.
The next task on the list for this raw package is to move the interfaces into a separate core project/package to avoid any circular dependency issues that will arise if you create your own RawClient implementation.
The RawClient solves the common/idiomatic interface problem, but it doesn’t solve the main new challenge that an eventually consistent, fault-tolerant datastore brings to the client: siblings.
Before we move on, if you have the time please take a moment to read the excellent Vector Clocks page on the Riak wiki (but make sure you come back). Thanks to Vector Clocks Riak does all that it can to save you from dealing with conflicting values, but this doesn’t guarantee they won’t occur. The RawClient presents you with a Vector Clock and an array of sibling values, and you need to create a single, correct value to work with (and even write back to Riak as the one true value). The new, higher-level client API in the Java Client makes this easier.
Conflict resolution is going to depend on your domain. Your data is opaque to Riak, which is why conflict resolution is a read time problem for the client. The canonical example (from the Dynamo Paper) is a shopping cart. If you have sibling shopping carts you can merge them (with a set-union operation, for example) to get a single cart with the values from all carts present. (Yes, you can re-instate a removed item, but that is far better than losing items. Ask Amazon.) Keep the idea of a shopping cart fresh in your mind for the remainder of this post as it figures in some of the examples I’ve used.
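To make the set-union idea concrete, here is a minimal sketch in plain Java. It does not use the Riak Java Client at all: CartMerge and its union method are hypothetical names invented for this illustration, and a cart is modelled as nothing more than a set of item names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the Riak Java Client.
final class CartMerge {
    private CartMerge() {}

    // Set-union merge: the result holds every item that appears in any sibling cart.
    static Set<String> union(Collection<Set<String>> siblings) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> sibling : siblings) {
            merged.addAll(sibling);
        }
        return merged;
    }

    public static void main(String[] args) {
        Set<String> a = new LinkedHashSet<>(Arrays.asList("toaster", "flambe set"));
        Set<String> b = new LinkedHashSet<>(Arrays.asList("toaster", "steak knives"));
        // prints [toaster, flambe set, steak knives]
        System.out.println(union(Arrays.asList(a, b)));
    }
}
```

Note what the union cannot express: if one sibling reflects the removal of an item that another sibling still contains, the item comes back. That is the trade-off described above.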
A Few Words On Domain Conversion
You use a Bucket to get key/values pairs from Riak.
Bucket b = client.createBucket(bucketName)
    .nVal(1)
    .allowSiblings(true)
    .execute();

IRiakObject fetched = b.fetch("k").execute();
b.store("k", "my new value").execute();
b.delete("k").execute();
The Bucket is a factory for RiakOperations, and a Riak Operation is a fluent builder that, when executed, calls out to Riak. “Fetch” and “Store” Riak Operations accept a Converter and ConflictResolution implementation from you so that the data Riak returns can be deserialised into a domain object and any siblings can be resolved. The library provides a Jackson-based JSONConverter that will convert the JSON payload of a Riak data item into an instance of some domain class; think of it as a bit like an ORM (but maybe without the “R”).
final Bucket carts = client.createBucket(bucketName).allowSiblings(true).execute();

final ShoppingCart cart = new ShoppingCart(userId);
cart.addItem("fixie");
cart.addItem("moleskine");

carts.store(cart).returnBody(false).retrier(DefaultRetrier.attempts(2)).execute();
Adding your own converters is trivial and I plan to provide a Jackson XML based one soon. Look at this test for a complete example.
Once the data is marshalled into domain instances, your logic is run to resolve any conflicts. A trivial shopping cart example is provided in the tests here. The ConflictResolver interface has a single method that takes a collection of sibling domain instances and returns a single, resolved value.
T resolve(final Collection<T> siblings) throws UnresolvedConflictException;
It throws the checked UnresolvedConflictException if you need to bail out. Your code can catch this and make the siblings available to a user (for example) for resolution as a last resort. I am considering making this a runtime exception, and would like to hear what you think about that.
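As a rough sketch of what a resolver for the shopping cart might look like, the example below merges sibling carts and bails out when it cannot. So that it compiles without the library on the classpath, it declares its own stand-ins for ConflictResolver and UnresolvedConflictException that copy the signature shown above; the Cart class, the MergeCartResolver name, and the "different users" bail-out rule are all assumptions made up for this illustration.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Local stand-in for the library's checked exception, declared here only so
// the sketch is self-contained.
class UnresolvedConflictException extends Exception {
    UnresolvedConflictException(String message) {
        super(message);
    }
}

// Local stand-in mirroring the single-method signature quoted in the post.
interface ConflictResolver<T> {
    T resolve(Collection<T> siblings) throws UnresolvedConflictException;
}

// Hypothetical domain class: a user id plus a set of item names.
final class Cart {
    final String userId;
    final Set<String> items = new LinkedHashSet<>();

    Cart(String userId) {
        this.userId = userId;
    }
}

// Merges sibling carts with a set union. It gives up if the siblings do not
// all belong to the same user (an invented rule, to show the bail-out path).
final class MergeCartResolver implements ConflictResolver<Cart> {
    @Override
    public Cart resolve(Collection<Cart> siblings) throws UnresolvedConflictException {
        Cart merged = null;
        for (Cart sibling : siblings) {
            if (merged == null) {
                merged = new Cart(sibling.userId);
            } else if (!merged.userId.equals(sibling.userId)) {
                throw new UnresolvedConflictException("siblings belong to different users");
            }
            merged.items.addAll(sibling.items);
        }
        return merged; // null when there were no siblings at all
    }
}
```

A caller would catch UnresolvedConflictException and fall back to whatever last resort makes sense in its domain, for example showing the siblings to a user.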
To talk about mutation I’m going to stick with the shopping cart example. Imagine you’re creating a new cart for a visiting shopper. You create a ShoppingCart instance, add the toaster, add the flambe set, and persist it. Meanwhile a network partition occurred and your user already added a steak knife set to a different cart. You’re not really creating a new value, but you weren’t to know. If you save this value you have a conflict to be resolved at a later date. Instead, the high level client executes a store operation as a fetch, convert, resolve siblings, apply a mutation and then store. In the case of the shopping cart that mutation would again be to merge the values of your new ShoppingCart with the resolved value fetched from Riak.
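The fetch, resolve, mutate, store sequence can be sketched with an in-memory stand-in for a bucket. Nothing here is the client's real implementation: SiblingStore and its methods are hypothetical, the resolver and mutation are expressed with the standard Function and UnaryOperator types, and the conversion step is skipped because the values are already plain Java objects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a bucket that may hold several
// sibling values under one key.
final class SiblingStore<T> {
    private final Map<String, List<T>> data = new HashMap<>();

    // Simulates a write that could not be ordered against the others:
    // it is kept as an extra sibling instead of replacing the value.
    void addSibling(String key, T value) {
        data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a copy of whatever siblings are stored under the key.
    List<T> fetchSiblings(String key) {
        return new ArrayList<>(data.getOrDefault(key, new ArrayList<>()));
    }

    // A store modelled as: fetch -> resolve siblings -> apply mutation -> write.
    T store(String key, Function<Collection<T>, T> resolver, UnaryOperator<T> mutation) {
        List<T> siblings = fetchSiblings(key);
        // With nothing stored yet, the mutation sees null as the old value.
        T resolved = siblings.isEmpty() ? null : resolver.apply(siblings);
        T updated = mutation.apply(resolved);
        List<T> single = new ArrayList<>();
        single.add(updated);
        data.put(key, single); // one value replaces all siblings
        return updated;
    }
}
```

With carts modelled as sets of item names, the resolver would union the siblings and the mutation would add the new cart's items to the resolved set, so the steak knives from the other cart survive the write.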
You provide an implementation of Mutation to any store operation. You never really know if you are creating a new value or updating an old one, so it is safer to model your write as a mutation to an existing value that results in a new value. This can be as simple as incrementing a number or adding the items in your Cart to the fetched Cart.
By default the library provides a ClobberMutator (it ignores the old value and overwrites it with a new one) but this is simply a default behaviour and not the best in most situations. It is better to provide your own Mutation implementation on a store operation. If you can model your values as logically monotonic or as transformations to existing values, then creating mutation implementations is a lot simpler.
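To show the shape of such mutations, here is a small sketch. The Mutation interface below is a local stand-in so the example compiles on its own; its single method name, apply, is an assumption, as are the Mutations helper class and its three factory methods. The last one mimics the overwrite-everything behaviour described above for contrast.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Local stand-in: takes the value fetched from the store (possibly null if
// nothing was there) and returns the value to write back.
interface Mutation<T> {
    T apply(T original);
}

// Hypothetical factory methods for a few common mutation shapes.
final class Mutations {
    private Mutations() {}

    // Monotonic counter: treats a missing value as zero and adds one.
    static Mutation<Integer> increment() {
        return original -> original == null ? 1 : original + 1;
    }

    // Cart-style merge: keeps everything already stored and adds the new items.
    static Mutation<Set<String>> addItems(Set<String> newItems) {
        return original -> {
            Set<String> result = new LinkedHashSet<>();
            if (original != null) {
                result.addAll(original);
            }
            result.addAll(newItems);
            return result;
        };
    }

    // Clobber: ignores the old value entirely and writes the new one.
    static <T> Mutation<T> clobber(T newValue) {
        return original -> newValue;
    }
}
```

The first two never lose information that was already stored, which is why they are safe to apply after sibling resolution; the clobbering version silently discards whatever was fetched.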
As your project matures, you will firm up your ConflictResolvers, Mutations, and Converters into concrete classes, and at this point adding them for each operation is a lot more typing and code noise than you need (especially if you were using anonymous classes for your Mutation/ConflictResolver/Converter).
bucket.store(o)
    .withConverter(converter)
    .withMutator(mutation)
    .withResolver(resolver)
    .r(r)
    .w(w)
    .dw(dw)
    .retrier(retrier)
    .returnBody(false)
    .execute();
The library provides the DomainBucket class as a wrapper around the Bucket. DomainBuckets are constructed with a ConflictResolver, Mutation, and Converter and thereafter use those implementations for each operation. DomainBuckets are a convenient way to get a strongly typed view of a Bucket and only store/fetch values of that type. They are a touch of sugar that reduces noise, and I advise you to use them once your domain is established. This test illustrates the usage.
The Next Steps
That’s about it. There is a Retrier interface and a default try-3-times-with-a-short-wait implementation (if the database is fault-tolerant, the client should be too, right?) but I’m going to push that down the stack to the RawClient layer so we can add cluster awareness to the client (with load balancing and all that good stuff).
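For readers who have not seen the pattern, a try-N-times-with-a-short-wait helper can be sketched in a few lines of plain Java. This is not the library's Retrier: SimpleRetrier, its constructor arguments, and the attempt method are hypothetical names chosen for the illustration, and the operation is passed in as a standard Callable.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper for illustration; not the library's Retrier.
final class SimpleRetrier {
    private final int attempts;
    private final long waitMillis;

    SimpleRetrier(int attempts, long waitMillis) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        this.attempts = attempts;
        this.waitMillis = waitMillis;
    }

    // Runs the command until it succeeds or the attempts are used up,
    // sleeping briefly between tries. Rethrows the last failure.
    <T> T attempt(Callable<T> command) throws Exception {
        Exception last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return command.call();
            } catch (Exception e) {
                last = e;
                if (i < attempts) {
                    Thread.sleep(waitMillis);
                }
            }
        }
        throw last;
    }
}
```

A call such as new SimpleRetrier(3, 50).attempt(() -> fetchSomething()) would try up to three times with a 50 ms pause between tries, where fetchSomething is a placeholder for whatever operation might fail transiently.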
I haven’t covered querying (MapReduce and Link Walking) but I plan to in the next post (“Why Map/Reduce is easy with Java”, maybe?). I can say that is one aspect that has hardly changed from the original client. The original versions used a fluent builder and so does this client. The main difference is the common API and the ability to convert M/R results into Java Collections or domain specific objects (again, thanks to Jackson). Please read the README on the repo for details and the integration tests for examples.
At the moment the code is in the master branch on GitHub. If you get the chance to work with it I’d love to hear your feedback. The Riak Mailing List is the best place to make your feelings and opinions known. There are a few wrinkles to iron out before the next release of the Java Client, and your input will shape the future direction of this code so please, don’t be shy. We are on the lookout for contributors…
And go download Riak if you haven’t already.
June 27, 2011
This was originally posted on themarkphillips.com. Please use the original post for all comments.
When someone asks me, “Where is Basho located?”, I usually respond with something along the lines of: “Much like Riak, we are completely distributed.” Some three years ago our team was all working out of Cambridge, MA (which is still our headquarters). Slowly but surely the team grew in size, but it quickly became apparent that requiring all employees to work in the same geographic location would result in us missing out on some talented and downright brilliant people. So we resolved to “hire where the talent is.”
As it stands right now we have physical offices in Cambridge, MA and San Francisco. The team, however, is now completely distributed; in addition to Cambridge and San Francisco (and several other CA cities), we have people in Oregon, Oklahoma (various locations), Florida, Colorado (various locations), New Jersey, North Carolina, Minnesota, Virginia (various locations), Maryland (various locations), Idaho, New York, Germany, and the UK. The latest tally put our entire team at just over thirty people.
Hiring where the talent is means we don’t sacrifice great hires for location, but it also presents various hurdles when attempting to build culture and community. Anyone who works at a startup or as part of a small team can speak to the importance of culture. It’s crucial that distributed employees feel as though they are part of a tight-knit crew. If you show up every day and your engagement with your coworkers doesn’t go much beyond a few passing phrases in a chat client, you should be doing more. The leadership at Basho made it clear many moons ago that we were going to work hard to build culture and community. Just because you’re committing code 1000 miles from your nearest colleague doesn’t mean you need to feel like they are 1000 miles away.
I spend most of my time pursuing ways to strengthen and extend the various external communities that are growing out of Basho’s open source software, but I thought it might be useful to examine what we do internally to build community and culture. As should be apparent, we’re not doing anything too crazy or innovative with the ways we connect and collaborate across states and countries. But it’s the little things that matter when culture is concerned at a distributed company, and I think we do a lot of the little things well.
For as long as I can remember, Basho has used Jabber for real-time chat collaboration. This is where we spend most of our time conversing, and the entire company idles in one room we call “bashochat.” At any given time you can find any number of conversations happening concurrently; several developers might be chasing down a finicky bug while several others are discussing the merits of the latest cat meme. Hundreds (if not thousands) of messages fly through here daily. At times it can get a bit distracting, so signing off to focus is encouraged and done often. We also just started logging bashochat to make sure that those who are out for the day or signed off to chase a corner case can stay in the loop.
In addition to Jabber, the Client Services Team also uses Campfire as their chat software of choice (for various reasons). There’s certainly no reason why multiple chat programs can’t co-exist under the same corporate umbrella. Basho is flexible, and if it works for your team, go with it.
Interacting via Skype serves as a great complement to what happens in Jabber (even if Skype itself offers less than five nines of uptime). Everyone uses Skype at least once daily for our morning status call. We are still small enough that getting the majority of the company on the phone for a 10 minute status call is feasible, so we do it. Topics range from “What’s the current status of bug fix X for customer Y?” to “Did you get any questions at yesterday’s meetup talk that you couldn’t answer?” Video chats are also invaluable, and jumping on Skype to speak “face-to-face” for even five minutes is incredibly worthwhile and serves to reinforce the team feel (especially when a new hire is coming aboard).
Yammer is a great piece of software, and it recently worked its way into our suite of collaboration tools. When it was first introduced to our team (around the beginning of this year) I was a bit skeptical of how well it was going to work for us. We already use Jabber quite heavily. How would the two co-exist? Since then Yammer has become the home for low-volume, high quality messages that deserve more than just a passing glance or ephemeral response. In other words, the signal to noise ratio in Yammer is much higher; links to blog posts about Riak (or our competition), results of a long running benchmark (complete with graphs), or links to a new GitHub repo are all typical of what appears on Yammer. That said, the message volume has been growing steadily over the past months, and I’m curious and interested to see how this tool evolves for us.
At some point you have to actually meet and physically interact with your colleagues. To this end, we’ve been doing quarterly developer meetups for about six quarters now. These are 3-5 day gatherings of the entire team where it’s business as usual, with the exception of some team building activities scattered throughout the week. Lots of amazing ideas and moments are born at these meetups, and we all look forward to them.
Basho is firing on all cylinders right now (fixing more bugs, writing more features, closing more deals, resolving more tickets, etc.), and I believe that our dedication to building a distributed culture and community internally has had a lot to do with it. Though Basho’s “system” is still a work in progress, in my opinion we’ve managed to build a strong internal community and culture that lends itself to heightened levels of productivity and overall happiness. We are still relatively small (right around 30, as I stated above) and making this scale will surely be a challenge. And I’m sure that the tools we use will change, too, to accommodate our needs (speaking of which, where is the Skype replacement already?).
You can’t force community and culture. It starts with how you hire and is tested every day (whether you’re working in the same physical location or not). Build (or seek out) a team that is committed to making something special across the board. Collaboration tools and processes will follow accordingly, and they should complement and enhance the way you work, not dictate it.
June 10, 2011
More awesome news coming out of BashoHQ: Joseph Blomstedt has joined the Basho Team! (Well, to be clear, he’s been working on Riak full-time for about three weeks now; this blog post is a bit overdue.)
Joe initially caught our eye after releasing riak_zab, an Erlang port of the Zookeeper atomic broadcast protocol that was designed to integrate with riak_core. Joe built riak_zab in order to support a strong consistency layer on top of Riak, allowing a single Riak cluster to be used both for eventually consistent and strongly consistent operations. We got in touch with him soon after this was released, and the rest is history.
Joe is currently finishing up a PhD at the University of Colorado with a focus on compilers, parallel code scheduling, and heterogeneous CPU/GPU systems. During his time in graduate school, he has also been a frequent intern at Intel — twice in the research division, and twice on product teams.
Outside of making Riak better by day, Joe also has a strong interest in expanding riak_core both in capability and popularity. In particular, he intends to work towards increasing the visibility of riak_core/Erlang in the academic community, where most distributed systems research builds on Hadoop/Java. He is also of the belief that there is considerable research still to be done in the area of eventually consistent distributed systems, and that Basho has a role to play in producing novel research.
Joe currently resides in Boulder while he finishes up his PhD, and takes turns working from home and the university campus. This fall he plans to move back to Seattle, where he previously lived during his undergraduate years (UW CSE, 2005). In the meantime, Joe is enjoying his remaining months with Boulder’s great food, beer, and open spaces. If you’re in the area, feel free to contact Joe if you want to talk Riak over lunch or beers.
May 26, 2011
Eric has been active in the Riak community for some time now, and, in addition to the numerous patches and bug fixes he has contributed to the Riak Python client, he’s also gone out of his way to help educate new and existing users about all things Riak on the Mailing List and in #riak on Freenode.
Make sure to keep an eye on the Riak Wiki Repo for his commits.