November 11, 2010
Things are moving incredibly fast in the NoSQL space. I am used to internet-fast — helping bring on 300 customers in a year at Akamai; going from adult bulletin boards and leased lines to hosting sites for twenty percent of the Fortune 500 at Digex (Verizon Business) in eighteen months. I have never seen a space explode like the NoSQL space.
Two weeks ago, Justin Sheehy stood on stage delivering a rousing and thoughtful presentation to the NoSQL East Conference that was less about Riak and more about a definition of first principles that underpinned Riak: what it REALLY means when you claim such terms as scalability (it doesn’t mean buying a bigger machine for your master DB) and fault-tolerance (it has to apply to writes and reads and is binary; you either always accept writes and serve reads or you don’t). The conference was a bit of a coming out party for Basho, which co-sponsored the event with Rackspace, Georgia Tech, and a host of other companies. We had been working on Riak for 18 months or so in relative quiet and it was nice to finally see what people thought, first hand.
There were equally interesting presentations about Pig and MongoDB and a host of other NoSQL entrants, all of which will make for engrossing viewing when they finally get posted. We were told this wasn’t quite as exciting as the NoSQL conference out West but none of us seemed to mind. Home Depot, Turner Broadcasting, Weather.com, and Comcast had all sent folks down to evaluate the technology for real, live problems and the enthusiasm in the auditorium spilled out into the Atlanta bars. Business cards were exchanged, calls set up, even a little business discussed. Clearly, NoSQL databases were maturing fast.
No sooner had we returned to Cambridge than news of Flybridge’s investment in 10Gen came out. Hooray! Someone was willing to bet a $3.4 million dollars on a company in the space. Chip Hazard, ever affable, wrote a nice blog post explaining the investment. According to him, every developer they talked to had downloaded some NoSQL database to test. Brilliant news. He said Flybridge invested in 10Gen because they liked the space and knew the team from their investment in Doubleclick, from whose loins the management team at 10Gen issued. No more felicitous reason exists for a group of persons to invest $3.4 million than that previous investments in the same team were handsomely rewarded. I would wish Chip and 10Gen the best if I had time.
Because contemporaneous with the news of Flybridge’s investment, and almost as if the world had decided NoSQL’s time had come, we began to field emails and calls from interested parties. Trials, quotes, lengthy discussions about features and uses of Riak — the week was a blur. Everyone was conducting a bakeoff: “I have a 4TB database and customers in three continents. I am evaluating Riak and two other document datastores. Tell me about your OLAP features.”
Heady times and, frankly, of somewhat dubious promise, if you ask me. Potential clients that materialize so quickly always seem to disappear just as fast. Really embracing a new technology requires trials, tests, new features, and time. Time most off all. These “bluebirds” would fly away in no time, if my experience held true.
Except, this time it didn’t happen. Contracts were exchanged. Pen nibs were sharpened. It is as if the entire world decided to not wait for the everyone else to jump on the bandwagon and instead, decided to go NoSQL. Even using this last week as the sole example, I think the reason is plain — people have real pain and suddenly the word is out that they no longer have to suffer.
Devs are constrained by what they can build, rich features notwithstanding. Ask the company that had to choose between Riak and a $100K in-memory appliance to scale. And Ops is getting slaughtered — the cost of scaling poorly (and by poorly I mean pagers going off during dinner, bulk updates taking hours and failing all the time, fragmented and unmanageable indices consuming dozens of machines) is beginning to look like the cost of antiquated technology. Good Ops people are not fools. They look for ways to make life easier. Make no mistake — all the Devs and Ops folks came with a set of tough questions and a list of new features. They also came with an understanding that companies that release open source software still have a business to run. They are willing to spend on a real company. In fact, having a business behind Riak ended up mattering as much as any features.
So, I suspect, we are at the proverbial “end of the beginning.” Smart people in the NoSQL movement have succeeded in building convincingly good software and then explaining the virtues convincingly (all but one of the presentations at NoSQL East demonstrated the virtues of the respective approaches). Now these people are connecting to smart people responsible for building and running web apps, people who are decidedly unwilling to sit around hoping for Oracle or IBM to solve their problems.
In the new phase — which we will cleverly call the “beginning of the middle” — great tech will matter even more than it does now. It won’t be about selling or marketing or any of that. If our numbers are any indication of a larger trend, more people will download and install NoSQL databases in the next month than the combined total of the three months previous. More people in a buying frame of mind will evaluate NoSQL technology not in terms of its coolness but in terms of its ability to solve their real, often expensive problems. The next phase will be rigorous in a way this phase was not. People have created several entirely new ways to store and distribute data. That was the easy part.
Just as much as great tech, the people behind it will matter. That means more calls between us and Dev teams. That means more feature requests considered and, possibly, judiciously, agreed to.
That also means lots of questions answered. People care about support. They care about whether you answer their emails in a timely fashion and are polite. People want to do business with NoSQL. They want to spend money to solve problems. They need to know they are spending it with responsible, responsive, dedicated people.
Earl tweets about it all the time and I happen to agree: any NoSQL success helps all NoSQL players. I also happen to feel that any failure hurts all NoSQL players. As NoSQL rapidly ages into its adolescence, it will either be awkward and painful or exciting and characterized by incredible growth.
When I was a kid on the Navy base in Alameda, my babysitter watched soaps all afternoon, leaving me mostly to my own devices. If I stopped in, I always got roped in to hearing her explain her favorite stories. Most of all she loved how ridiculous they were, though she would never admit this exactly. Instead, adopting an attitude of gleeful incredulity, she would point out this or that attractive young actor and tell me how just a year ago, she was a little baby. “Soap people have to grow up quick, I guess,” was her single (and to her, completely satisfactory) explanation. “If they don’t, they get written out of the story.”
November 11, 2010
We announced recently on the Riak Mailing List that Basho was switching to git and GitHub for development of Riak and all other Basho software. As stated in that linked email, we did this primarily for reasons pertaining to community involvement in the development of Riak. The explanation on the Mailing List was a bit terse, so we wanted to share some more details to ensure we answered all the questions related to the switch.
Riak was initially used as the underlying data store for an application Basho was selling several years ago and, at that time, its development was exclusively internal. The team used Mercurial for internal projects, so that was the de-facto DVCS choice for the source.
When we open-sourced Riak in August 2009, being Mercurial users, we chose to use BitBucket as our canonical repository. At the time we open-sourced it, we were less concerned with community involvement in the development process than we are now. Our primary reason for open-sourcing Riak was to get it into the hands of more developers faster.
Not long after this happened, the questions about why we weren’t on GitHub started to roll in. Our response was that we were a Mercurial shop and BitBucket was a natural extension of that. Sometime towards the beginning of May we started maintaining an official mirror of our code on GitHub. This mirror was our way of acknowledging that there is more than one way to develop software collaboratively and that we weren’t ignoring the heaps of developers who were dedicated GitHub users and preferred to look at and work with code on this platform.
GitHub has the concept of “Watchers” (analogous to “Followers” on BitBucket). We started accumulating Watchers once this GitHub mirror was in place. “Watchers” is a useful, but not absolute, metric for measuring interest and activity in a project. They bring a tremendous amount of attention to any given project through their use of the code and their promotion of it. They also, in the best case scenario, will enhance the code in a meaningful way by finding bugs and contributing patches.
This table shows the week on week of growth of BitBucket Followers vs. GitHub Watchers since we put the official mirror in place:
|Number of Followers/Watchers at Time of Switch||97||145|
|Avg. Week on Week Growth (%)||0.74||7.2|
Since putting the official mirror in place, the number of Watchers on the GitHub repo for Riak has grown at steady ready, averaging just over 7% week on week. This far outpaced the less than 1% growth in Followers on the canonical Bitbucket repository for Riak.
With this information it was clear that Riak on GitHub as a mirror was bringing us more attention and driving more community growth than was our canonical repo on BitBucket. So, in the interest of community development, we decided that Riak needed to live on GitHub. What they have built is truly the most collaborative and simple-to-use development platform there is (at least one well-respected software analyst has even called it “the future of open source”). Though Mercurial was deeply ingrained in our development process, we were willing to tolerate the workflow hiccups that arose during the week or so it took to get used to git in exchange for the resulting increase in attention and community contributions.
The switch is already proving fruitful. In addition to the sharp influx in Watchers for Riak, we’ve already taken some excellent code contributions via GitHub. That said, there is much left to be written. And we would love for you to join us in building something legendary in Riak, whatever your distributed version control system and platform preference may be.
October 26, 2010
Basho is hosting one event this week and participating in another. Here are the details to make sure everyone is up to speed:
A NOSQL Evening in Palo Alto
Tonight there will be a special edition of the Silicon Valley NoSQL Meetup, billed as “A NOSQL Evening in Palo Alto.” Why do I say “special”? Because this month’s event has been organized by the one and only Tim Anglade as part of his NoSQL World Tour. And this is shaping up to be one of the tour’s banner events.
There are almost 200 people signed up to see this discussion as it’s sure to be action-packed and informative. If you’re in the area and can make it out on short notice, I would recommend you attend.
October San Francisco Riak Meetup
On Thursday night, from 7-9, we are holding the October installment of the San Francisco Riak Meetup. Like last month, the awesome team at Engine Yard has once again been gracious enough to offer us their space for the event.
We have two great planned talks for this month. The first will be Basho hacker Kevin Smith talking about a feature of Riak that he has had a major hand in writing: MapReduce. Kevin is planning to cover everything from design to new code demos to the road map. In short, this should be exceptional.
For the second half of Thurday’s meetup we are going to get more interactive than usual. Articulation of use cases and database applicability is still something largely unaddressed in our space. So we thought we would address it. We are inviting people to submit use cases in advance of the meetup with some specific information about their apps. The Basho Developers are going to do some work before the event analyzing the use cases and then, with some help from the crowd, determine if and how Riak will work for a given use case – and if Riak isn’t the right fit, we might even help you find one that is. If you are curious whether or not Riak is the right database for that Facebook-killer you’re planning to build, now is your chance to find out. We still have room for one or two more use cases, so even if you’re not going to be able to attend the Thursday’s meetup I want to hear from you. Follow the instructions on the meetup page linked above to submit a use case.
That said, if you are in the Bay Area on Thursday night and want to have some beer and pizza with a few developers who are passionate about Riak and distributed systems, RSVP for the event. You won’t be disappointed.
Hope to see you there!
October 20, 2010
Last week Basho released Riak 0.13, including (among many other great improvements) the first public release of the much-anticipated Riak Search. There are a number of reasons why I am very excited by this.
The first and most obvious reason why Riak Search is exciting is that it’s an excellent piece of software for solving a large class of data retrieval needs. Of course, this isn’t the first search system that is clustered and can grow by adding servers; that idea isn’t groundbreaking or very exciting on its own. However, it is the first such search system that I know of with the powerful and robust systems model that people have come to treasure in the Riak key/value store.
Being able to grow & shrink and handle failures in an easy and predictable way is much less common in search systems than in “simpler” data management systems. After we demonstrated that we could build something (anything!) with that kind of easy scalability and availability, our friends and customers began to ask if we could apply our ideas to some richer data models. Specifically, a common pattern began to emerge: people would deploy Riak and an indexing system such as Apache Solr system side-by-side. This was a workable solution, but could be operationally frustrating. The systems could get out of sync, capacity planning became more complicated, and most importantly the operations story around failure management became much more challenging. If only we could make the overall collection of systems as good at availability management as Riak was, then a new class of problems would be solved.
Those conversations began a journey of exploration and experimentation. This essential phase was led by John Muellerleile, one of the most creative and resourceful programmers I know. He did all of the early work, finding places where the state of the art in indexing and search could be merged with the “Riak Way” of doing things. More recently, amazing work was done by the entire Basho Engineering team to make Riak Search move from prototype to product.
Riak Search has only been out for about a week, but users are already discovering that they can use it in more than one way; it can function in fulltext-analysis, as an easy way to produce simple inverted indices over semi-structured data, and more.
That’s enough reason to be excited about Riak Search, but I have bigger reasons as well.
Riak Search is the first public demonstration that Riak Core is a meaningful base on which to build distributed systems beyond just a key/value store. By using the same central code base for distribution, dispatch, ownership, failure management, and node administration, we are able to confidently make many of the same guarantees for Search that we have made all along for Riak’s key/value storage. Even if Search itself wasn’t such a compelling product, it is exciting as a proof of the value of Riak Core.
That value hints at Riak’s future — not as a single database but as a family of distributed systems for storing, managing, and retrieving data. We’ve now gone from one to two such systems, but we’re not stopping there. The work of creating Search was really two efforts: building Search itself and also the breakout & improvement of our Core. We can (and will) use that improved Core to build new systems in the future.
The subtlest, but perhaps most important, of the exciting things about Search is that it also uses Core to show how each new Riak system is greater than the sum of its parts. Riak Search is not just a search system using the same Core codebase as KV, it is running on the same actual nodes as KV. This allows us to develop features that don’t make sense in KV alone or in Search alone, but that take advantage of the shared running elements of Core. For instance, users can issue a search/map/reduce query that runs map/reduce style parallel processing with data locality on a dataset determined by a search result. As we develop further systems on Riak Core, we expect further such connections to make each one also benefit the entire Riak family in this way.
What we have released in the recent past is exciting. The future is even more exciting.
October 10, 2010
Dedicated Riak users who were who hanging out in the IRC channel late Friday may have seen the following message from our bot that reports changes to the canonical Riak Repo on Bitbucket:
dizzyd Added tag riak-0.13.0 for changeset 957b3d601bde
That’s right! We tagged the 0.13 release of Riak this past Friday and, after of a weekend away from our laptops, are ready to announce it officially. As you’re about to learn, this was a banner release for a whole host of reasons. Here is a rundown of the noteworthy changes and enhancements in this release:
As most people know, we have been hard at work on Riak Search for several months and, after releasing it to a handful of users as part of a limited beta, we are finally ready to release it into the wild!
Riak Search is a distributed, easily-scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak’s key/value layer.
At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
There is obviously much more to be written about Riak Search, and a simple blurb in this blog post won’t do it justice. It’s for this reason that we are dedicating an entire blog post to it. You can expect to see this next week. In the mean time, you can go download the code and get more usage information on the Riak Search Section.
riak_kv and riak_core
In addition to focusing on previously-unreleased bits like Search, we also strengthened the code and functionality of riak_kv, the layer that handles the key/value operations, and riak_core, the component of Riak that provides all the services necessary to write a modern, well-behaved distributed application.
Firstly, the structure of the Riak source repos has been changed such that there is a single Erlang app per repo. This permits third parties to use riak_core (the abstracted Dynamo logic) in other applications. As you can imagine, this opens up a world of possibilities and we are looking forward to seeing where this code ends up being implemented. (Check out this blog post for a primer on Riak Core.)
The consistent hash ring balancing logic has been improved to enable large clusters to converge more quickly on a common ring state. Related to this, further work has been done to improve Riak’s performance and robustness when dealing with large data sets and failure scenarios.
Also, the Riak Command Line Tools were expanded with two new commands: “ringready” and “wait-for-service.” These were put in place to help anyone administering a Riak installation to script detection of cluster convergence on a given ring and wait for components like riak_kv to become available, respectively. You can read up on all your favorite Riak Command Line Tools here.
Enhancing Riak’s MapReduce functionality is something on which we’ve been focusing intensely for the past several releases, and this one was no different.
And, we improved MapReduce job input handling. In short what this means is that you will see lower resource consumption and more stable run times when the input producer exceeds the cluster’s ability to process new MapReduce inputs.
It’s also worth mentioning that there is more MapReduce goodness in the pipeline, both code and non-code. Stay tuned, because it’s only getting better!
Bitcask, the default storage backed for Riak, also got some huge enhancements in the 0.13 release. The most noticeable and significant improvement is that Bitcask now uses 40% less memory per key entry. The net effect is that you are required to have less RAM available in your cluster (as Bitcask stores all key data in memory). And, for those of you working with large data sets, we did some work to make starting up Bitcask significantly faster.
Listing keys, something the community has been asking for us to make more efficient, got a nice speedup when using Bitcask (which should make some of your map/reduce queries a lot snappier). This work is ongoing, too, so look for this process to get even faster as we make additional incremental improvements. Also of note, Bitcask will now reclaim memory of expired key entries.
Riak 0.13 is our best release yet and we are more confident than ever that it’s the best database out there for your production needs (assuming it suits your use case, of course). In addition to the changes we highlighted above, you should also take a moment to check out the complete release notes.
After you’re done with those, drop everything and do the following:
- Download Riak 0.13 on your platform of choice
- Download Riak Search
- Join the Mailing List (if you’re not on it already)
- If you’re new to Riak, take a few minutes and do The Riak Fast Track. You won’t regret it.
As always, let us know if you have any questions and issue, and thanks for being a part of Riak!
September 29, 2010
Basho hackers will be giving quite a few presentations between now and the end of the week. And they all happen to be in Baltimore! Here is a quick rundown (in no particular order) of where we will be, who will be there, and what we will be talking about:
Rusty Klophaus at CUFP
Rusty Klophaus will be at the Commercial Users of Functional Programming (CUFP) Event taking place this weekend in Baltimore, Maryland. His talk is called “Riak Core: Building Distributed Applications Without Shared State” and it should be downright amazing.
From the talk’s description: Both Riak KV (a key-value datastore and map/reduce platform) and Riak Search (a Solr-compatible full-text search and indexing engine) are built around a library called Riak Core that manages the mechanics of running a distributed application in a cluster without requiring a central coordinator or shared state. Using Riak Core, these applications can scale to hundreds of servers, handle enterprise-sized amounts of data, and remain operational in the face of server failure.
All the details on his talk can be found here.
And, in case you haven’t been following your Riak Core developments, check out Building Distributed Systems with Riak Core.
Dave Smith at the Ninth ACM SIGPLAN Erlang Workshop
Dave Smith (a.k.a “Dizzyd”) will be keynoting the ACM SIGPLAN Erlang Workshop taking place on Friday, Sept 30th, also in Baltimore. Dave’s talk is called “Rebar, Bitcask and how chemotherapy made me a better developer.”
Rebar and Bitcask are both pieces of software that Dizzy had a major hand in creating and they have played a huge role in Riak’s adoption (not to mention that Rebar has quickly become an indispensable tool for Erlang developers everywhere). Dave was also fighting follicular lymphoma while writing a lot of this code. Needless to say, this one is sure to be memorable and of immense value.
More details on his talk can be found here.
It should also be noted that newly-minted Basho Developer Scott Fritchie is the Workshop Chair for this event. He is an accomplished Erlang developer and Riak is not the only distributed key/value store about which Scott is passionate – he will also happily talk your ear off about Hibari.
Justin Sheehy at Surge
Just when you thought they couldn’t fit another conference in Baltimore on the same weekend… And this one is big: it’s the Surge Conference put on by the team at OmniTI. Basho will be there in the form of CTO Justin Sheehy.
Justin will be giving a talk about concurrency at scale, something about which every distributed systems developer should care deeply. Additionally, he will be taking part in a panel discussion – I haven’t seen an official name for it yet but rumor has it that it’s something along the lines of “SQL versus NoSQL.”
Check out the Surge site for more conference details.
As you can see, Baltimore is the place to be this weekend. Get there at all costs. And then go download Riak.
September 16, 2010
Justin Sheehy, Basho’s CTO, and Bryan Cantrill, Joyent’s VP of Engineering, are two acknowledged authorities on distributed systems and cloud computing. Riak SmartMachines offer you all the advantages of Riak running on the best cloud technology there is.
What do you get when you put Justin Sheehy, Bryan Cantrill and a microphone in a room for 45 minutes to talk about NoSQL and Riak SmartMachine performance? One of the best webinars we’ve ever had the chance to take part in!
Watch this one at least three times. Seriously. And then go download Riak.
You can download a PDF version of the slides used in the webinar here.
September 9, 2010
At long last we have all the details ironed out for the upcoming September Riak Meetup in San Francisco. The crew here in SF is quite excited about this month’s event, and here’s why:
Date: Thursday, Sept. 23rd
Location: Engine Yard Offices, located at 500 Third Street, Suite 510
- 7:15 – Riak Basics
After the first meetup, one of the attendees remarked, “Good, but looking for some basics and some hands on demo as well.” Admittedly, this is something we could have addressed a bit better. So at the beginning of this meetup (as well as all meetups moving forward) we are going to devote at least 15 minutes to discuss Riak basics. There are no stupid questions. Ask away.
- 7:30 – Riak vs Git: NOSQL Battle Royale
Presenter: Rick Olson, Github
This talk will compare and contrast Riak and Git on their merits as key/value stores, and look at how the two can work together.
- 8:00 – From Riak to RabbitMQ
Presenter: Andy Gross, Basho Technologies
This will cover using Riak to publish to RabbitMQ using post-commit hooks and gen_bunny.
- 8:30 – General Riak/Distributed Systems Conversation and Networking
Note: There is only seating for 50, so you’ll want to get there on time to secure a seat.
Basho will be providing food (pizza) and refreshments (beer, soda, etc.). And for those of you who can’t join us next Thursday, I will also be filming the talks with the goal of posting them online if everything goes to plan.
You can RSVP on the Riak Meetup Page. So go do it. Now!
Hope to see you there.
September 3, 2010
Links and link walking are two very powerful concepts in Riak. If used appropriately within an application, they can help to add another level of relationships to your data that are typically not possible when using a key/value store as your primary storage engine. It’s for this reason that we thought new users should know how to use them.
Today we are adding a Links and Link Walking tutorial to the Riak Fast Track. In about 20 minutes, you will learn what links are and how to use them. And, to cap it all off, there is a 12 minute screencast put together by Sean Cribbs to give you a more in depth look at what links are all about.We think this is pretty cool stuff and we have a feeling you’ll enjoy it, too.
As usual, we love, want and need feedback, so send an email to firstname.lastname@example.org with any questions or comments you might have. (I’ve also been known to send t-shirts in exchange for thoughts on how to make Riak and our documentation better…)
Enjoy, and thanks for using Riak!
August 31, 2010
This is a repost from the blog of Sean Cribbs, our Developer Advocate.
It’s been a while since I’ve blogged about a release of Ripple, in fact, it’s been a long time since I’ve released Ripple. So this post is going to dig into Ripple 0.8 (released today, August 31) and catch you up on what has happened since 0.7.1 (and 0.5 if you don’t follow the Github project).
The major features, which I’ll describe in more detail below, are:
- Supports Riak 0.12 features
- Runs on Rails 3 (non-prerelease)
- Adds Linked associations
- Adds session stores for Rack and Rails 3 apps
Riak 0.12 Features
The biggest changes here were some bucket-related features. First of all, you can define default quorum parameters for requests on a per-bucket basis, exposed as bucket properties. Riak 0.12 also allows you to specify “symbolic” quorums, that is, “all” (N replies), “quorum” (N/2 + 1 replies), or “one” (1 reply). Riak::Bucket has support for these new properties and exposes them as attr_accessor-like methods. This is a big time saver if you need to tune your quorums for different use-cases or N-values.
Second, keys are not listed by default. There used to be a big flashing warning sign on Riak::Client#bucket that encouraged you to pass :keys => false. In Ripple 0.8 that’s the default, but it’s also explicit so that if you use the latest gem on Riak 0.11 or earlier, you should get the same behavior.
Runs on Rails 3
I’ve been pushing for Rails 3 ever since Ripple was conceived, but now that the actual release of Rails 3 is out, it’s an easier sell. Thanks to all the contributors who helped me keep Ripple up-to-date with the latest prereleases.
These are HOT, and were the missing features that held me back from saying “Yes, you should use Ripple in your app.” The underlying concepts take some time to understand (the upcoming link-walking page to the Fast Track will help), but you actually have a lot more freedom than foreign keys. Here’s some examples (with a little detail of how they work):
You’ll notice only one and many in the above examples. From the beginning, I’ve eschewed creating the belongs_to macro because I think it has the wrong semantics for how linked associations work (links are all on the origin side). It’s more like you “point to one of” or “point to many of”. Minor point, but often it’s the language you choose that frames how you think about things.
Outside the Ruby-sphere, web session storage is one of Riak’s most popular use-cases. Both Mochi and Wikia are using it for this. Now, it’s really easy to do the same for your Rails or Sinatra app.
For Sinatra, Padrino and other pure Rack apps, use Riak::SessionStore:
For Rails 3, use Ripple::SessionStore.