April 26, 2012
At Basho we love Yammer. Besides making a product that we rely on internally, they are long-time Riak fans and advocates, and have built a large Riak cluster to power notifications for their entire user base. But not every use case is a fit for Riak. Running multiple databases in production is not uncommon, and skilled engineering teams like Yammer’s will always select the best tool for the job.
To that end, Ryan Kennedy, Yammer’s Director of Core Services, presented at BashoChats 003 on the impressive work he and his colleagues are doing with Berkeley DB. He goes into depth on how they came to select BDB, what they built on top of it so it could scale and meet their availability requirements, and what their data set and request profile look like in production. There’s a lot of valuable information here. (Ryan’s slides are here if you’re interested in the PDF.)
Enjoy, and if you’re interested in speaking at a future BashoChats meetup, email me – firstname.lastname@example.org. Also, if you want to work with companies like Yammer, Twitter, Square, Simple, LinkedIn, and Basho building distributed systems, you should be at the next meetup. Keep an eye on the Meetup page for details.
April 27, 2010
because you needed another local key/value store
One aspect of Riak that has helped development to move so quickly is pluggable per-node storage. By allowing nearly anything k/v-shaped to be used for actual persistence, progress on storage engines can occur in parallel with progress on the higher-level parts of the system.
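To make the idea of pluggable per-node storage concrete, here is a minimal Python sketch of a storage contract that higher-level code can depend on. Riak's real backend interface is written in Erlang and is not reproduced here; `StorageBackend`, `MemoryBackend`, and `Node` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class StorageBackend(ABC):
    """Hypothetical per-node storage contract: anything k/v-shaped can implement it."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class MemoryBackend(StorageBackend):
    """Simplest possible engine: a dict held in RAM."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class Node:
    """Higher-level code talks only to the interface, never to a specific engine."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def handle_write(self, key: bytes, value: bytes) -> None:
        self.backend.put(key, value)

    def handle_read(self, key: bytes) -> Optional[bytes]:
        return self.backend.get(key)
```

Swapping engines then means passing a different `StorageBackend` subclass to `Node`, so work on a new engine can proceed without touching the code above it.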
Many such local key/value stores already exist, such as Berkeley DB, Tokyo Cabinet, and Innostore.
We had many goals when evaluating which storage engines to use in Riak, including:
- low latency per item read or written
- high throughput, especially when writing an incoming stream of random items
- ability to handle datasets much larger than RAM without degradation
- crash friendliness, both in terms of fast recovery and not losing data
- ease of backup and restore
- a relatively simple, understandable (and thus supportable) code structure and data format
- predictable behavior under heavy access load or large volume
- a license that allowed for easy default use in Riak
Achieving some of these is easy. Achieving them all is less so.
None of the available local key/value storage systems (including but not limited to those written by us) was ideal with regard to all of the above goals. We were discussing this issue with Eric Brewer when he had a key insight about hash table log merging: that it could potentially be made as fast as, or faster than, LSM-trees.
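One way to picture hash table log merging is a store that appends every write to a log file, keeps an in-memory hash table mapping each key to the position of its latest value, and periodically merges the log by copying only live values into a fresh file. The Python sketch below illustrates that general technique under simplifying assumptions (a single data file, a made-up 8-byte header, no deletes, no rebuilding of the index on startup); it is not bitcask's implementation, and `LogHashStore` is a hypothetical name.

```python
import os
from typing import Dict, Optional, Tuple

HEADER_SIZE = 8  # assumed layout: 4-byte key length + 4-byte value length


def _encode(key: bytes, value: bytes) -> bytes:
    return len(key).to_bytes(4, "big") + len(value).to_bytes(4, "big") + key + value


class LogHashStore:
    """Append-only log file plus an in-memory hash index (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path
        # keydir: key -> (absolute offset of the value in the log, value length)
        self.keydir: Dict[bytes, Tuple[int, int]] = {}
        self.log = open(path, "ab+")

    def put(self, key: bytes, value: bytes) -> None:
        # Every write is an append; old values are never overwritten in place.
        offset = self.log.seek(0, os.SEEK_END)
        self.log.write(_encode(key, value))
        self.log.flush()  # hands data to the OS; os.fsync would be needed for durability
        self.keydir[key] = (offset + HEADER_SIZE + len(key), len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        # One hash lookup, then at most one seek and one read.
        loc = self.keydir.get(key)
        if loc is None:
            return None
        pos, size = loc
        self.log.seek(pos)
        return self.log.read(size)

    def merge(self) -> None:
        # Copy only the latest value of each live key into a new file, then swap it in.
        tmp_path = self.path + ".merge"
        new_keydir: Dict[bytes, Tuple[int, int]] = {}
        offset = 0
        with open(tmp_path, "wb") as out:
            for key, (pos, size) in self.keydir.items():
                self.log.seek(pos)
                value = self.log.read(size)
                out.write(_encode(key, value))
                new_keydir[key] = (offset + HEADER_SIZE + len(key), size)
                offset += HEADER_SIZE + len(key) + size
        self.log.close()
        os.replace(tmp_path, self.path)
        self.log = open(self.path, "ab+")
        self.keydir = new_keydir

    def close(self) -> None:
        self.log.close()
```

Writes are pure sequential appends and reads cost one hash lookup plus at most one disk read, which is why this shape suits a stream of random writes. The price is that superseded values accumulate until a merge reclaims them, so when to merge (and when to fsync) becomes a tuning decision.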
This led us to take a fresh look at some of the techniques used in the log-structured file systems first developed in the 1980s and 1990s. That exploration led to the development of bitcask, a storage system that meets all of the above goals very well. While bitcask was originally developed for use under Riak, it was also built to be generic and can serve as a local key/value store for other applications.
If you would like to read a bit about how it works, we’ve produced a short note describing bitcask’s design that should give you a taste. Soon you can expect a Riak backend for bitcask, some improvements to startup speed, information on tuning the timing of merge and fsync operations, a detailed performance analysis, and more.
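Crash friendliness and startup speed both come down to how a log-structured store rebuilds its in-memory index after a restart. The Python sketch below assumes a hypothetical record layout (CRC32, timestamp, key size, value size, then key and value) and shows one common recovery approach: scan the log from the start, let later records for a key win, and stop at the first truncated or checksum-failing record. It is an illustration of the technique, not bitcask's code; `encode_record` and `rebuild_keydir` are hypothetical names.

```python
import struct
import zlib
from typing import Dict, Tuple

# Assumed header: CRC32, timestamp, key size, value size (big-endian uint32s).
HEADER = struct.Struct(">IIII")


def encode_record(key: bytes, value: bytes, tstamp: int) -> bytes:
    # The CRC covers everything after itself: timestamp, sizes, key, and value.
    body = struct.pack(">III", tstamp, len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body


def rebuild_keydir(data: bytes) -> Tuple[Dict[bytes, Tuple[int, int, int]], int]:
    """Scan a log from the start; return (keydir, length of the valid prefix).

    keydir maps key -> (value offset, value size, timestamp); a later record
    for the same key replaces an earlier one. Scanning stops at the first
    truncated or corrupt record, which discards a torn write left by a crash.
    """
    keydir: Dict[bytes, Tuple[int, int, int]] = {}
    pos = 0
    while pos + HEADER.size <= len(data):
        crc, tstamp, ksz, vsz = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + ksz + vsz
        if end > len(data) or zlib.crc32(data[pos + 4:end]) != crc:
            break  # torn or corrupt tail: everything from here on is ignored
        key_start = pos + HEADER.size
        key = data[key_start:key_start + ksz]
        keydir[key] = (key_start + ksz, vsz, tstamp)
        pos = end
    return keydir, pos
```

A real engine would truncate the file to the returned valid length before appending again. Because this scan reads every record, startup time grows with log size, which is why a compact summary of the index written alongside each file can speed up recovery.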
In the meantime, please feel free to give it a try!
- Justin and Dizzy