July 1, 2011
For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored. For this reason, Riak users that need to store billions of entries per machine sometimes use Innostore, (our wrapper around embedded InnoDB) as their storage engine instead. InnoDB is a robust and well-known storage engine, and uses a more traditional design than Bitcask which allows it to tolerate a higher maximum number of items stored on a given host.
However, there are a number of reasons that people may wish for something other than Innostore when they find that they are in this situation. It is less comfortable to back up than bitcask, imposes a higher minimum overhead on disk space, only performs well when both heavily tuned (and given multiple spindles), and comes with a more restrictive license. For all of these reasons we have been paying close attention to LevelDB, which was recently released by Google. LevelDB’s storage architecture is more like BigTable’s memtable/sstable model than it is like either Bitcask or InnoDB. This design and implementation brings the possibility of a storage engine without Bitcask’s RAM limitation and also without any of the above drawbacks of InnoDB. Our early hypothesis after reading the text and code was that LevelDB might fill an InnoDB-like role for Riak users, without some of the downsides. As some of the early bugs in LevelDB were fixed and stability improved, our hopes rose further.
In order to begin testing this possibility, we have begun to perform some simple performance comparisons between LevelDB and InnoDB using basho_bench and a few different usage patterns. All of these comparisons were performed on the exact same machine, a fairly basic 2-CPU Linux server with 4G of RAM, mid-range SATA disks, and so on — a fairly typical commodity system. Note that this set of tests are not intended to provide useful absolute numbers for either database, but rather to allow some preliminary comparisons between the two. We tried to be as fair as possible. For instance, InnoDB was given an independent disk for its journaling.
The first comparison was a sequential load into an empty database. We inserted one hundred million items with numerically-sorted keys, using fairly small values of 100 bytes per item.
The database created by this insert test was used as the starting point for all subsequent tests. Each subsequent test was run in steady-state for one hour on each of the two databases. Longer runs will be important for us to gauge stability, but an hour per test seemed like a useful starting point.
For the second comparison, we did a read-only scenario with a pareto distribution. This means that a minority of the items in the database would see the vast majority of requests, which means that there will be relatively high churn but also a higher percentage of cache hits in a typical system.
The third comparison used exactly the same pareto pattern for key distribution, but instead of being pure reads it was a 90/10 read/write ratio.
The fourth comparison was intended to see how the two systems compared to each other in a very-high-churn setting. It used the same dataset, but write-only, and in an extremely narrow pareto distribution such that nearly all writes would be within a narrow set of keys, causing a relatively small number of items to be overwritten many times.
In each of these tests, LevelDB showed a higher throughput than InnoDB and a similar or lower latency than InnoDB. Our goal in this initial round was to explore basic feasibility, and that has now been established.
This exercise has not been an attempt to provide comprehensive general-purpose benchmarks for either of these two storage engines. A number of choices made do not represent any particular generic usage pattern but were instead made to quickly put the systems under stress and to minimize the number of variables being considered. There are certainly many scenarios where either of these two storage systems can certainly be made to perform differently (sometimes much better) than they did here. In some earlier tests, we saw InnoDB provide a narrower variance of latency (such as lower values in the 99th percentile) but we have not seen that reproduced in this set of tests. Among the other things not done in this quick set of tests: using the storage engines through Riak, deeply examining their I/O behavior, observing their stability over very long periods of time, comparing their response to different concurrency patterns, or comparing them to a wider range of embedded storage engines. All of these directions (and more) are good ideas for continued work in the future, and we will certainly do some of that.
Despite everything we haven’t yet done, this early work has validated one early hope and hypothesis. It appears that LevelDB may become a preferred choice for Riak users whose data set has massive numbers of keys and therefore is a poor match with Bitcask’s model. Performance aside, it compares favorably to InnoDB on other issues such as permissive license and operational usability. We are now going ahead with the work and continued testing needed to keep exploring this hypothesis and to improve both Riak and LevelDB in order to make their combined use an option for our customers and open source community.
Some issues still remain that we are working to resolve before LevelDB can be a first-class storage engine under Riak. One such issue that we are working on (with the LevelDB maintainers at Google) is making the LevelDB code portable to all of the same platforms that Riak is supported on. We are confident that these issues will be resolved in the very near future. Accordingly, we are moving ahead with a backend enabling Riak’s use of LevelDB.
November 15, 2010
Oracle didn’t (and can’t) take away your open source software.
A few weeks ago Oracle caused a lot of confusion when they changed the makeup of the MySQL product line, including a “MySQL Classic Edition” version that does not cost money and does not include InnoDB. That combination in the product chart made many people wonder if InnoDB itself had ceased to be free in either the “free beer” or “free speech” sense. The people wondering and worrying included a few users of Innostore, the InnoDB-based storage engine that can be used with Riak.
Luckily, open source software doesn’t work that way.
Oracle didn’t really even try to do what some people thought; they just released a confusing product graph which they have since updated. The MySQL that most people think of first is MySQL Community Edition and it was not one of the editions mentioned in the chart that confused people. That version of MySQL, as well as all of the GPL components included in it such as InnoDB, remain free of cost and also available under the GPL.
This confusion eventually led to a public response from Oracle, so you can read it authoritatively if you like.
Even if someone wanted to, they couldn’t “take it back” in the way that some people feared. Existing software that has been legitimately available under an open source license such as GPL or Apache cannot retroactively be made unfree. The copyright owner might choose to not license future improvements as open source, but that which is already released in such a way cannot be undone. Oracle and Innobase aren’t currently putting new effort into Embedded InnoDB, but a new project has spun up to move it forward. If the HailDB project produces improvements of value, then future versions of Innostore may switch to using that engine instead of using the original Embedded InnoDB release.
February 2, 2010
You may remember that Basho recently open-sourced Innostore, our standalone Erlang application that provides a simple interface to embedded InnoDB…
In this podcast, Dave “Dizzy” Smith and Justin Sheehy discuss the release of Innostore, why we built it, how we use it in Riak, and why it might be useful for other Erlang projects. The discussion focuses on the stability and predictability of InnoDB, especially under load and as compared with other storage backends like DETS.
And of course, go download Innostore when you are done with the podcast.
January 26, 2010
Riak has pluggable storage engines, and so we’re always on the lookout for better ways that users can store their data locally. Recent experiences with some Basho customers managing some large datasets led us to believe that InnoDB might work out very well for them.
To answer that question and fill that need, Innostore was written. It is a standalone Erlang application that provides a simple interface to Embedded InnoDB. So far its performance has been quite good, though InnoDB (with or without the Innostore API) is highly dependent on tuning the local configuration to match the local hardware. Luckily, Dizzy — the author of Innostore — has some heavy-duty experience doing that kind of tuning and as a result we’ve been able to help people meet their performance goals using Innostore.