Tag Archives: LevelDB

Leveraging Riak with Cloud Foundry

November 26, 2013

From a developer’s perspective, interacting with a Platform as a Service (PaaS) allows you to defer most of the responsibilities associated with deploying an application (network, servers, storage, etc.). That leaves two primary responsibilities: write the code and push it up to the platform (along with some configuration settings).

Often the platform will provide you with an arsenal of data stores to persist data produced by your applications. Riak and Riak CS are attractive data storage solutions for PaaS deployments because, as applications are spun up and down, the databases housing their data need to remain available. In addition, Riak and Riak CS can scale along with your applications so that reading and writing data doesn’t become a bottleneck.

(If you’re interested in the differences between Riak and Riak CS, look no further.)

Cloud Foundry

One of the most popular open source Platforms as a Service is Cloud Foundry. So, how would you leverage data stores like Riak or Riak CS with it?

Service Broker

At a high level, a Cloud Foundry service broker advertises a catalog of services and service plans. For Riak, the service is Riak itself and the service plans are either a Bitcask or a LevelDB bucket. Cloud Foundry will not supply a Riak cluster for you: this must be something you’ve deployed and configured per the Wiring It Up section below.

You can think of the service broker as a middleman between your cluster and Cloud Foundry. The broker implements a set of APIs referred to as the Services API. As you interact with Cloud Foundry services, its Cloud Controller asks your broker to create, delete, bind, and unbind instances of your service. In turn, your broker talks to an appropriately configured Riak cluster and fulfills those requests (for example, by instructing the Riak cluster to create a new bucket with a Bitcask backend).

There are currently two versions of the Services API: v1 and v2. Since v2 is the version Cloud Foundry recommends for new brokers, we decided to use the v2 Services API to build a Riak service broker for Cloud Foundry.

Wiring It Up

The first step in connecting Riak and Cloud Foundry begins with having a specially configured Riak cluster. The “specially” part just means that it needs to be configured with the multi-backend.

The multi-backend allows Riak to be configured with more than one backend (for example, Bitcask and LevelDB) at the same time on a single cluster:
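The original post’s configuration listing is missing from this archive; on Riak 1.x, the multi-backend section of app.config looks roughly like this (the backend names here are illustrative):

```erlang
%% app.config (sketch) -- one Bitcask and one LevelDB backend side by side
{riak_kv, [
    {storage_backend, riak_kv_multi_backend},
    {multi_backend_default, <<"bitcask_mult">>},
    {multi_backend, [
        {<<"bitcask_mult">>,  riak_kv_bitcask_backend,  []},
        {<<"eleveldb_mult">>, riak_kv_eleveldb_backend, []}
    ]}
]}
```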

Next, connect Riak and the service broker by editing the service broker’s configuration file and launching it:

Note: riak_hosts can contain one or more Riak nodes. It is generally recommended to front your Riak cluster with some form of load balancing technology. If your cluster sits behind a load balancer, add the load balancer as the single entry in riak_hosts.
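The broker’s configuration file isn’t reproduced in this archive. As a purely illustrative sketch (the file name and every key other than riak_hosts are assumptions), it might look like:

```yaml
# settings.yml (illustrative; only riak_hosts is named in the post)
riak_hosts:
  - riak-lb.example.com   # single entry: the load balancer fronting the cluster
riak_http_port: 8098      # assumed key name
```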

From there, you need to register it with your Cloud Foundry instance:
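Registration happens through the cf CLI. The exact command and flags have varied across cf releases, so treat this as a sketch (the broker name, URL, and credentials are placeholders):

```shell
# Register the broker with the Cloud Controller (syntax varies by cf version)
cf add-service-broker riak --url http://broker.10.0.0.10.xip.io \
   --username admin --password secret
```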

Note: xip.io is a service that provides wildcard DNS for any (public or private) IP address.

After that, just push your application to Cloud Foundry and create an instance of the service using the interactive prompt:
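Assuming hypothetical application and service-instance names, the flow looks roughly like:

```shell
cf push my-app                    # deploy the application
cf create-service                 # interactive prompt: pick riak, then a
                                  # Bitcask- or LevelDB-backed plan
cf bind-service my-app riak-test  # attach the new bucket to the app
cf restart my-app                 # pick up the new credentials
```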

If we push up a sample application that just dumps the VCAP_SERVICES environment variable as JSON, it looks like this:
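The output itself is missing from this archive, but a VCAP_SERVICES payload for a Riak binding would be shaped roughly like this (all field values below are illustrative):

```json
{
  "riak": [
    {
      "name": "riak-test",
      "label": "riak",
      "plan": "bitcask",
      "credentials": {
        "host": "riak-lb.example.com",
        "port": 8098,
        "bucket": "4f1c2e-illustrative-bucket"
      }
    }
  ]
}
```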

Conclusion

The Riak service broker for Cloud Foundry is far from finished. It was originally built with application testing in mind.

For example, the broker can allocate a Bitcask backed bucket to an application so that it runs its tests against a Bitcask backend. Afterwards, the broker can tear down that bucket and provide another bucket for testing backed by LevelDB.

Clearly, there are more use cases than this.

We encourage you to contribute to the project in ways that help steer it toward your use cases by opening issues and pull requests. The end goal is to pair this service broker with a full Riak BOSH release.

(If you’re interested in a Riak BOSH release now, the community has put together a great starting point.)

To sum it up, Riak is a powerful storage platform for building applications that need to scale and cannot go down. Cloud Foundry is a tool that empowers developers by giving them an environment where they can simply push up code and have it transform into a running application.

Marrying the two is quite powerful and provides organizations with both a flexible application and data tier.

Helpful Links

Hector Castro and John Daily

Top Five Questions About Riak

April 17, 2013

This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.

What hardware should I use with Riak?

Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.

RAM is one of the most important factors – RAM availability directly affects what Riak backend you should use (see question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor. However, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
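As a back-of-the-envelope illustration of how n_val multiplies raw storage needs (the object counts and sizes below are made up):

```python
# Back-of-the-envelope disk sizing: every object is stored n_val times,
# so raw cluster capacity must cover objects * size * n_val.
def cluster_storage_bytes(num_objects, avg_object_bytes, n_val=3):
    """Total raw storage required across the cluster (backend overhead ignored)."""
    return num_objects * avg_object_bytes * n_val

# Example: 100 million 2 KB objects at the default n_val of 3
total = cluster_storage_bytes(100_000_000, 2048)
print(round(total / 1024**3), "GiB")  # roughly 572 GiB across all nodes
```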

Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.

What backend is best for my application?

Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.

Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
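To make the model concrete, here is a toy Python sketch of Bitcask’s design: an append-only data file plus an in-memory keydir mapping each key to the on-disk location of its latest value. This illustrates the idea only; it is not Bitcask’s actual file format.

```python
import os
import tempfile

class TinyBitcask:
    """Toy sketch of Bitcask's model (illustration only): an append-only
    data file plus an in-memory 'keydir' hash table that maps each key
    to the on-disk location of its most recent value."""

    def __init__(self, path):
        self.f = open(path, "ab+")
        self.keydir = {}  # key -> (offset, length) of the latest value

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(value)              # write-once, append-only
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.keydir[key]  # one RAM lookup...
        self.f.seek(offset)                # ...then at most one disk seek
        return self.f.read(length)

# Overwrites simply append; the keydir always points at the newest copy.
db = TinyBitcask(os.path.join(tempfile.mkdtemp(), "bitcask.data"))
db.put(b"user:1", b"alice")
db.put(b"user:1", b"alicia")
```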

LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend setting in the app.config file. You can find more details on LevelDB here.
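On Riak 1.x, the change amounts to one setting in the riak_kv section of app.config:

```erlang
%% app.config -- switch the node's storage backend from Bitcask to LevelDB
{riak_kv, [
    {storage_backend, riak_kv_eleveldb_backend}
]}
```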

Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.

How do I model data using Riak’s key/value design?

Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring your data, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.

Data Type   | Key                | Value
----------- | ------------------ | ------------------------------------
Session     | User/Session ID    | Session Data
Content     | Title, Integer     | Document, Image, Post, Video, Text, JSON/HTML, etc.
Advertising | Campaign ID        | Ad Content
Logs        | Date               | Log File
Sensor      | Date, Date/Time    | Sensor Updates
User Data   | Login, Email, UUID | User Attributes
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
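For instance, using the session row from the table above, storing and fetching a session object over Riak’s HTTP interface looks roughly like this (host, bucket, and key are placeholders):

```shell
# Store a session object under a session-ID key
curl -X PUT http://localhost:8098/buckets/sessions/keys/user42-session1 \
     -H "Content-Type: application/json" \
     -d '{"cart": ["sku-123"], "last_seen": "2013-04-17T12:00:00Z"}'

# Fetch it back by key
curl http://localhost:8098/buckets/sessions/keys/user42-session1
```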

What other options, besides strict key/value access, are there for querying Riak?

Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.

MapReduce provides non-primary-key querying that divides work across the Riak distributed database. It is useful for tasks such as filtering by tags, counting words, extracting links, analyzing log files, and aggregation tasks. Riak provides both JavaScript and Erlang MapReduce support; jobs written in Erlang are generally more performant. You can find more details about Riak MapReduce here.
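A MapReduce job is expressed as a JSON document POSTed to the /mapred endpoint. As an illustrative example (the bucket name is assumed), this job extracts each object’s JSON value with a built-in map function and sorts the results with a built-in reduce:

```json
{
  "inputs": "logs",
  "query": [
    {"map":    {"language": "javascript", "name": "Riak.mapValuesJson"}},
    {"reduce": {"language": "javascript", "name": "Riak.reduceSort"}}
  ]
}
```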

Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.

Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
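Note that secondary indexes require the LevelDB (or Memory) backend. As an illustrative exchange over the HTTP API (host, bucket, and values are placeholders):

```shell
# Tag an object with a secondary index at write time...
curl -X PUT http://localhost:8098/buckets/users/keys/u1001 \
     -H "Content-Type: application/json" \
     -H "x-riak-index-email_bin: alice@example.com" \
     -d '{"name": "Alice"}'

# ...then query by exact match on that index
curl http://localhost:8098/buckets/users/index/email_bin/alice@example.com
```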

How does Riak differ from other databases?

We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.

Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing means data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
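The consistent-hashing idea can be sketched in a few lines of Python. The ring size matches Riak’s default of 64 partitions, but this is an illustration of the concept, not Riak’s implementation:

```python
import hashlib

# Illustration of consistent hashing as Riak applies it: a fixed ring of
# partitions, keys hashed onto the ring with SHA-1, and replicas placed
# on the next n_val partitions.
RING_SIZE = 64        # Riak's default partition count
RING_TOP = 2 ** 160   # size of the SHA-1 hash space

def partition_for(bucket, key):
    """Map a bucket/key pair onto one of the ring's partitions."""
    h = int.from_bytes(hashlib.sha1(bucket + key).digest(), "big")
    return h // (RING_TOP // RING_SIZE)

def preflist(bucket, key, n_val=3):
    """The n_val consecutive partitions holding replicas of this key."""
    first = partition_for(bucket, key)
    return [(first + i) % RING_SIZE for i in range(n_val)]
```

Because partitions (not keys) move when nodes join or leave, adding a machine shifts only a slice of the ring instead of rehashing every key.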

A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.

Ready to get started? You can download Riak here. For more in-depth information about Riak, we also offer Riak Workshops in New York and San Francisco. Learn more here.

Basho

LevelDB in Riak 1.2

October 30, 2012

Google’s LevelDB has proven very versatile within Riak — LevelDB is implemented in Riak as eleveldb, an Erlang wrapper around levelDB. But Google’s target environment was very different from the large-data environment of Riak’s users. Riak 1.2 contains the first wave of performance tuning for large data. These changes improve overall throughput and eliminate most instances where levelDB would hang for a few seconds trying to catch up. The new release also contains a fix for an infinite loop compaction condition, a bloom filter that greatly reduces time searching for non-existent keys, and several bug fixes. This blog details these improvements and also gives some internal benchmark results obtained using basho_bench, Basho’s open source benchmarking tool.

What’s New?

  • Stalls: In Riak 1.1, individual vnodes in Riak (one levelDB database) could have long pauses before responding to individual get / put calls. Several stall sources were identified and corrected. On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, levelDB sometimes sees one stall every 2 hours for 10 to 30 seconds.
  • Throughput: While impacted by stalls, throughput is an independent code and tuning issue. The fundamental change made was to increase all on-disk file sizes to minimize the number of file opens and reduce the number of compactions. LevelDB in Riak 1.1 had a throughput of ~400 operations per second on a given server. These changes raised throughput to ~2,000 operations per second.
  • Infinite loop during compaction: In 1.1, the background compaction would get caught in an infinite loop if it encountered a file with a corrupt data block. The previous solution was to stop the node, manually execute “recovery”, then restart the node. The entire file (and all its data) was removed from the data store and copied to the “lost” directory during the recovery. Riak 1.2 creates one file, BLOCKS.bad, in the “lost” directory. The levelDB code then automatically removes the corrupted block from compaction processing and copies it to this file. The compaction then continues processing the remaining data in the file (and moves along without going into an infinite loop).
  • Merge of levelDB bloom filter code: Google has created a bloom filter to help levelDB more quickly identify keys that do not exist in the data store. The bloom filter code is merged into this release. While incrementally beneficial in its own right, the bloom filter enables changes to the file / level structure which dramatically improve overall throughput.
  • app.config eleveldb options: In Riak 1.1, most parameters set in app.config for the levelDB layer were never passed through. This is corrected in 1.2. Users should assume previous parameter tests / experiments to be invalid.

Benchmarks

The graphs below illustrate levelDB’s improvements in throughput and maximum latency. Test data was obtained using basho_bench, Basho’s open source benchmarking tool. Raw data and configuration files can be downloaded here. In the benchmark presented, levelDB preloads a database with 10 million sequentially ordered keys.
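The actual configuration files are linked above; an illustrative basho_bench config for a sequential preload of 10 million small values looks roughly like this (values here are a sketch, not the original benchmark files):

```erlang
%% basho_bench config (sketch; not the original benchmark files)
{mode, max}.
{duration, 60}.
{concurrent, 8}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator, {sequential_int, 10000000}}.
{value_generator, {fixed_bin, 100}}.
```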

As can be seen, levelDB 1.1 stalls regularly, whereas 1.2 seldom stalls due to stall management improvements. We can also see that levelDB in 1.2 has a higher ingest rate (we were able to input 10 million records in 44 minutes, compared to 106 minutes in 1.1).

Throughput in levelDB 1.1

Throughput in levelDB 1.2

Maximum latency in levelDB 1.1

Maximum latency in levelDB 1.2

Roadmap

We have already identified further performance tuning for future work, including bloom filter modification and removing redundancy (bloat) during memory to level-0 file creation. Expect another wave of performance tuning in subsequent point releases and major releases.

  • Data backup: In theory there is no need to back up levelDB data, since Riak duplicates all data across several nodes. But many users have suggested they would still sleep better if there were a means to perform a direct backup by node/vnode anyway. Backups during live operation are a planned next feature.
  • Infinite loops: Riak 1.2 contains fixes for a couple of the most common cases where compactions could enter infinite loops when the state of files on disk does not match LevelDB’s internal state. However, other, less frequent cases can still cause infinite loops; these are high on the future work list.
  • Error correction: LevelDB has methods to repair and restore damaged vnodes. The time cost of executing a repair can be huge. The repair time is already better with release 1.2 (in one case the time was reduced from 6 weeks … really … to eleven hours). We already have a design waiting for programming resources that will further speed repair time.

Seth Benton and Matthew Von-Maszewski

Supporting Riak on *BSD

June 11, 2012

Q: What platforms are supported by Riak?

This question comes up quite frequently, and there have always been two answers to it. The first answer really answers the question of “what platforms can Riak run on?” The second one answers the question of “what platforms are packaged for and tested by Basho?”

*BSD and the Answer to the First Question

Riak has been built and run on FreeBSD by a handful of users for quite some time, from what we can tell. Those dedicated users had to jump through hoops involving modifying many of Riak’s Makefiles as well as #ifdefs scattered throughout our storage backends. From a great-user-experience point of view, this is not ideal.

Early last year, work was started by both the community and Basho to improve our codebase to support *BSD properly. Unfortunately, with our limited resources at the time, finishing the testing on *BSD got set aside in favor of new Riak features. This left a single piece missing for easy use of Riak on *BSD: LevelDB.

In December Basho engineer Andrew Thompson submitted a patch to Google to add DragonFlyBSD/NetBSD/OpenBSD support to LevelDB so we can use those fixes in eLevelDB. In March 2012, the patch was merged into LevelDB. That patch, along with the eventual merge of the “BSD Support” pull-request, finally added BSD to the answer of the first question, “what platforms can Riak run on?”

While great, this still leaves Riak *BSD users alone when it comes to the second question “what platforms are packaged for and tested by Basho?”

*BSD and the Answer to the Second Question

Whenever asked what platforms are officially supported by Basho the answer has always been to look at what packages were available for Riak. After the release of Riak 1.1, we decided we had the capacity to support another major platform with our next major release. Due to interest from the community and customers, FreeBSD was chosen. Now “Free” doesn’t satisfy all * in *BSD, but it is a good place to start!

Luckily the work to actually package FreeBSD was made easier by all the work mentioned above to have it build cleanly. As of this commit, packages for FreeBSD have been building with every commit on riak/master. So, stay tuned for the next major Riak release, when we’ll ship FreeBSD 9 packages as well as update the installation documentation. Now I’m on the hook for writing documentation I suppose!

Until then if you want to play around with the FreeBSD packages and send me feedback, clone Riak and

$ gmake package RELEASE=1

from your FreeBSD machine. Then you should be able to pkg_add and pkg_delete Riak.

Thanks,

Jared

Leveling the Field

July 1, 2011

For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored. For this reason, Riak users that need to store billions of entries per machine sometimes use Innostore (our wrapper around embedded InnoDB) as their storage engine instead. InnoDB is a robust and well-known storage engine, and uses a more traditional design than Bitcask which allows it to tolerate a higher maximum number of items stored on a given host.

However, there are a number of reasons that people may wish for something other than Innostore when they find that they are in this situation. It is less comfortable to back up than bitcask, imposes a higher minimum overhead on disk space, only performs well when both heavily tuned (and given multiple spindles), and comes with a more restrictive license. For all of these reasons we have been paying close attention to LevelDB, which was recently released by Google. LevelDB’s storage architecture is more like BigTable’s memtable/sstable model than it is like either Bitcask or InnoDB. This design and implementation brings the possibility of a storage engine without Bitcask’s RAM limitation and also without any of the above drawbacks of InnoDB. Our early hypothesis after reading the text and code was that LevelDB might fill an InnoDB-like role for Riak users, without some of the downsides. As some of the early bugs in LevelDB were fixed and stability improved, our hopes rose further.

In order to begin testing this possibility, we have begun to perform some simple performance comparisons between LevelDB and InnoDB using basho_bench and a few different usage patterns. All of these comparisons were performed on the exact same machine, a fairly basic 2-CPU Linux server with 4G of RAM, mid-range SATA disks, and so on — a fairly typical commodity system. Note that this set of tests is not intended to provide useful absolute numbers for either database, but rather to allow some preliminary comparisons between the two. We tried to be as fair as possible. For instance, InnoDB was given an independent disk for its journaling.

The first comparison was a sequential load into an empty database. We inserted one hundred million items with numerically-sorted keys, using fairly small values of 100 bytes per item.

The database created by this insert test was used as the starting point for all subsequent tests. Each subsequent test was run in steady-state for one hour on each of the two databases. Longer runs will be important for us to gauge stability, but an hour per test seemed like a useful starting point.

For the second comparison, we did a read-only scenario with a pareto distribution. This means that a minority of the items in the database would see the vast majority of requests, which means that there will be relatively high churn but also a higher percentage of cache hits in a typical system.

The third comparison used exactly the same pareto pattern for key distribution, but instead of being pure reads it was a 90/10 read/write ratio.

The fourth comparison was intended to see how the two systems compared to each other in a very-high-churn setting. It used the same dataset, but write-only, and in an extremely narrow pareto distribution such that nearly all writes would be within a narrow set of keys, causing a relatively small number of items to be overwritten many times.

In each of these tests, LevelDB showed a higher throughput than InnoDB and a similar or lower latency than InnoDB. Our goal in this initial round was to explore basic feasibility, and that has now been established.

This exercise has not been an attempt to provide comprehensive general-purpose benchmarks for either of these two storage engines. A number of choices made do not represent any particular generic usage pattern but were instead made to quickly put the systems under stress and to minimize the number of variables being considered. There are certainly many scenarios where either of these two storage systems could be made to perform differently (sometimes much better) than they did here. In some earlier tests, we saw InnoDB provide a narrower variance of latency (such as lower values in the 99th percentile), but we have not seen that reproduced in this set of tests. Among the other things not done in this quick set of tests: using the storage engines through Riak, deeply examining their I/O behavior, observing their stability over very long periods of time, comparing their response to different concurrency patterns, or comparing them to a wider range of embedded storage engines. All of these directions (and more) are good ideas for continued work in the future, and we will certainly do some of that.

Despite everything we haven’t yet done, this early work has validated one early hope and hypothesis. It appears that LevelDB may become a preferred choice for Riak users whose data set has massive numbers of keys and therefore is a poor match with Bitcask’s model. Performance aside, it compares favorably to InnoDB on other issues such as permissive license and operational usability. We are now going ahead with the work and continued testing needed to keep exploring this hypothesis and to improve both Riak and LevelDB in order to make their combined use an option for our customers and open source community.

Some issues still remain that we are working to resolve before LevelDB can be a first-class storage engine under Riak. One such issue that we are working on (with the LevelDB maintainers at Google) is making the LevelDB code portable to all of the same platforms that Riak is supported on. We are confident that these issues will be resolved in the very near future. Accordingly, we are moving ahead with a backend enabling Riak’s use of LevelDB.

Justin