Tag Archives: Bitcask

Leveraging Riak with Cloud Foundry

November 26, 2013

From a developer’s perspective, interacting with a Platform as a Service (PaaS) allows you to defer most of the responsibilities associated with deploying an application (network, servers, storage, etc.). That leaves two primary responsibilities: write the code and push it up to the platform (along with some configuration settings).

Often the platform will provide you with an arsenal of data stores to persist data produced by your applications. Riak and Riak CS are attractive data storage solutions for PaaS deployments because, as applications are spun up and down, the databases housing their data need to remain available. In addition, Riak and Riak CS can scale along with your applications so that reading and writing data isn’t a bottleneck.

(If you’re interested in the differences between Riak and Riak CS, look no further.)

Cloud Foundry

One of the most popular open source Platforms as a Service is Cloud Foundry. So, how would you leverage data stores like Riak or Riak CS with it?

Service Broker

At a high level, a Cloud Foundry service broker advertises a catalog of services and service plans. For Riak, the service is Riak itself and the service plans are either a Bitcask or a LevelDB bucket. Cloud Foundry will not supply a Riak cluster for you: this must be something you’ve deployed and configured per the Wiring It Up section below.

You can think of the service broker as a middleman between your cluster and Cloud Foundry. The broker implements a set of APIs referred to as the Services API. As you interact with Cloud Foundry services, its Cloud Controller asks your broker to create, delete, bind, and unbind instances of your service. In turn, your broker talks to an appropriately configured Riak cluster and fulfills those requests (for example, by instructing the Riak cluster to create a new bucket with a Bitcask backend).
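The four operations above can be sketched as a toy, in-memory broker. This is only an illustration of the create/bind/unbind/delete lifecycle, not the actual Riak service broker or the Services API wire format: the class name, method names, bucket naming scheme, and credential fields are all hypothetical, and no Riak cluster is contacted.

```python
import uuid


class InMemoryRiakBroker:
    """Toy broker: tracks service instances and bindings in dictionaries."""

    # Plans named in the post; a real broker would advertise these in its catalog.
    PLANS = ("bitcask", "leveldb")

    def __init__(self):
        self.instances = {}   # instance_id -> {"plan": ..., "bucket": ...}
        self.bindings = {}    # binding_id -> instance_id

    def create(self, instance_id, plan):
        if plan not in self.PLANS:
            raise ValueError(f"unknown plan: {plan}")
        # A real broker would ask Riak to create a bucket on the chosen backend here.
        bucket = f"cf-{uuid.uuid4().hex}"   # hypothetical naming scheme
        self.instances[instance_id] = {"plan": plan, "bucket": bucket}
        return bucket

    def bind(self, instance_id, binding_id):
        instance = self.instances[instance_id]
        self.bindings[binding_id] = instance_id
        # What the application would later receive; field names are hypothetical.
        return {"bucket": instance["bucket"], "backend": instance["plan"]}

    def unbind(self, binding_id):
        self.bindings.pop(binding_id, None)

    def delete(self, instance_id):
        # Drop bindings that still point at the instance, then the instance itself.
        self.bindings = {b: i for b, i in self.bindings.items() if i != instance_id}
        self.instances.pop(instance_id, None)
```

In this sketch the Cloud Controller's role is played by whoever calls the methods: `create` followed by `bind` hands an application its bucket, and `unbind` followed by `delete` tears it down again.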

There are currently two versions of the Services API: v1 and v2. Since v2 is the version recommended by Cloud Foundry, we decided to use the v2 Services API to build a Riak service broker for Cloud Foundry.

Wiring It Up

The first step in connecting Riak and Cloud Foundry is having a specially configured Riak cluster. The “specially” part just means that it needs to be configured with the multi-backend.

The multi-backend allows Riak to be configured with more than one backend (for example, Bitcask and LevelDB) at the same time on a single cluster:
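As a rough model of the idea (this is Python, not Riak's configuration syntax), a multi-backend cluster holds several named backends plus a default, and each bucket picks one through a bucket property. The backend names `bitcask_mult` and `eleveldb_mult` and the helper `backend_for` are assumptions made for illustration.

```python
# Hypothetical model of a multi-backend setup: named backends plus a default.
BACKENDS = {
    "bitcask_mult": "bitcask",
    "eleveldb_mult": "leveldb",
}
DEFAULT_BACKEND = "bitcask_mult"


def backend_for(bucket_props):
    """Return the storage engine for a bucket, given its properties."""
    # Buckets that do not name a backend fall back to the default.
    name = bucket_props.get("backend", DEFAULT_BACKEND)
    if name not in BACKENDS:
        raise KeyError(f"no backend named {name!r} is configured")
    return BACKENDS[name]
```

With this model, a bucket created for the LevelDB plan would carry `{"backend": "eleveldb_mult"}` in its properties, while a bucket with no `backend` property lands on Bitcask.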

Next, connect Riak and the service broker by editing the service broker’s configuration file and launching it:

Note: riak_hosts can contain one or more Riak nodes. It is generally recommended to front your Riak cluster with some form of load balancing technology. If you have your cluster behind a load balancer, simply add only one host to riak_hosts.

From there, you need to register it with your Cloud Foundry instance:

Note: xip.io is a service that provides wildcard DNS for any (public or private) IP address.

After that, just push your application to Cloud Foundry and create an instance of the service using the interactive prompt:

If we push up a sample application that just dumps the VCAP_SERVICES environment variable as JSON, it looks like this:
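Reading that variable from application code might look like the following sketch. It assumes VCAP_SERVICES is a JSON object keyed by service label, with a list of bound instances under each label. The sample payload, including the `riak` label, the instance name, and the credential fields, is hypothetical and is not the broker's actual output.

```python
import json
import os


def service_credentials(label, environ=os.environ):
    """Return the credentials of the first bound instance of the given service."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = services.get(label, [])
    if not instances:
        raise LookupError(f"no bound service with label {label!r}")
    return instances[0].get("credentials", {})


# Hypothetical payload; the label, plan and credential fields are illustrative only.
sample_env = {
    "VCAP_SERVICES": json.dumps({
        "riak": [{
            "name": "riak-test",
            "label": "riak",
            "plan": "bitcask",
            "credentials": {"uris": ["http://10.0.0.1:8098/buckets/example"]},
        }]
    })
}
creds = service_credentials("riak", sample_env)
```

Passing the environment in as a parameter keeps the helper easy to test outside Cloud Foundry; inside a running application the default `os.environ` would be used.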

Conclusion

The Riak service broker for Cloud Foundry is far from finished. The original use case was built to deal with application testing.

For example, the broker can allocate a Bitcask backed bucket to an application so that it runs its tests against a Bitcask backend. Afterwards, the broker can tear down that bucket and provide another bucket for testing backed by LevelDB.

Clearly, there are more use cases than this.

We encourage you to contribute to the project in ways that help steer it toward your use cases by opening issues and pull requests. The end goal is to pair this service broker with a full Riak BOSH release.

(If you’re interested in a Riak BOSH release now, the community has put together a great starting point.)

To sum it up, Riak is a powerful storage platform for building applications that need to scale and cannot go down. Cloud Foundry is a tool that empowers developers by giving them an environment where they can simply push up code and have it transform into a running application.

Marrying the two is quite powerful and provides organizations with both a flexible application and data tier.


Hector Castro and John Daily

Top Five Questions About Riak

April 17, 2013

This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.

What hardware should I use with Riak?

Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.

RAM is one of the most important factors – RAM availability directly affects which Riak backend you should use (see the question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor. However, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
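The effect of n_val on disk requirements can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: the function names are made up for illustration, the object count and size are example figures, and backend overhead and uneven data distribution are ignored.

```python
def raw_disk_bytes(object_count, avg_object_bytes, n_val=3):
    """Total bytes stored across the cluster, before any backend overhead."""
    return object_count * avg_object_bytes * n_val


def per_node_bytes(object_count, avg_object_bytes, nodes, n_val=3):
    """Even split of the replicated data across nodes (ignores overhead and imbalance)."""
    return raw_disk_bytes(object_count, avg_object_bytes, n_val) / nodes


total = raw_disk_bytes(50_000_000, 2_000)          # 50M objects of ~2 KB, n_val = 3
per_node = per_node_bytes(50_000_000, 2_000, 5)    # spread across 5 nodes
```

With these example figures the cluster stores 300 GB in total, or about 60 GB per node on five nodes; adding nodes lowers the per-node share without changing the total.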

Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.

What backend is best for my application?

Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.

Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
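The design described above, an in-memory table of keys pointing at locations in an append-only log, can be illustrated with a toy store. This is a minimal sketch and not Bitcask's actual file format or API: the class name is hypothetical, an in-memory buffer stands in for the data file, and merging, checksums, and crash recovery are left out.

```python
import io


class TinyBitcask:
    """Toy append-only store: values live in a log, keys live in an in-memory dict."""

    def __init__(self):
        self.log = io.BytesIO()   # stands in for the on-disk data file
        self.keydir = {}          # key -> (offset, size) of the latest value

    def put(self, key, value):
        offset = self.log.seek(0, io.SEEK_END)   # always append, never overwrite
        self.log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        offset, size = self.keydir[key]          # memory lookup, no disk access
        self.log.seek(offset)                    # the single "disk seek"
        return self.log.read(size)


store = TinyBitcask()
store.put("user:1", b"alice")
store.put("user:1", b"alice-v2")   # the older value stays in the log
value = store.get("user:1")
```

The sketch also shows where the memory constraint comes from: `keydir` holds one entry per key regardless of how large the values are, so it grows with the number of keys rather than with the volume of data.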

LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend variable in the app.config file. You can find more details on LevelDB here.

Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.

How do I model data using Riak’s key/value design?

Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.

Data Type    | Key                 | Value
Session      | User/Session ID     | Session Data
Content      | Title, Integer      | Document, Image, Post, Video, Text, JSON/HTML, etc.
Advertising  | Campaign ID         | Ad Content
Logs         | Date                | Log File
Sensor       | Date, Date/Time     | Sensor Updates
User Data    | Login, Email, UUID  | User Attributes

For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.

What other options, besides strict key/value access, are there for querying Riak?

Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.

MapReduce provides non-primary key based querying that divides work across the Riak distributed database. It is useful for tasks such as filtering by tags, counting words, extracting links, analyzing log files, and aggregation tasks. Riak provides both JavaScript and Erlang MapReduce support. Jobs written in Erlang are generally more performant. You can find more details about Riak MapReduce here.
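To make the word-counting case concrete, here is the shape of a map phase and a reduce phase in plain Python. It is an illustration of the programming model only and does not use Riak's MapReduce API; the function names and sample documents are invented for the example.

```python
from collections import Counter
from functools import reduce


def map_phase(document):
    """Runs once per object: emit word counts for a single document."""
    return Counter(document.lower().split())


def reduce_phase(left, right):
    """Combines partial results; must work on any grouping of map outputs."""
    return left + right


documents = ["Riak stores data", "Riak replicates data"]
counts = reduce(reduce_phase, map(map_phase, documents), Counter())
```

Because the map step looks at one object at a time and the reduce step only merges partial counts, the map work can run wherever each object is stored and the results can be combined afterwards.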

Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.

Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
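The exact-match and range queries described above can be pictured with a toy index. This sketch is not Riak's secondary index implementation or client API: the class, its method names, and the sample keys are hypothetical, and the range is treated as inclusive here.

```python
from collections import defaultdict


class TinyIndex:
    """Toy secondary index: maps an indexed value to the keys tagged with it."""

    def __init__(self):
        self.entries = defaultdict(set)

    def tag(self, key, value):
        self.entries[value].add(key)

    def exact(self, value):
        return sorted(self.entries.get(value, set()))

    def between(self, low, high):
        """Keys whose indexed value falls in the inclusive range [low, high]."""
        return sorted(
            key
            for value, keys in self.entries.items()
            if low <= value <= high
            for key in keys
        )


age_index = TinyIndex()
age_index.tag("user:alice", 31)
age_index.tag("user:bob", 45)
age_index.tag("user:carol", 31)
```

Here `age_index.exact(31)` finds both users tagged with 31, and `age_index.between(40, 50)` finds only the user tagged with 45.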

How does Riak differ from other databases?

We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.

Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing mean data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
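Consistent hashing, mentioned above, can be sketched as follows. This is a simplified model under stated assumptions rather than Riak's exact algorithm: it hashes a bucket/key pair with SHA-1 onto a ring split into 64 equal partitions, assumes partitions are assigned to nodes round-robin, and places replicas on the next partitions around the ring. The function names and the key encoding are illustrative.

```python
import hashlib

RING_BITS = 160  # SHA-1 output size in bits


def partition_for(bucket, key, partitions=64):
    """Map a bucket/key pair to one of `partitions` equal slices of the hash ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return (position * partitions) >> RING_BITS   # 0 .. partitions - 1


def owners(bucket, key, nodes, partitions=64, n_val=3):
    """Nodes holding the replicas, assuming partitions are dealt out round-robin."""
    first = partition_for(bucket, key, partitions)
    return [nodes[((first + i) % partitions) % len(nodes)] for i in range(n_val)]
```

The partition for a key depends only on the hash, never on the node list, which is why adding a machine means handing some partitions to the new node instead of rehashing and resharding every key.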

A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.

Ready to get started? You can download Riak here. For more in-depth information about Riak, we also offer Riak Workshops in New York and San Francisco. Learn more here.

Basho

Leveling the Field

July 1, 2011

For most Riak users, Bitcask is the obvious right storage engine to use. It provides low latency, solid predictability, is robust in the face of crashes, and is friendly from a filesystem backup point of view. However, it has one notable limitation: total RAM use depends linearly (though via a small constant) on the total number of objects stored. For this reason, Riak users that need to store billions of entries per machine sometimes use Innostore (our wrapper around embedded InnoDB) as their storage engine instead. InnoDB is a robust and well-known storage engine, and uses a more traditional design than Bitcask, which allows it to tolerate a higher maximum number of items stored on a given host.

However, there are a number of reasons that people may wish for something other than Innostore when they find that they are in this situation. Innostore is less comfortable to back up than Bitcask, imposes a higher minimum overhead on disk space, only performs well when both heavily tuned and given multiple spindles, and comes with a more restrictive license. For all of these reasons we have been paying close attention to LevelDB, which was recently released by Google. LevelDB’s storage architecture is more like BigTable’s memtable/sstable model than it is like either Bitcask or InnoDB. This design and implementation brings the possibility of a storage engine without Bitcask’s RAM limitation and also without any of the above drawbacks of InnoDB. Our early hypothesis after reading the text and code was that LevelDB might fill an InnoDB-like role for Riak users, without some of the downsides. As some of the early bugs in LevelDB were fixed and stability improved, our hopes rose further.

In order to begin testing this possibility, we have begun to perform some simple performance comparisons between LevelDB and InnoDB using basho_bench and a few different usage patterns. All of these comparisons were performed on the exact same machine, a fairly basic 2-CPU Linux server with 4G of RAM, mid-range SATA disks, and so on: a fairly typical commodity system. Note that this set of tests is not intended to provide useful absolute numbers for either database, but rather to allow some preliminary comparisons between the two. We tried to be as fair as possible. For instance, InnoDB was given an independent disk for its journaling.

The first comparison was a sequential load into an empty database. We inserted one hundred million items with numerically-sorted keys, using fairly small values of 100 bytes per item.

The database created by this insert test was used as the starting point for all subsequent tests. Each subsequent test was run in steady-state for one hour on each of the two databases. Longer runs will be important for us to gauge stability, but an hour per test seemed like a useful starting point.

For the second comparison, we did a read-only scenario with a pareto distribution. This means that a minority of the items in the database would see the vast majority of requests, which means that there will be relatively high churn but also a higher percentage of cache hits in a typical system.
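The skew of such a workload can be illustrated with Python's standard library. This sketch is not basho_bench's key generator: the helper `pareto_key`, the shape and scale values, and the key range are assumptions chosen only to show a minority of keys attracting most of the requests.

```python
import random


def pareto_key(rng, max_key, shape=1.0, scale=1000):
    """Draw a key in [0, max_key) with a heavy skew toward low-numbered keys."""
    draw = rng.paretovariate(shape) - 1.0      # paretovariate returns values >= 1
    return min(int(draw * scale), max_key - 1)


rng = random.Random(42)                        # fixed seed for repeatable runs
max_key = 100_000
keys = [pareto_key(rng, max_key) for _ in range(10_000)]

# Share of requests that land on the lowest-numbered 10% of keys.
hot_share = sum(k < max_key // 10 for k in keys) / len(keys)
```

With these parameters a large majority of the simulated requests fall on the lowest-numbered tenth of the keyspace, which is the kind of access pattern that rewards caching.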

The third comparison used exactly the same pareto pattern for key distribution, but instead of being pure reads it was a 90/10 read/write ratio.

The fourth comparison was intended to see how the two systems compared to each other in a very-high-churn setting. It used the same dataset, but write-only, and in an extremely narrow pareto distribution such that nearly all writes would be within a narrow set of keys, causing a relatively small number of items to be overwritten many times.

In each of these tests, LevelDB showed a higher throughput than InnoDB and a similar or lower latency than InnoDB. Our goal in this initial round was to explore basic feasibility, and that has now been established.

This exercise has not been an attempt to provide comprehensive general-purpose benchmarks for either of these two storage engines. A number of choices made do not represent any particular generic usage pattern but were instead made to quickly put the systems under stress and to minimize the number of variables being considered. There are certainly many scenarios where either of these two storage systems can be made to perform differently (sometimes much better) than they did here. In some earlier tests, we saw InnoDB provide a narrower variance of latency (such as lower values in the 99th percentile) but we have not seen that reproduced in this set of tests. Among the other things not done in this quick set of tests: using the storage engines through Riak, deeply examining their I/O behavior, observing their stability over very long periods of time, comparing their response to different concurrency patterns, or comparing them to a wider range of embedded storage engines. All of these directions (and more) are good ideas for continued work in the future, and we will certainly do some of that.

Despite everything we haven’t yet done, this early work has validated one early hope and hypothesis. It appears that LevelDB may become a preferred choice for Riak users whose data set has massive numbers of keys and therefore is a poor match with Bitcask’s model. Performance aside, it compares favorably to InnoDB on other issues such as permissive license and operational usability. We are now going ahead with the work and continued testing needed to keep exploring this hypothesis and to improve both Riak and LevelDB in order to make their combined use an option for our customers and open source community.

Some issues still remain that we are working to resolve before LevelDB can be a first-class storage engine under Riak. One such issue that we are working on (with the LevelDB maintainers at Google) is making the LevelDB code portable to all of the same platforms that Riak is supported on. We are confident that these issues will be resolved in the very near future. Accordingly, we are moving ahead with a backend enabling Riak’s use of LevelDB.

Justin

Hello, Bitcask

April 27, 2010

because you needed another local key/value store

One aspect of Riak that has helped development to move so quickly is pluggable per-node storage. By allowing nearly anything k/v-shaped to be used for actual persistence, progress on storage engines can occur in parallel with progress on the higher-level parts of the system.

Many such local key/value stores already exist, such as Berkeley DB, Tokyo Cabinet, and Innostore.

There are many goals we sought when evaluating which storage engines to use in Riak, including:

  • low latency per item read or written
  • high throughput, especially when writing an incoming stream of random items
  • ability to handle datasets much larger than RAM w/o degradation
  • crash friendliness, both in terms of fast recovery and not losing data
  • ease of backup and restore
  • a relatively simple, understandable (and thus supportable) code structure and data format
  • predictable behavior under heavy access load or large volume
  • a license that allowed for easy default use in Riak

Achieving some of these is easy. Achieving them all is less so.

None of the local key/value storage systems available (including but not limited to those written by us) were ideal with regard to all of the above goals. We were discussing this issue with Eric Brewer when he had a key insight about hash table log merging: that doing so could potentially be made as fast as or faster than LSM-trees.

This led us to explore some of the techniques used in the log-structured file systems first developed in the 1980s and 1990s in a new light. That exploration led to the development of bitcask, a storage system that meets all of the above goals very well. While bitcask was originally developed with a goal of being used under Riak, it was also built to be generic and can serve as a local key/value store for other applications as well.

If you would like to read a bit about how it works, we’ve produced a short note describing bitcask’s design that should give you a taste. Very soon you should be able to expect a Riak backend for bitcask, some improvements around startup speed, information on tuning the timing of merge and fsync operations, detailed performance analysis, and more.

In the meantime, please feel free to give it a try!

- Justin and Dizzy