October 2, 2013
What Is Riak CS?
In May of this year, we posted the top 5 questions we heard from customers and our community about Riak CS; today we’ll take a deeper dive into the technical details, specifically the differences between Riak CS and Riak itself.
Riak CS as Compared to Riak
Both Riak CS and Riak are, at their core, places to store objects. Both are open source and both are designed to be used in a cluster of servers for availability and scalability.
The fundamental distinction between the two is simple: Riak CS can be used for storing very large objects, into the terabyte size range, while Riak is optimized for fast storage and retrieval of small objects (typically no more than a few megabytes).
There are subtle differences, however, that can be obscured by the similarities between the two.
Why Would I Use Riak CS?
Riak CS is used for a variety of reasons. Some examples:
- Private object storage services, for example for companies that want to store sensitive data behind their own firewalls.
- Large binary object storage as part of a voice or video service.
- An integrated component in an OpenStack cloud solution, storing and serving VM images on demand.
Tier 3, Yahoo! Japan, Datapipe, and Turner Broadcasting are just a few of the big names using Riak CS today.
What Does Riak CS Do That Riak Doesn’t?
Riak CS carves large objects into small chunks of data to be distributed throughout a Riak cluster and, when used with Riak CS Enterprise, synchronized with remote data centers.
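To make the chunking idea concrete, here is a minimal sketch in Python. It is not Riak CS code: the function names, the key format, and the 1 MiB chunk size are all assumptions chosen for illustration. It only shows the general technique of splitting a large object into small pieces that can be stored independently and reassembled in order.

```python
CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB), for illustration only


def split_into_chunks(object_name, data, chunk_size=CHUNK_SIZE):
    """Split `data` into fixed-size chunks keyed by object name and index.

    Returns (ordered list of chunk keys, dict mapping key -> chunk bytes).
    The key format "<name>:<index>" is hypothetical.
    """
    keys = []
    store = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        key = f"{object_name}:{index}"
        store[key] = data[offset:offset + chunk_size]
        keys.append(key)
    return keys, store


def reassemble(keys, store):
    """Rebuild the original object by concatenating chunks in key order."""
    return b"".join(store[key] for key in keys)
```

Because each chunk is a small, independent value, the underlying store only ever handles the small objects it is optimized for, no matter how large the original upload was.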
Riak CS adds compatibility with Amazon’s S3 and OpenStack’s Swift APIs. These offer very different semantics than Riak, and the advanced search capabilities in Riak such as Secondary Indexes and full text search are not available using S3 or Swift clients.
We strongly advise against it, but it is possible to work with Riak’s standard APIs “under the hood” when deploying a Riak CS solution.
Work is actively underway to add a security model to Riak in the upcoming 2.0 release.
Buckets or Buckets?
Users of Riak CS store their objects in virtual containers (called buckets in Amazon S3 parlance, containers in OpenStack).
Riak also relies heavily on buckets for data storage and configuration but, despite the names, these buckets are not the same.
As an example of how this can cause confusion: the replication factor in Riak (the number of times a piece of data is stored in a cluster) is configurable per-bucket. Because Riak’s buckets do not underlie the user buckets in Riak CS, this feature cannot be used to create tiered services.
Riak is designed to maximize availability; the price paid for that is delayed consistency when the network is split and clients are writing to both sides of the cluster.
Creating user accounts in Riak CS, however, led to the need for a mechanism to maintain strong consistency. If two people attempt to create user accounts with the same username on either side of a network partition, both cannot be allowed to succeed, or else a conflict will occur that is very difficult to automatically recover from.
Furthermore, user buckets in S3 (and OpenStack APIs as implemented in Riak CS) reside in a global rather than a user-specific namespace, so bucket creation must also be handled carefully.
Riak CS introduced a service named Stanchion that is designed to handle these specific requests to avoid conflicts. Stanchion is a single process running on a single Riak server (thus introducing a single point of failure for user account and bucket creation requests).
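The value of funneling these requests through one process can be illustrated with a small Python sketch. This is not how Stanchion is implemented. The `NameRegistry` class and its methods are hypothetical, and the sketch only shows the general idea that a single serialization point lets exactly one of several competing create requests succeed.

```python
import threading


class NameRegistry:
    """Toy single-process registry that serializes create requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = set()
        self._bucket_owner = {}  # bucket name -> owner (one global namespace)

    def create_user(self, username):
        """Return True if the username was free and is now taken."""
        with self._lock:
            if username in self._users:
                return False
            self._users.add(username)
            return True

    def create_bucket(self, username, bucket):
        """Return True if the user exists and the bucket name was unused."""
        with self._lock:
            if username not in self._users or bucket in self._bucket_owner:
                return False
            self._bucket_owner[bucket] = username
            return True
```

If several threads call `create_user("alice")` at once, only one call returns `True`. Likewise, once one user owns the bucket name `photos`, any other user's attempt to create `photos` is rejected, which mirrors the global bucket namespace described above. The guarantee holds only while there is a single registry, which is the same reason two running copies of such a service would defeat its purpose.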
While it is possible to deploy Stanchion using common system tools to make a daemon process run in a highly available manner, Basho recommends doing so carefully and testing it thoroughly. Since the only impact of failure is to prevent user and bucket creation, it may be preferable to monitor and alert on failure. If two copies of Stanchion are running due to a network partition, its strong consistency guarantees will be lost.
With strong consistency options targeted for Riak 2.0, expect to see some changes.
Basho offers multi-datacenter replication with its Enterprise software licenses, and Riak CS Enterprise takes full advantage of that feature. Data can be written to one or more clusters in multiple data centers and be synchronized automatically between them.
There are two types of synchronization: real-time, which occurs as objects are written, and full sync, which happens on a periodic basis to compare the full contents of each cluster for any changes to be merged.
One key difference is that Riak CS maintains manifest files to track the chunks it creates, and it is these manifests that are distributed between clusters during real-time sync. The individual chunks are not synchronized until a full sync replication occurs, or until someone requests the file from a remote cluster. The manifest is made active for someone to retrieve the chunks after the original upload to the source cluster is complete.
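The behavior described above can be modeled with a toy Python sketch. Everything here is an assumption made for illustration (the `Cluster` class, the function names, the tiny chunk size, and the one-directional full sync). It is not Riak CS replication code. It only shows the ordering: manifests travel in real time, while chunks arrive on full sync or on first read.

```python
class Cluster:
    """Toy cluster holding manifests and chunks in separate dicts."""

    def __init__(self, name):
        self.name = name
        self.manifests = {}  # object name -> ordered list of chunk keys
        self.chunks = {}     # chunk key -> bytes


def upload(source, name, data, chunk_size=4):
    """Write all chunks, then activate the manifest on the source cluster."""
    keys = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        key = f"{name}:{index}"
        source.chunks[key] = data[offset:offset + chunk_size]
        keys.append(key)
    source.manifests[name] = keys  # active only once every chunk is stored
    return keys


def realtime_sync(source, sink, name):
    """Real-time sync ships only the manifest, not the chunks."""
    sink.manifests[name] = list(source.manifests[name])


def full_sync(source, sink):
    """Simplified full sync: copy whatever the sink is missing."""
    for name, keys in source.manifests.items():
        sink.manifests.setdefault(name, list(keys))
    for key, chunk in source.chunks.items():
        sink.chunks.setdefault(key, chunk)


def read(sink, source, name):
    """Read from the sink, fetching missing chunks from the source on demand."""
    parts = []
    for key in sink.manifests[name]:
        if key not in sink.chunks:
            sink.chunks[key] = source.chunks[key]
        parts.append(sink.chunks[key])
    return b"".join(parts)
```

In this model, after `upload` and `realtime_sync` the remote cluster knows the object exists but holds none of its chunks. They appear there only after a `read` or a `full_sync`.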
A common mistake while installing Riak CS is to configure it using information specific to Riak rather than Riak CS. As an example, per the Riak CS installation instructions the relevant backend data store must be configured to riak_cs_kv_multi_backend, which is forked from Riak’s riak_kv_multi_backend. Using the latter will cause problems.
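Because the two backend names differ by only a few characters, a quick automated check can catch the mix-up. The Python sketch below is a hypothetical helper, not a Basho tool. It assumes the backend is set with an Erlang-style `{storage_backend, ...}` term in the Riak configuration file, so confirm the setting name against the installation instructions for your version. It does not handle commented-out lines.

```python
import re

EXPECTED_BACKEND = "riak_cs_kv_multi_backend"
RIAK_ONLY_BACKEND = "riak_kv_multi_backend"

# Matches an Erlang-style term such as {storage_backend, some_backend_name}.
# The setting name `storage_backend` is an assumption; check your docs.
_STORAGE_BACKEND = re.compile(r"\{\s*storage_backend\s*,\s*([a-z_0-9]+)\s*\}")


def check_storage_backend(config_text):
    """Return a short status string describing the configured backend."""
    match = _STORAGE_BACKEND.search(config_text)
    if match is None:
        return "missing: no storage_backend setting found"
    backend = match.group(1)
    if backend == EXPECTED_BACKEND:
        return "ok"
    if backend == RIAK_ONLY_BACKEND:
        return f"wrong: {RIAK_ONLY_BACKEND} is the Riak backend, not the Riak CS fork"
    return f"unexpected: {backend}"
```

Running a check like this against the configuration on each node before starting Riak CS is one way to avoid the mistake described above.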
Riak (CS) Control
Exposure to Internet
Exposing any database directly to the Internet is risky. Riak, currently lacking any concept of authentication, absolutely must not be accessible to untrusted networks.
Riak CS, however, is designed with Internet access in mind. It is still advisable to place a load balancer or proxy in front of a Riak CS cluster, for example to ease cluster maintenance/upgrades and to provide a central location to log and block potentially hostile access.
Riak CS servers will still have open Riak ports that must be protected from the Internet as you would any Riak servers.
Where to Next for Riak CS?
2013 has been a big year for Riak CS: it was released as open source in the spring, with OpenStack support added this summer. Still, there is much to do.
As mentioned above, improving or replacing Stanchion is a high priority.
We will continue to expand the API coverage for Riak CS. The next major targets are the copy object operations that Amazon S3 and OpenStack Swift offer.
Compression and more granular replication controls are also under consideration for future releases.
By building Riak CS atop the most robust open source distributed database in the world, we’ve created a very operationally friendly, powerful storage solution that can evolve to meet present and future needs. Feel free to give it a try if you aren’t already using it.
If you’re interested in hearing from the engineers who’ve made this software possible (and seeing just how far a highly available data storage solution can take you), join us October 29-30th for RICON West. RICON West is where Basho brings together industry and academia to discuss the rapidly expanding world of distributed systems, including Riak and Riak CS.
June 19, 2013
Today, Tier 3 announced the availability of their global cloud object storage product, powered by Riak CS. You can find the entirety of the release in our News Section entitled “Tier 3 Launches Global Cloud Object Storage.”
In particular, we are keenly interested in the unique geographic footprint that Tier 3 maintains. In conversations with customers, press, and analysts, we frequently hear people discussing “geo-data locality.” This phrase typically is used to express a desire to address regulatory compliance or to improve the end-customer experience through low latency (in the case of mobile applications).
With the Tier 3 release, their geographic footprint — in addition to maximizing availability — leverages the inherent replication present in Riak CS to pre-determine the physical locations of specific data.
For geo-data locality, requests can be load balanced across geographies, with geo-based client requests directed to the appropriate datacenter. For example, US-based requests can be served out of a Tier 3 US-based datacenter, while EU-based requests can be served out of a Tier 3 European datacenter. For situations where not all data needs to be shared across all datacenters (or if certain data, such as user data, must only be stored in a specific geographic region to provide low-latency response and address privacy regulations), Riak CS Enterprise’s multi-datacenter replication can be configured on a per-bucket basis so only shared assets, popular assets, etc. are replicated.
Tier 3 Launches Global Cloud Object Storage
New Service, Powered by Riak CS Enterprise, Delivers High Availability and “Geo-Data Locality” via Automatic and Instant Replication
BELLEVUE, WA - June 19, 2013 - Tier 3, a provider of public cloud infrastructure and cloud management tools, today announced the general availability of a new, distributed object storage service with automated data center redundancy. Powered by Basho’s Riak CS Enterprise, this service offers enterprises flexible, scalable cloud storage for files of any type and any size. Files – or “objects” – stored in Tier 3’s cloud are automatically replicated to a secondary in-country data center. This unique feature provides native “high availability” that improves performance and reliability and ensures that critical data will never vanish into the ether.
By contrast, most existing cloud-based object storage systems require additional engineering work to enable this automated capability, a burden that many IT professionals and developers would rather not carry.
“Object storage in the cloud has proven to be a best practice for real-time data backups and archiving. With the launch of this service, Tier 3 has added a layer of enterprise-grade capabilities on object storage, with automatic redundancy to a second geographic location,” said Jared Wray, founder and CTO of Tier 3. “Partnering with Basho has allowed us to provide enterprises with a simple, scalable storage platform for their applications and data. Furthermore, this service will support data sovereignty scenarios for sensitive data.”
Tier 3 offers the complete enterprise cloud – a growing collection of integrated services that includes virtual servers running in the public cloud, cloud management functions like automation and orchestration, as well as Web Fabric, a platform as a service offering based on Cloud Foundry. The company operates nine data centers worldwide.
Object storage from Tier 3 is available immediately in its two Canadian data centers, with additional locations in the U.S. scheduled for next quarter. Rollout will extend to Tier 3 data centers in the U.K. and mainland Europe thereafter. Users simply pay monthly for the amount of storage allocated on Tier 3’s federated cloud data centers, and may scale storage up or down as desired.
The automatic replication of data to a separate data center also delivers on a growing requirement for enterprise cloud deployments – geo-data locality. This feature allows enterprises to pre-determine the physical locations of specific data. This can be used not only to address regulatory compliance, but also to improve the end-customer experience through ultra low latency.
To help deliver this new service, Tier 3 chose Riak CS Enterprise from Basho Technologies. This advanced cloud storage software runs on top of Riak, a sophisticated open-source distributed database that provides extreme high availability, fault tolerance, and operational simplicity.
“Businesses are increasingly asking for options to meet their rapidly growing and diverse object storage requirements,” said Justin Sheehy, CTO of Basho. “Tier 3’s offering is powerful as it emphasizes enterprise requirements for security and resiliency while also providing the flexibility users expect from a public cloud. Basho is excited to see Riak CS’ unique multi-data center replication perform as a core component underlying Tier 3’s new global cloud object storage offering.”
Customer Demand Fuels Development of Object Storage
As cloud adoption continues apace, so too does the demand for cloud storage. In a recent research note, Gartner predicts worldwide Cloud System Infrastructure Services (IaaS) Storage for end-user spending to grow at a 31% Compound Annual Growth Rate (CAGR), from an estimated $1.7B in 2013 to a forecast $4.9B in 2017(1).
Object storage is fueling much of this growth, since it is ideal for many enterprise scenarios, including hosting of multimedia files, data back-ups, archives, transfer of large files, and much more.
Highlights of the new object storage service from Tier 3 include:
- Storage for Large Objects. Users may store any type of file in the Tier 3 cloud – images, videos, documents, database backups, archives, and more. The service supports direct uploads for files up to 5 GB, while larger files may be stored using multipart file uploads.
- Storage for Public Files, Plus Permissions to Keep Objects Private. Users may host public files with object storage – for example, images used in web applications or downloadable multimedia content. For additional security, files stored in Tier 3 are flagged as private by default.
- Geographic Redundancy & High Availability. Object storage from Tier 3 offers multi-datacenter replication natively, with high availability already built-in.
- S3 Compatibility. Users familiar with S3 may re-use existing code assets with the Tier 3 object storage service. The company’s API supports service, bucket, and object-level operations. In addition, object storage from Tier 3 is compatible with many S3 file management utilities.
- Enterprise-Grade Security, with Key-Pair Permissions. System administrators may store collections of files in buckets that are protected with unique key-pairs. Admins can simply point-and-click to assign permissions to other users as needed.
- Scalability. Object storage from Tier 3 can easily scale to any size desired.
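The 5 GB direct-upload limit mentioned in the highlights above implies a simple client-side decision. The Python sketch below shows one way to make it. The function name, the 100 MB part size, and the reading of "GB" as 1024³ bytes are assumptions for illustration. Only the 5 GB threshold comes from the announcement.

```python
GB = 1024 ** 3
MB = 1024 ** 2

DIRECT_UPLOAD_LIMIT = 5 * GB   # from the announcement: direct uploads up to 5 GB
ASSUMED_PART_SIZE = 100 * MB   # hypothetical part size for multipart uploads


def plan_upload(size_bytes, part_size=ASSUMED_PART_SIZE):
    """Return (strategy, number_of_requests) for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    if size_bytes <= DIRECT_UPLOAD_LIMIT:
        return ("direct", 1)
    # Integer ceiling division avoids float rounding on very large sizes.
    parts = -(-size_bytes // part_size)
    return ("multipart", parts)
```

For example, a 1 GB file is planned as a single direct upload, while a 6 GB file is split into 62 parts of up to 100 MB each.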
Additional information on Tier 3’s object storage may be found at: http://www.tier3.com/products/object-storage.
(1) Gartner, Forecast: Public Cloud Services, Worldwide, 2011-2017, 1Q13 Update. Anderson, E., Bell, W. and others. March 26, 2013.
About Tier 3
Tier 3 is a complete cloud management platform for mid-tier to large enterprises, as well as SaaS providers. To bring even more value to customers, Tier 3 has combined elements of the traditional enterprise cloud market with those of cloud management platforms. Tier 3’s suite of cloud products and services includes advanced management and orchestration, enabling our customers to run workloads ranging from simple development and test environments to the most complex and demanding enterprise applications. The Company is based in Bellevue, WA, with regional presence in multiple locations in North America and Europe. www.tier3.com.
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy to operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast-growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.