October 2, 2013
What Is Riak CS?
In May of this year, we posted the top 5 questions we heard from customers and our community about Riak CS; today we’ll take a deeper dive into the technical details, specifically the differences between Riak CS and Riak itself.
Riak CS as Compared to Riak
Both Riak CS and Riak are, at their core, places to store objects. Both are open source and both are designed to be used in a cluster of servers for availability and scalability.
The fundamental distinction between the two is simple: Riak CS can be used for storing very large objects, into the terabyte size range, while Riak is optimized for fast storage and retrieval of small objects (typically no more than a few megabytes).
There are subtle differences, however, that can be obscured by the similarities between the two.
Why Would I Use Riak CS?
Riak CS is used for a variety of reasons. Some examples:
- Private object storage services, for example for companies that want to store sensitive data behind their own firewalls.
- Large binary object storage as part of a voice or video service.
- An integrated component in an OpenStack cloud solution, storing and serving VM images on demand.
Tier 3, Yahoo! Japan, Datapipe, and Turner Broadcasting are just a few of the big names using Riak CS today.
What Does Riak CS Do That Riak Doesn’t?
Riak CS carves large objects into small chunks of data to be distributed throughout a Riak cluster and, when used with Riak CS Enterprise, synchronized with remote data centers.
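As a rough illustration of the chunking idea, the sketch below splits a byte string into fixed-size blocks and puts them back together. The 1 MiB block size and the function names are assumptions made for the example; they are not a description of Riak CS internals.

```python
from typing import Iterator, List

BLOCK_SIZE = 1024 * 1024  # assumed block size, for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def reassemble(blocks: List[bytes]) -> bytes:
    """Concatenate blocks, in order, back into the original object."""
    return b"".join(blocks)


if __name__ == "__main__":
    payload = bytes(range(256)) * 10_000  # roughly 2.4 MiB of sample data
    blocks = list(split_into_blocks(payload))
    print(len(blocks), "blocks; round trip ok:", reassemble(blocks) == payload)
```

Each block is small enough to be stored as an ordinary Riak object, which is what lets a very large upload be spread across the cluster.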
Riak CS adds compatibility with Amazon’s S3 and OpenStack’s Swift APIs. These offer very different semantics than Riak, and the advanced search capabilities in Riak such as Secondary Indexes and full text search are not available using S3 or Swift clients.
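To give a feel for what S3 compatibility means for a client, here is a minimal sketch of how a request could be signed. It assumes the original S3 "signature version 2" scheme (an HMAC-SHA1 over a canonical request string) and a request with no x-amz-* headers. The keys, bucket, and object names are made up for the example.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_s3_v2(secret_key: str, method: str, resource: str, date: str,
               content_md5: str = "", content_type: str = "") -> str:
    """Return the base64 HMAC-SHA1 signature for a request without x-amz-* headers."""
    # Canonical string: verb, Content-MD5, Content-Type, Date, then the resource path.
    string_to_sign = "\n".join([method, content_md5, content_type, date]) + "\n" + resource
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, signature: str) -> str:
    """Format the Authorization header value used by signature version 2."""
    return f"AWS {access_key}:{signature}"


if __name__ == "__main__":
    date = formatdate(usegmt=True)  # HTTP-style date, e.g. "Wed, 02 Oct 2013 ... GMT"
    sig = sign_s3_v2("EXAMPLE-SECRET", "GET", "/example-bucket/example-object", date)
    print({"Date": date, "Authorization": authorization_header("EXAMPLE-KEY", sig)})
```

In practice an existing S3 client library does this for you; the point is only that requests carry per-user credentials, which plain Riak requests do not.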
We strongly advise against it, but it is possible to work with Riak’s standard APIs “under the hood” when deploying a Riak CS solution.
Work is actively underway to add a security model to Riak in the upcoming 2.0 release.
Buckets or Buckets?
Users of Riak CS store their objects in virtual containers (called buckets in Amazon S3 parlance, containers in OpenStack).
Riak also relies heavily on buckets for data storage and configuration but, despite the names, these buckets are not the same.
As an example of how this can cause confusion: the replication factor in Riak (the number of times a piece of data is stored in a cluster) is configurable per-bucket. Because Riak’s buckets do not underlie the user buckets in Riak CS, this feature cannot be used to create tiered services.
Riak is designed to maximize availability; the price paid for that is delayed consistency when the network is split and clients are writing to both sides of the cluster.
Creating user accounts in Riak CS, however, led to the need for a mechanism to maintain strong consistency. If two people attempt to create user accounts with the same username on either side of a network partition, both cannot be allowed to succeed, or else a conflict will occur that is very difficult to recover from automatically.
Furthermore, user buckets in S3 (and OpenStack APIs as implemented in Riak CS) reside in a global rather than a user-specific namespace, so bucket creation must also be handled carefully.
Riak CS introduced a service named Stanchion that is designed to handle these specific requests to avoid conflicts. Stanchion is a single process running on a single Riak server (thus introducing a single point of failure for user account and bucket creation requests).
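The reason a single process helps can be shown with a toy model: if every creation request passes through one owner of the namespace, the "is this name taken?" check and the insert happen as one step. The class below is a hypothetical stand-in written for this post, not Stanchion's actual implementation.

```python
import threading
from typing import Dict


class NameRegistry:
    """Toy stand-in for a single serializing process that owns a global namespace."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owners: Dict[str, str] = {}

    def create(self, name: str, requester: str) -> bool:
        """Claim name for requester; return False if it is already taken."""
        with self._lock:  # one request at a time, so check-then-insert is atomic
            if name in self._owners:
                return False
            self._owners[name] = requester
            return True


if __name__ == "__main__":
    registry = NameRegistry()
    results = []

    def attempt(who: str) -> None:
        results.append((who, registry.create("photos", who)))

    threads = [threading.Thread(target=attempt, args=(who,)) for who in ("alice", "bob")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)  # exactly one of the two requests succeeds
```

If two separate registries were running, each would happily accept the same name, which is the failure mode a network partition with two Stanchion copies would produce.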
While it is possible to deploy Stanchion using common system tools to make a daemon process run in a highly available manner, Basho recommends doing so carefully and testing it thoroughly. Since the only impact of failure is to prevent user and bucket creation, it may be preferable to monitor and alert on failure. If two copies of Stanchion are running due to a network partition, its strong consistency guarantees will be lost.
With strong consistency options targeted for Riak 2.0, expect to see some changes.
Basho offers multi-datacenter replication with its Enterprise software licenses, and Riak CS Enterprise takes full advantage of that feature. Data can be written to one or more clusters in multiple data centers and be synchronized automatically between them.
There are two types of synchronization: real-time, which occurs as objects are written, and full sync, which happens on a periodic basis to compare the full contents of each cluster for any changes to be merged.
One key difference is that Riak CS maintains manifest files to track the chunks it creates, and it is these manifests that are distributed between clusters during real-time sync. The individual chunks are not synchronized until a full sync replication occurs, or until someone requests the file from a remote cluster. The manifest is made active for someone to retrieve the chunks after the original upload to the source cluster is complete.
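The behavior described above can be modeled in a few lines. This is a deliberately simplified toy: the Cluster class, the key naming, and the choice to keep a chunk locally after an on-demand fetch are all assumptions for illustration, not the replication protocol itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy cluster holding manifests (object name -> chunk keys) and chunk data."""
    manifests: Dict[str, List[str]] = field(default_factory=dict)
    chunks: Dict[str, bytes] = field(default_factory=dict)


def upload(cluster: Cluster, name: str, data: bytes, chunk_size: int) -> None:
    """Store the chunks first, then publish the manifest once the upload is complete."""
    keys = []
    for i in range(0, len(data), chunk_size):
        key = f"{name}:{i // chunk_size}"
        cluster.chunks[key] = data[i:i + chunk_size]
        keys.append(key)
    cluster.manifests[name] = keys


def realtime_sync(source: Cluster, sink: Cluster, name: str) -> None:
    """Ship only the manifest, as described for real-time sync."""
    sink.manifests[name] = list(source.manifests[name])


def full_sync(source: Cluster, sink: Cluster) -> None:
    """Copy every manifest and chunk that the sink is missing."""
    for name, keys in source.manifests.items():
        sink.manifests.setdefault(name, list(keys))
    for key, blob in source.chunks.items():
        sink.chunks.setdefault(key, blob)


def read(local: Cluster, remote: Cluster, name: str) -> bytes:
    """Read an object, fetching chunks that are not yet local from the remote cluster."""
    parts = []
    for key in local.manifests[name]:
        if key not in local.chunks:
            local.chunks[key] = remote.chunks[key]  # on-demand fetch
        parts.append(local.chunks[key])
    return b"".join(parts)
```

After upload and realtime_sync, the sink knows the object exists but holds none of its chunks; a read on the sink or a later full_sync fills them in.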
A common mistake while installing Riak CS is to configure it using information specific to Riak rather than Riak CS. As an example, per the Riak CS installation instructions, the relevant backend data store must be configured to use riak_cs_kv_multi_backend, which is forked from Riak’s riak_kv_multi_backend. Using the latter will cause problems.
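A small preflight check can catch this mistake before a node is started. The sketch below assumes the setting appears in the configuration file as an Erlang-style term of the form {backend, some_name}; the function name and return values are invented for the example.

```python
import re

EXPECTED_BACKEND = "riak_cs_kv_multi_backend"

# Assumes an Erlang-term style setting such as `{backend, some_name}`.
BACKEND_RE = re.compile(r"\{\s*backend\s*,\s*([a-z_0-9]+)\s*\}")


def check_backend(config_text: str) -> str:
    """Return 'ok', 'wrong-backend', or 'missing' for the given config text."""
    match = BACKEND_RE.search(config_text)
    if match is None:
        return "missing"
    return "ok" if match.group(1) == EXPECTED_BACKEND else "wrong-backend"


if __name__ == "__main__":
    sample = "{riak_kv, [{backend, riak_kv_multi_backend}]}"  # the common mistake
    print(check_backend(sample))
```

Because the two names differ only by the "cs_" segment, an exact comparison is safer than a substring search.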
Riak (CS) Control
Exposure to Internet
Exposing any database directly to the Internet is risky. Riak, currently lacking any concept of authentication, absolutely must not be accessible to untrusted networks.
Riak CS, however, is designed with Internet access in mind. It is still advisable to place a load balancer or proxy in front of a Riak CS cluster, for example to ease cluster maintenance/upgrades and to provide a central location to log and block potentially hostile access.
Riak CS servers will still have open Riak ports that must be protected from the Internet as you would any Riak servers.
Where to Next for Riak CS?
2013 has been a big year for Riak CS: it was released as open source in the spring, with OpenStack support added this summer. Still, there is much to do.
As mentioned above, improving or replacing Stanchion is a high priority.
We will continue to expand the API coverage for Riak CS. The next major targets are the copy object operations that Amazon S3 and OpenStack Swift offer.
Compression and more granular replication controls are also under consideration for future releases.
By building Riak CS atop the most robust open source distributed database in the world, we’ve created a very operationally friendly, powerful storage solution that can evolve to meet present and future needs. Feel free to give it a try if you aren’t already using it.
If you’re interested in hearing from the engineers who’ve made this software possible (and seeing just how far a highly available data storage solution can take you), join us October 29-30th for RICON West. RICON West is where Basho brings together industry and academia to discuss the rapidly expanding world of distributed systems, including Riak and Riak CS.
June 5, 2013
To help make Riak even more accessible, we have partnered with a number of different hosting providers, consulting services, system integrators, and OEMs to help you better use Riak. You can check out all of our partners at the Partnerships Page, which highlights how we are collaborating. Below is a look at some of our wonderful partners.
For those of you in need of a hosting provider for Riak, Basho is partnered with a number of great companies to help get you deployed quickly. Partners include: Amazon Web Services, Windows Azure, Joyent, SoftLayer (which was recently acquired by IBM and also offers hosting for Riak Enterprise), and Engine Yard – which is offering 500 free hours when you sign up.
Companies using Riak and Riak CS to power applications or other product offerings include Datapipe and IDC Frontier, a Yahoo! Japan subsidiary. ePlus and Trifork act as resellers of Riak to help expand our global reach.
Finally, Gazzang helps to ensure that your Riak environment is secure and meets all regulations related to sensitive information.
April 11, 2013
On May 13-14, RICON East will take place in New York City – with tickets still available here. RICON is Basho’s series of distributed systems conferences for developers. We first launched RICON last October at the sold-out San Francisco show. This year, we have three conferences scheduled across the globe, with the first in New York.
RICON East will bring together developers, engineers, architects, and scientists to discuss Riak, as well as key emerging research areas and approaches to solving the challenges faced by the industry today.
Earlier this week, the confirmed speaker line-up was released and can be found here. Here’s a look at some of the speakers:
- Dr. Margo L. Seltzer, Professor at Harvard University
- Rich Hickey, Creator of Clojure, Datomic
- Camille Fournier, VP of Architecture at Rent the Runway
- Hilary Mason, Chief Scientist at bitly
- Theo Schlossnagle, Founder and CEO at OmniTI
- Ed Laczynski, VP of Cloud Strategy and Architecture at Datapipe
- Brian Akins, Chief Operations Engineer at Turner Broadcasting System
- Sathish Gaddipati, VP of Enterprise Data at The Weather Channel
- Michajlo Matijkiw, Senior Software Engineer at Comcast
Many Basho engineers will also be speaking throughout the conference, including: Andy Gross, Sean Cribbs, Matthew Von-Maszewski, Ryan Zezeski, and Chris Tilt.
If you still haven’t purchased your tickets, some are still available here! Also check out some of last year’s amazing talks, or reach out to Mark Phillips if you’re interested in group ticket discounts or sponsorship opportunities.
See you in New York!
March 25, 2013
All of us here at Basho would like to congratulate CloudStack on their graduation from incubation status to a Top-Level Apache project. This signifies that the Project’s community and products have been well governed under the Apache Software Foundation’s meritocratic process and principles. CloudStack joined the Apache Incubator back in April 2012 and has experienced large successes since – including Datapipe’s decision to build their public cloud on CloudStack.
If you’re not familiar with CloudStack, it is used to deliver Infrastructure-as-a-Service (IaaS) cloud computing in private, public, and hybrid cloud environments. It has proven to be both stable and highly scalable, underpinning production clouds of more than 30,000 physical nodes in geo-distributed environments.
Basho has been partnered with CloudStack since September of last year, working together to build a combined platform for compute and storage resources. For more details about how Basho and CloudStack are working together, check out the full announcement.
The Apache Software Foundation is a non-profit that provides organizational, legal, and financial support for nearly 150 open source projects and initiatives. The pragmatic Apache License makes it easy for all users, commercial and individual, to deploy Apache products. Visit their site to learn more.
Congratulations again, CloudStack. We’re excited to see what comes next!
New York City, NY. – March 20, 2013 – Today at GigaOM Structure Data 2013 in New York City, Basho, the worldwide leader in distributed database and cloud storage software, announced that Riak CS (Cloud Storage) is now available open source, significantly expanding the ease-of-access to Basho’s software for developers, enterprise architects, and IT operations professionals seeking to build public or private storage clouds. Also today, Basho announced the general availability of Riak CS v1.3, the third release of Basho’s simple, available cloud storage software.
Riak CS is a multi-tenant, distributed, S3-compatible cloud storage platform that enables enterprises and service providers to launch public or private cloud services. Built on top of Riak, the world’s most advanced, open source, distributed database, Riak CS provides horizontal scale, extreme durability and low operational overhead in a distributed object storage system. Riak CS Enterprise adds Basho’s multi-datacenter replication technology and is backed by Basho’s 24×7 support and enterprise-class service-level commitments.
Riak CS Enterprise is used by great organizations worldwide including Datapipe, Deutsche Vermögensberatung (DVAG), IDC Frontier, Rovio, and Yahoo! JAPAN.
New Features in 1.3
- Multipart Upload. Riak CS v1.3 includes a new multipart upload capability that lets users store very large files by uploading parts in parallel.
- Enhanced Control for Multi-Tenant Environments. Riak CS v1.3 introduces object access control by source IP, enabling operators to restrict access to Riak CS buckets by IP address.
- Support for GET Range Requests. Riak CS users can now retrieve a range of bytes from a single object. This functionality is implemented in the “Range” request header of GET operations.
- Graphical Tool for Riak CS. Riak CS Control is a standalone web administration tool for user management.
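For the GET range support listed above, the client side amounts to one extra HTTP header. A minimal sketch, with helper names invented for the example; note that HTTP byte ranges are zero-based and inclusive at both ends.

```python
from typing import Dict


def range_header(first: int, last: int) -> Dict[str, str]:
    """Build a Range header for bytes first..last (zero-based, both inclusive)."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    return {"Range": f"bytes={first}-{last}"}


def apply_range(body: bytes, first: int, last: int) -> bytes:
    """Return the slice of body that such a range request selects."""
    return body[first:last + 1]


if __name__ == "__main__":
    print(range_header(0, 99))                  # asks for the first 100 bytes
    print(apply_range(b"abcdefghij", 2, 5))     # the bytes at positions 2 through 5
```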
Basho offers a hosted “sandbox” to test interfacing with a live implementation of Riak CS. The “sandbox” is available at https://www.riakcs.net/users/sign_in.
For more information on Riak CS Enterprise, and to request a Developers Trial License, click https://basho.com/riak-cloud-storage/.
Upcoming Riak CS Webinar and RICON EAST
Basho will host an “Introduction to Riak CS Webinar” on Tuesday, April 2. To participate in the webinar, sign up here.
Basho is hosting RICON EAST on May 13 – 14, 2013 in New York City, NY. RICON is Basho’s distributed systems conference by and for engineers, developers, scientists and architects. For ticket information on RICON East, visit http://ricon.io/east.html.
Greg Collins, president and CEO, Basho
“It has been almost one year since we first released Riak CS. In just 12 months, we have seen rapid adoption by global cloud operators, telecommunication providers and large enterprises. Over the past year, Riak CS has gained new advanced capabilities and has been battle-tested in many of our customers’ and partners’ labs. Our customers have deployed Riak CS as the object storage engine inside popular cloud computing platforms, including Apache CloudStack and OpenStack. Today, by open sourcing Riak CS, we are making it easier for users to experiment with and test Riak CS, to provide rapid product feedback, and to contribute to its future capabilities.”
Ash Yamanaka, general manager, IDC Frontier and
Shingo Saito, cloud product manager, Yahoo! JAPAN
“Basho, Yahoo! JAPAN and IDC Frontier, a member of Yahoo! JAPAN group, have a very strong and growing partnership. Today, Yahoo! JAPAN and IDC Frontier leverage Riak CS Enterprise to offer an S3-compatible public cloud storage service, as well as dedicated hosting options for our customers’ various applications. Yahoo! JAPAN and IDC Frontier are highly supportive of open source software and we view Basho’s announcement today as a positive move that will work to accelerate its ability to innovate and ultimately strengthen our cloud platform.”
Sameer Dholakia, group vice president and GM, Citrix Platforms Group, Citrix
“Basho clearly understands the market power of open source. Since Citrix and Basho started collaborating last year, we have seen strong enthusiasm among Citrix CloudPlatform users for Basho’s cloud object storage solution. Now, CloudStack users have easy access to Riak CS and can quickly deploy an object storage solution that features multi-tenancy and S3 compatibility. We believe that many Citrix CloudPlatform customers will also seek Riak CS Enterprise for its distributed data capabilities across multiple data centers.”
Ed Laczynski, vice president, Cloud Strategy and Architecture, Datapipe
“Datapipe is very supportive of Basho’s decision to open source portions of Riak CS. During the last six months, we have deployed Riak CS Enterprise in Datapipe’s 10gig Stratosphere cloud computing platform. Riak CS provides Datapipe and its customers with highly available, low-latency and S3-compatible storage. Datapipe’s customers will benefit as Basho’s community increasingly experiments, tests and contributes to Riak CS, ultimately speeding our access to more capabilities and higher performance.”
Simon Robinson, vice president of storage research, 451 Research
“The cloud storage market continues to accelerate as companies seek to build public and private storage clouds that mirror Amazon Web Services’ capabilities and economics. Basho, with Riak CS, already has a proven track record of successful customer public and private cloud deployments. Now, Basho is demonstrating it has confidence that the technical and business benefits of Riak CS can be accelerated even faster via the open source model.”
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is always available, fault-tolerant and easy-to-operate at scale. Basho’s distributed NoSQL database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by over 25% of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com.
Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.
March 20, 2013
Riak CS (Cloud Storage) is simple, available cloud storage software built on Riak. Basho announced today that Riak CS is now open source under the Apache 2 license. Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also today, we announced that Riak CS Enterprise is now available as commercially licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.
We will be hosting an introductory webcast to Riak CS on Tuesday, April 2. Sign up here.
Riak CS can be used to build private or public clouds or as reliable, available storage behind applications and platforms. Riak CS Enterprise is currently used by large corporations including Datapipe, Deutsche Vermögensberatung (DVAG), IDC Frontier, Rovio, and Yahoo! JAPAN.
Basho is a distributed systems company dedicated to making software that is available, fault-tolerant, and easy to operate at scale. Twenty-five percent of the Fortune 50 and thousands of open source users large and small run our flagship open source database, Riak. Riak CS takes distributed systems principles derived from production Riak users and applies them to the problem of large scale storage. We are excited to share this code with the world.
Riak CS features:
- Highly available, fault-tolerant storage
- Large object support
- S3-compatible API and authentication
- Multi-tenancy and per-user reporting
- Simple operational model for adding capacity
- Robust stats for monitoring and metrics
For users requiring multi-datacenter replication and enterprise-level support, Riak CS Enterprise (a commercial extension of Riak CS) is available.
Today we are also announcing several new features, available now as part of the open source edition.
- Multipart upload. Upload very large files to Riak CS as a series of parts. Parts can be between 5MB and 5GB.
- Support for GET range queries. Retrieve a range of bytes from a single object. This functionality is implemented in the “Range” request header of GET operations.
- Per-bucket policies to restrict access to buckets based on source IP.
- Riak CS Control. Riak CS Control is a standalone web administration tool for user management available on Github.
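To make the multipart limits above concrete, the sketch below plans how an object of a given size would be cut into parts. It interprets the 5MB and 5GB bounds as binary units and lets the final part be shorter than the minimum, as Amazon S3 does; both choices, and the function name, are assumptions for the example.

```python
from typing import List, Tuple

MIN_PART = 5 * 1024 ** 2  # 5 MB lower bound (taken here as MiB)
MAX_PART = 5 * 1024 ** 3  # 5 GB upper bound (taken here as GiB)


def plan_parts(object_size: int, part_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover the whole object."""
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size must be between 5 MB and 5 GB")
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)  # last part may be shorter
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    for part in plan_parts(object_size=12 * 1024 ** 2, part_size=5 * 1024 ** 2):
        print(part)
```

Since each part is an independent byte range, a client is free to upload the parts in parallel.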
“Basho, Yahoo! JAPAN, and IDC Frontier, a member of Yahoo! JAPAN group, have a very strong and growing partnership. Today, Yahoo! JAPAN and IDC Frontier leverage Riak CS Enterprise to offer an S3-compatible public cloud storage service, as well as dedicated hosting options for our customers’ various applications. Yahoo! JAPAN and IDC Frontier are highly supportive of open source software and we view Basho’s announcement today as a positive move that will work to accelerate its ability to innovate and ultimately strengthen our cloud platform.”
– Ash Yamanaka, general manager, IDC Frontier and
– Shingo Saito, cloud product manager, Yahoo! JAPAN
“Basho clearly understands the market power of open source. Since Citrix and Basho started collaborating last year, we have seen strong enthusiasm among Citrix CloudPlatform users for Basho’s cloud object storage solution. It has also provided the Apache CloudStack community with easy access to Riak CS for multi-tenancy and S3 compatibility. With today’s announcement, Citrix CloudPlatform customers will continue to benefit from Riak CS Enterprise for its distributed data capabilities across multiple data centers.”
– Sameer Dholakia, group vice president and GM, Citrix Platforms Group, Citrix
“Over the last six months, we have deployed Riak CS Enterprise within Datapipe’s 10gig Stratosphere cloud computing platform. Riak CS provides our customers with highly available, low-latency, S3-compatible cloud object storage. Datapipe is very supportive of Basho’s decision to open source portions of Riak CS. As Basho’s open source community grows, experiments, tests and contributes to Riak CS, Datapipe clients will benefit from access to additional capabilities and higher performance.”
– Ed Laczynski, vice president, Cloud Strategy and Architecture, Datapipe
Please join us for an introductory technical webcast to Riak CS on April 2. You can also read a technical overview on our website and find full documentation here.
In the coming weeks and months, we look forward to helping new users get started with Riak CS and be successful running it in production. We’ll be expanding integration and partnerships with open source cloud computing platforms in order to provide integrated storage and compute to the marketplace. As always, we’ll be listening to feedback, engaging with the community, and accepting pull requests.
February 11, 2013
We are excited to announce that Datapipe’s Stratosphere, a globally available, high-performance managed cloud computing platform, now leverages Riak Cloud Storage (CS). Riak Cloud Storage provides Datapipe and its customers with highly available, low-latency and S3-compatible storage.
Datapipe offers a single provider solution for managing and securing mission critical IT services, including cloud computing, infrastructure as a service, platform as a service, managed hosting, and colocation.
Stratosphere is Datapipe’s globally available managed cloud computing platform. With the launch of Riak CS to support cloud object storage, Datapipe customers can now access cloud object storage from any solution hosted with Datapipe and adjacent to existing solutions in any Datapipe data center. Stratosphere is designed for enterprise high I/O production environments and can also be used for development, testing and QA environments. Use cases include large-scale marketing campaigns, brand sites and analytics; applications with variable peak demand times and other dynamic workloads; and cloud disaster recovery and geographic redundancy.
Datapipe delivers services from the world’s most influential technical and financial markets including New York metro, Silicon Valley, London, Hong Kong and Shanghai.
Why Riak Cloud Storage at Datapipe?
Datapipe selected Riak Cloud Storage for its low-latency, highly available object storage, operational ease-of-use, and multi-site replication capabilities. After extensively testing solutions from a variety of vendors in the space, Datapipe selected Riak Cloud Storage for a few core reasons:
- Built on years of developing Riak, Riak CS is designed to provide simple, available, distributed cloud storage at any scale.
- Riak CS is compatible with major cloud object storage clients and applications with its S3-based API.
- Riak CS meets the high performance requirements of the Stratosphere cloud-computing platform.
“Riak CS provides the high-performance, distributed datastore we need to deliver a sound foundation for our cloud storage needs now and for many years into the future,” said Ed Laczynski, VP Cloud Strategy, Datapipe.
Be on the lookout for upcoming documentation about using Riak CS-backed functionality on Stratosphere at Datapipe. Riak CS is now available with Datapipe in a limited beta, with an upcoming full release.
For a developer trial of Riak CS, sign up here.