October 2, 2013
What Is Riak CS?
In May of this year, we posted the top 5 questions we heard from customers and our community about Riak CS; today we’ll take a deeper dive into the technical details, specifically the differences between Riak CS and Riak itself.
Riak CS as Compared to Riak
Both Riak CS and Riak are, at their core, places to store objects. Both are open source and both are designed to be used in a cluster of servers for availability and scalability.
The fundamental distinction between the two is simple: Riak CS can be used for storing very large objects, into the terabyte size range, while Riak is optimized for fast storage and retrieval of small objects (typically no more than a few megabytes).
There are subtle differences, however, that can be obscured by the similarities between the two.
Why Would I Use Riak CS?
Riak CS is used for a variety of reasons. Some examples:
- Private object storage services, for example for companies that want to store sensitive data behind their own firewalls.
- Large binary object storage as part of a voice or video service.
- An integrated component in an OpenStack cloud solution, storing and serving VM images on demand.
Tier 3, Yahoo! Japan, Datapipe, and Turner Broadcasting are just a few of the big names using Riak CS today.
What Does Riak CS Do That Riak Doesn’t?
Riak CS carves large objects into small chunks of data to be distributed throughout a Riak cluster and, when used with Riak CS Enterprise, synchronized with remote data centers.
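To make the chunking idea concrete, here is a minimal Python sketch of splitting a large payload into fixed-size blocks and recording them in a manifest. It is an illustration only, not Riak CS's actual implementation: the block size, the key layout, the manifest fields, and the in-memory dict standing in for the Riak cluster are all assumptions made for this example.

```python
import hashlib
import uuid
from typing import Dict, List

BLOCK_SIZE = 1024 * 1024  # assumed block size (1 MiB), chosen for illustration


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a payload into fixed-size blocks; the final block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_object(kv: Dict, bucket: str, key: str, data: bytes,
                 block_size: int = BLOCK_SIZE) -> dict:
    """Write each block under its own key, then write a manifest describing them.

    `kv` is a plain dict standing in for the underlying key/value store.
    """
    upload_id = uuid.uuid4().hex  # hypothetical per-upload identifier
    blocks = split_into_blocks(data, block_size)
    for seq, block in enumerate(blocks):
        kv[(upload_id, seq)] = block
    manifest = {
        "bucket": bucket,
        "key": key,
        "upload_id": upload_id,
        "block_count": len(blocks),
        "content_length": len(data),
        "md5": hashlib.md5(data).hexdigest(),
    }
    # The manifest is written last, once every block is in place.
    kv[("manifest", bucket, key)] = manifest
    return manifest


def fetch_object(kv: Dict, bucket: str, key: str) -> bytes:
    """Read the manifest, then reassemble the blocks in order."""
    manifest = kv[("manifest", bucket, key)]
    return b"".join(
        kv[(manifest["upload_id"], seq)] for seq in range(manifest["block_count"])
    )
```

Because each block is an ordinary small value, the underlying store only ever handles objects of the size it is optimized for, no matter how large the original upload was.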
Riak CS adds compatibility with Amazon’s S3 and OpenStack’s Swift APIs. These offer very different semantics than Riak, and the advanced search capabilities in Riak such as Secondary Indexes and full text search are not available using S3 or Swift clients.
We strongly advise against it, but it is possible to work with Riak’s standard APIs “under the hood” when deploying a Riak CS solution.
Work is actively underway to add a security model to Riak in the upcoming 2.0 release.
Buckets or Buckets?
Users of Riak CS store their objects in virtual containers (called buckets in Amazon S3 parlance, containers in OpenStack).
Riak also relies heavily on buckets for data storage and configuration but, despite the names, these buckets are not the same.
As an example of how this can cause confusion: the replication factor in Riak (the number of times a piece of data is stored in a cluster) is configurable per-bucket. Because Riak’s buckets do not underlie the user buckets in Riak CS, this feature cannot be used to create tiered services.
Riak is designed to maximize availability; the price paid for that is delayed consistency when the network is split and clients are writing to both sides of the cluster.
Creating user accounts in Riak CS, however, led to the need for a mechanism to maintain strong consistency. If two people attempt to create user accounts with the same username on either side of a network partition, both cannot be allowed to succeed, or else a conflict will occur that is very difficult to automatically recover from.
Furthermore, user buckets in S3 (and OpenStack APIs as implemented in Riak CS) reside in a global rather than a user-specific namespace, so bucket creation must also be handled carefully.
Riak CS introduced a service named Stanchion that is designed to handle these specific requests to avoid conflicts. Stanchion is a single process running on a single Riak server (thus introducing a single point of failure for user account and bucket creation requests).
While it is possible to deploy Stanchion using common system tools to make a daemon process run in a highly available manner, Basho recommends doing so carefully and testing it thoroughly. Since the only impact of failure is to prevent user and bucket creation, it may be preferable to monitor and alert on failure. If two copies of Stanchion are running due to a network partition, its strong consistency guarantees will be lost.
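The reasoning here can be sketched in a few lines of Python. This is a toy model, not Stanchion's code: the class and method names are hypothetical. It shows a single process serializing creation requests against one global namespace, and why running two independent copies forfeits the uniqueness guarantee.

```python
import threading


class NameAlreadyTaken(Exception):
    """Raised when a user name or bucket name has already been claimed."""


class CreationSerializer:
    """Hypothetical stand-in for one process that serializes creation requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = set()
        self._buckets = {}  # bucket name -> owner; one namespace shared by all users

    def create_user(self, username):
        # Only one request at a time may check and claim a name.
        with self._lock:
            if username in self._users:
                raise NameAlreadyTaken(username)
            self._users.add(username)

    def create_bucket(self, owner, bucket):
        with self._lock:
            if owner not in self._users:
                raise KeyError(owner)
            if bucket in self._buckets:
                raise NameAlreadyTaken(bucket)
            self._buckets[bucket] = owner


def try_create_user(serializer, username):
    """Return True if the name was claimed, False if it was already taken."""
    try:
        serializer.create_user(username)
        return True
    except NameAlreadyTaken:
        return False
```

With one serializer, the second `try_create_user(s, "alice")` call returns False. With two serializers, one on each side of a partition, both calls return True, and the two sides now hold conflicting accounts with the same name.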
With strong consistency options targeted for Riak 2.0, expect to see some changes.
Basho offers multi-datacenter replication with its Enterprise software licenses, and Riak CS Enterprise takes full advantage of that feature. Data can be written to one or more clusters in multiple data centers and be synchronized automatically between them.
There are two types of synchronization: real-time, which occurs as objects are written, and full sync, which happens on a periodic basis to compare the full contents of each cluster for any changes to be merged.
One key difference is that Riak CS maintains manifest files to track the chunks it creates, and it is these manifests that are distributed between clusters during real-time sync. The individual chunks are not synchronized until a full sync replication occurs, or until someone requests the file from a remote cluster. The manifest is made active for someone to retrieve the chunks after the original upload to the source cluster is complete.
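The behavior described above can be modeled with a small Python simulation. It is a deliberately simplified sketch with hypothetical names, not the replication protocol itself: a write sends only the manifest to the peer cluster in real time, while the blocks arrive either when someone reads the object from the peer or when a full sync runs.

```python
class Cluster:
    """Toy model of one cluster holding manifests and blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.manifests = {}  # object key -> number of blocks
        self.blocks = {}     # (object key, block index) -> bytes
        self.peer = None     # the remote cluster, set after construction

    def put(self, key, blocks):
        """Store all blocks locally, then activate and replicate the manifest."""
        for index, block in enumerate(blocks):
            self.blocks[(key, index)] = block
        # The manifest becomes active only once the upload is complete.
        self.manifests[key] = len(blocks)
        if self.peer is not None:
            # Real-time sync: the manifest travels, the blocks do not.
            self.peer.manifests[key] = len(blocks)

    def get(self, key):
        """Return the object, pulling any missing blocks from the peer on demand."""
        parts = []
        for index in range(self.manifests[key]):
            if (key, index) not in self.blocks:
                self.blocks[(key, index)] = self.peer.blocks[(key, index)]
            parts.append(self.blocks[(key, index)])
        return b"".join(parts)

    def full_sync(self):
        """Copy every manifest and block the peer is missing."""
        for key, count in self.manifests.items():
            self.peer.manifests.setdefault(key, count)
        for block_key, block in self.blocks.items():
            self.peer.blocks.setdefault(block_key, block)
```

After `east.put("movie", [...])`, the peer `west` knows the object exists but holds none of its data; `west.get("movie")` or `east.full_sync()` closes that gap.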
A common mistake while installing Riak CS is to configure it using information specific to Riak rather than Riak CS. As an example, per the Riak CS installation instructions, the relevant backend data store must be set to riak_cs_kv_multi_backend, which is forked from Riak’s riak_kv_multi_backend. Using the latter will cause problems.
Riak (CS) Control
Exposure to Internet
Exposing any database directly to the Internet is risky. Riak, currently lacking any concept of authentication, absolutely must not be accessible to untrusted networks.
Riak CS, however, is designed with Internet access in mind. It is still advisable to place a load balancer or proxy in front of a Riak CS cluster, for example to ease cluster maintenance/upgrades and to provide a central location to log and block potentially hostile access.
Riak CS servers will still have open Riak ports that must be protected from the Internet as you would any Riak servers.
Where to Next for Riak CS?
2013 has been a big year for Riak CS: it was released as open source in the spring, with OpenStack support added this summer. Still, there is much to do.
As mentioned above, improving or replacing Stanchion is a high priority.
We will continue to expand the API coverage for Riak CS. The next major targets are the copy object operations that Amazon S3 and OpenStack Swift offer.
Compression and more granular replication controls are also under consideration for future releases.
By building Riak CS atop the most robust open source distributed database in the world, we’ve created a very operationally friendly, powerful storage solution that can evolve to meet present and future needs. Feel free to give it a try if you aren’t already using it.
If you’re interested in hearing from the engineers who’ve made this software possible (and seeing just how far a highly available data storage solution can take you), join us October 29-30th for RICON West. RICON West is where Basho brings together industry and academia to discuss the rapidly expanding world of distributed systems, including Riak and Riak CS.
September 4, 2013
September will be a busy month for all of us at Basho. Not only is our own RICON developer conference coming up at the end of October, but we will be traveling all over the world to attend various conferences and host meetups. Here is a look at where you can find us this month.
Open Source Conference 2013 Hokkaido: Basho Engineer, Kaz Suzuki, will provide an “Introduction to Riak and Riak CS” talk during the Open Source Conference in Hokkaido, Japan on September 14th. He will also be demoing Riak CS at our booth.
Linux Cloud Open: Basho is a proud sponsor of the 2013 Linux Cloud Open conference. If you are in New Orleans from September 16-18th, stop by our booth to learn more about our open source cloud storage software, Riak CS.
Strangeloop 2013: Basho is a sponsor of Strangeloop 2013, taking place September 18-20th in St. Louis. Garrett Eardley, Software Engineer at Riot Games, will also be speaking about how they use Riak at Riot Games.
Hosting & Cloud Transformation Summit 2013: Basho is sponsoring and speaking at the Hosting & Cloud Transformation Summit, hosted by 451 Research. Basho EVP and CMO, Bobby Patrick, will be speaking on a panel entitled “Profiting from Cloud Storage in an Era of Software-Defined Everything” on September 25th. Check out this panel or visit our booth from September 23-25th in Las Vegas.
For a full list of where we’ll be for the rest of the year, check out the Events Page.
August 27, 2013
If you still haven’t gotten your ticket to RICON West, make sure to grab one before the early bird sale ends on August 29th. RICON West is Basho’s distributed systems conference and will take place in San Francisco on October 29-30th.
RICON West will feature speakers that are using and researching distributed systems to solve a wide range of problems. Some highlights include:
- Jeff Dean, Google Fellow at Google
- Pat Helland, Architect at Salesforce.com
- Jeff Hodges, Distributed Systems Engineer at Twitter
- Michael Bernstein, Software Developer at Paperless Post
- Susan Potter, Lead Software Engineer at Finsignia
- Ryland Degnan and Jason Brown, both Senior Software Engineers at Netflix
- Derek Murray, Researcher at Microsoft Research
- Raja Selvaraj, Data Systems Engineering Manager, and Arvinda Gillella, SUN Architect, at The Weather Company
We will also be hosting a Riak training on October 28th, right before the conference. During this training, you’ll learn about the core principles behind Riak and how it manages to scale both performance and capacity while evenly distributing data throughout the cluster. At the end of the day, you’ll be able to create and deploy your own cluster, as well as be familiar with query patterns, data modeling, and running Riak in production.
July 15, 2013
Today, we are sending out our quarterly Riak Community Survey. This survey is to help us better understand how you’re using Riak. By understanding how Riak is being used, we can make more educated decisions about how to improve Riak in the future. We will also anonymize this data and share it with the community to provide a more holistic view of how Riak is being used.
To participate in this survey, simply click here to get started. All survey participants will receive Basho swag and a discount code for RICON West tickets. One lucky participant will be selected to receive a free RICON West ticket.
Thanks for participating in our survey and be sure to grab a RICON West ticket. Early bird prices end August 29th.
June 20, 2013
At RICON East, Basho’s developer conference, we had dozens of speakers discussing distributed systems in production. These speakers included academics and developers from all different industries.
Brian Akins, Senior Principal Architect at Turner Broadcasting, spoke on “Large Scale Data Service as a Service,” and discussed the challenges Turner faced serving data to millions of clients over HTTP for several large sites (including CNN.com and NBA.com).
Brian’s talk goes into the general architecture at Turner, the growing pains they went through, and why they ultimately decided on Riak. He also goes into details about how Riak is being used to power large events at Turner, such as the presidential election, CNN Breaking News videos, and March Madness. You can watch his talk below.
His slides are also available below:
April 30, 2013
Today we are pleased to announce the availability of Riak and Riak Enterprise on SoftLayer’s global cloud platform. Users can now easily configure and deploy Riak environments on the SoftLayer platform with a flexible, pay-as-you-go service model. The solution makes it easy for organizations to quickly deploy scalable production-grade Riak systems on-demand. The partnership accelerates the speed of developing and launching applications with Riak, provides ease of operations with scale, and enables global multi-datacenter replication.
Features of the joint offering include:
- Web-based tool to configure and deploy Riak on demand and at the click of a button
- Pay-as-you-go model providing the flexibility of monthly or annual billing and no long-term contracts
- Rapid deployment on dedicated, bare-metal servers for optimum performance
With the Riak Enterprise offering on SoftLayer, users can replicate data stored in Riak across SoftLayer’s global infrastructure. This multi-datacenter replication capability provides data locality, disaster recovery, global load balancing, and active backups. SoftLayer’s global private network allows for high-speed, secure replication between clusters.
The integrated solution provides the availability, fault tolerance, operational simplicity, and scalability of Riak combined with the flexibility, performance, and agility of SoftLayer’s on-demand infrastructure.
Bump is one of the most popular mobile apps on the market today, and is already using Riak on the SoftLayer platform. “Operational ease is key to our business success,” says Mark Smith, Operations Lead at Bump. “The combination of SoftLayer, who we already trust with our business and data, and Basho, who makes the database that we trust at scale, saves us time and effort and allows us to focus on our business, not our data infrastructure.”
SoftLayer is also sponsoring the RICON East 2013 after party. On night one of the conference, we’re renting out Hudson Terrace for a one-of-a-kind party. SoftLayer and Basho are furnishing drinks, food, and entertainment. All RICON attendees are automatically registered for the party…but, as of today, the party is open to anyone who wants to register.
April 24, 2013
Will Gage of Shopzilla presented last week on their production Riak usage at the Santa Monica Java Users’ Group. Gage, a member of the Consumer Site Engineering team, shared details on how they built various user-facing services on Riak, why it was the right tool for the job, and when you might want to use it in production. Will’s talk starts at the 49 minute mark in the video embedded below, and it’s well worth your time. In addition to offering details on data modeling for their specific use cases, he also talks about service latencies for their production applications and how the Riak community played an important role in their decision.
Mark Phillips, Basho’s Director of Technical Evangelism, also presented. His talk starts at approximately the 1:20:00 point and is entitled Riak and the Power of Distributed Systems. An excellent complement to Will’s talk, this covers Riak’s architecture at a high level, how to access it as a developer, and then ends with a few use case discussions.
If you’re interested in more talks on Riak in production and the future of Riak, make sure to grab a ticket for RICON East, happening May 13-14 in New York City. This will be two days of talks, parties, and hacking dedicated to Riak, developers, and the future of distributed systems in production.
April 11, 2013
On May 13-14, RICON East will take place in New York City, with tickets still available here. RICON is Basho’s series of distributed systems conferences for developers. We first launched RICON last October at the sold-out San Francisco show. This year, we have three conferences scheduled across the globe, with the first in New York.
RICON East will bring together developers, engineers, architects, and scientists to discuss Riak, as well as key emerging research areas and approaches to solving the challenges faced by the industry today.
Earlier this week, the confirmed speaker line-up was released and can be found here. Here’s a look at some of the speakers:
- Dr. Margo L. Seltzer, Professor at Harvard University
- Rich Hickey, Creator of Clojure, Datomic
- Camille Fournier, VP of Architecture at Rent the Runway
- Hilary Mason, Chief Scientist at bitly
- Theo Schlossnagle, Founder and CEO at OmniTI
- Ed Laczynski, VP of Cloud Strategy and Architecture at Datapipe
- Brian Akins, Chief Operations Engineer at Turner Broadcasting System
- Sathish Gaddipati, VP of Enterprise Data at The Weather Channel
- Michajlo Matijkiw, Senior Software Engineer at Comcast
Many Basho engineers will also be speaking throughout the conference, including: Andy Gross, Sean Cribbs, Matthew Von-Maszewski, Ryan Zezeski, and Chris Tilt.
If you haven’t purchased your tickets yet, some are still available here! Also check out some of last year’s amazing talks or reach out to Mark Phillips if you’re interested in group ticket discounts or sponsorship opportunities.
See you in New York!
Over 30 speakers from bitly, Comcast, The Weather Channel, Turner Broadcasting System, Harvard University, and more to discuss the future of distributed systems.
New York City, NY – April 8, 2013 – Basho, the worldwide leader in distributed database and cloud storage software, announced today the initial speaker line up for RICON East. RICON is Basho’s global conference series that is dedicated to distributed systems and is designed by and for engineers, developers, data scientists, and architects. RICON East is being held May 13-14 in New York City, NY. Basho expects to assemble hundreds of the industry’s most influential thinkers and practitioners devoted to deploying distributed systems technologies, including NoSQL solutions and Cloud Storage.
Dr. Margo L. Seltzer, Harvard University
Rich Hickey, Creator of Clojure, Datomic
Camille Fournier, Rent the Runway
Alex Payne, Breather
Hilary Mason, bitly
Theo Schlossnagle, OmniTI
Robert Treat, OmniTI
Neha Narula, Massachusetts Institute of Technology (MIT)
Neil Conway, UC Berkeley
Kyle Kingsbury, Factual
Ed Laczynski, Datapipe
Brian Akins, Turner Broadcasting System
Sathish Gaddipati, The Weather Channel
Michajlo Matijkiw, Comcast
Mark Wunsch, Gilt Groupe
Basho engineers will be featured prominently throughout RICON East. Basho speakers include: Andy Gross, Sean Cribbs, Matthew Von-Maszewski, Ryan Zezeski, Chris Tilt.
RICON East builds on Basho’s highly successful, sold-out RICON 2012 event held Fall 2012 in San Francisco. Presentations from RICON 2012 are available to view at www.ricon2012.com.
Tickets are available online at http://ricon.io/east.html. Student discount prices are available online. For other discounts, including discounts for large groups, contact Mark Phillips at firstname.lastname@example.org.
Initial sponsors of RICON East include Fastly, Meraki, Engine Yard, Github and NoSQLWeekly. For more information on sponsorship opportunities, contact Tom Santero at email@example.com.
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed NoSQL database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by over 25% of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.
Last year, Basho held a widely-acclaimed conference, RICON2012, where leading technologists gave insightful talks and shared ideas about Basho’s distributed database Riak and, more broadly, the distributed systems space.
The conference will once again host developers, engineers, architects, and scientists talking about Riak as well as key emerging research areas and approaches to solving the challenges faced by the industry today. Learn how some of the smartest people in the world are solving some of the hardest problems in the world.
Early bird ticket sales have begun and talk proposals are welcomed at firstname.lastname@example.org. Please note that the deadline for CFPs is March 15th.
Watch the official RICON blog for speaker announcements.
To get a better idea of what RICON is all about, recorded talks from RICON2012 can be found on the RICON website or Vimeo. Expect to be inspired and receive a fashionable hoodie — with your Twitter/GitHub handle along the side.