February 19, 2014
Basho is excited to announce that Yahoo! JAPAN has launched its new cloud storage service platform for the enterprise market in Japan, powered by Riak CS. Yahoo! JAPAN is one of the most comprehensive web portals and the most popular search engine in Japan, surpassing Google.
Riak CS is Basho’s cloud storage software that combines Amazon-class economics with the ability to customize and extend. It is built on Riak, Basho’s distributed database that is highly efficient at storing and retrieving objects, even under extreme usage or failure scenarios. For extensibility, Riak CS adds compatibility with the Amazon S3 and OpenStack Swift APIs.
Yahoo! JAPAN is pushing hard to be number one in transaction value across the Japanese e-commerce marketplace. LOHACO, a popular internet shopping site operated by ASKUL Corporation, has been using Basho’s Riak CS solution via Yahoo! JAPAN as backend storage for almost a year. Riak CS has proven to be a very stable system for LOHACO’s needs.
“Enterprises require storage services that are cost-effective, reliable, and scalable. Yahoo! JAPAN has been able to create a cloud storage solution to meet these needs, as well as deliver access to our large networks and server infrastructure, by developing on Riak and Riak CS,” said Shingo Saito, Cloud Product Manager at Yahoo! JAPAN. “Our partnership with Basho allows us to continue to build features that are beneficial to major enterprises and constantly improve our Cloud Storage Service by integrating with Riak, Riak CS, and future products from Basho.”
Basho is proud of Yahoo! JAPAN’s cloud service. We are excited to be a part of this great milestone.
December 30, 2013
2013 was a huge year for Basho Technologies and before we dive into 2014, we thought we’d take a moment to reflect on how far we’ve come.
2013 was the year of the Riak User. We love hearing about all the amazing ways companies across various industries are using Riak. This year, we were able to share dozens of exciting case studies. These include:
- Synacor’s TV Everywhere platform
- Enstratius (acquired by Dell)
- Best Buy
- Alert Logic
- Viggle (through OmniTI)
- Turner Broadcasting
- Hosted Graphite
- Gilt Groupe
- Praekelt Foundation
- National Health Service
- City Maps
- The Weather Company
For even more Riak Users, check out the Users Page.
We released Riak 1.3, Riak 1.4, and the Technical Preview of Riak 2.0 this year. These releases added such features as Active Anti-Entropy, revamped Riak Control, queryability improvements, Riak Data Types, and much more. Be on the lookout for the general release of Riak 2.0 early next year.
This year, we expanded RICON, Basho’s distributed systems conference, to both RICON East and RICON West. Both were sold-out conferences that featured speakers from bitly, Comcast, Google, Netflix, Salesforce, The Weather Company, Turner Broadcasting, Twitter, and many more.
We drastically increased the number of Basho partners in 2013. For a full list of partners, check out the Partnerships Page. Some key ones to note include Tokyo Electron Device, SoftLayer, and Seagate.
Our amazing community team hosted over 200 meetups around the world this year. On top of that, they also attended dozens of industry events to spread the word about Basho. Keep an eye on the Events Page to see where we’ll be in 2014.
2013 was a busy year but, with some exciting announcements coming, we look forward to an even busier 2014. Happy New Year!
October 2, 2013
What Is Riak CS?
In May of this year, we posted the top 5 questions we heard from customers and our community about Riak CS; today we’ll take a deeper dive into the technical details, specifically the differences between Riak CS and Riak itself.
Riak CS as Compared to Riak
Both Riak CS and Riak are, at their core, places to store objects. Both are open source and both are designed to be used in a cluster of servers for availability and scalability.
The fundamental distinction between the two is simple: Riak CS can be used for storing very large objects, into the terabyte size range, while Riak is optimized for fast storage and retrieval of small objects (typically no more than a few megabytes).
There are subtle differences, however, that can be obscured by the similarities between the two.
Why Would I Use Riak CS?
Riak CS is used for a variety of reasons. Some examples:
- Private object storage services, for example for companies that want to store sensitive data behind their own firewalls.
- Large binary object storage as part of a voice or video service.
- An integrated component in an OpenStack cloud solution, storing and serving VM images on demand.
Tier 3, Yahoo! JAPAN, Datapipe, and Turner Broadcasting are just a few of the big names using Riak CS today.
What Does Riak CS Do That Riak Doesn’t?
Riak CS carves large objects into small chunks of data to be distributed throughout a Riak cluster and, when used with Riak CS Enterprise, synchronized with remote data centers.
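The chunking idea can be illustrated with a minimal sketch. This is not how Riak CS is implemented internally; the chunk size, the key layout, and the function names are all hypothetical, and a plain dictionary stands in for the Riak cluster.

```python
import hashlib
import uuid

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size (1 MiB), chosen for illustration


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (index, chunk) pairs that cover the data in order."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size, data[offset:offset + chunk_size]


def store_object(store: dict, bucket: str, key: str, data: bytes,
                 chunk_size: int = CHUNK_SIZE) -> dict:
    """Write each chunk under its own key, then write a manifest listing them."""
    upload_id = uuid.uuid4().hex  # keeps chunks of separate uploads apart
    chunk_keys = []
    for index, chunk in split_into_chunks(data, chunk_size):
        chunk_key = f"{upload_id}:{index}"
        store[chunk_key] = chunk
        chunk_keys.append(chunk_key)
    manifest = {
        "bucket": bucket,
        "key": key,
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "chunks": chunk_keys,
    }
    store[("manifest", bucket, key)] = manifest
    return manifest


def fetch_object(store: dict, bucket: str, key: str) -> bytes:
    """Reassemble an object by reading its manifest, then its chunks in order."""
    manifest = store[("manifest", bucket, key)]
    return b"".join(store[chunk_key] for chunk_key in manifest["chunks"])
```

Because every chunk is a small, independent value, the underlying store only ever handles objects of the size it is optimized for, regardless of how large the uploaded file is.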
Riak CS adds compatibility with Amazon’s S3 and OpenStack’s Swift APIs. These offer very different semantics than Riak, and the advanced search capabilities in Riak such as Secondary Indexes and full text search are not available using S3 or Swift clients.
We strongly advise against it, but it is possible to work with Riak’s standard APIs “under the hood” when deploying a Riak CS solution.
Work is actively underway to add a security model to Riak in the upcoming 2.0 release.
Buckets or Buckets?
Users of Riak CS store their objects in virtual containers (called buckets in Amazon S3 parlance, containers in OpenStack).
Riak also relies heavily on buckets for data storage and configuration but, despite the names, these buckets are not the same.
As an example of how this can cause confusion: the replication factor in Riak (the number of times a piece of data is stored in a cluster) is configurable per-bucket. Because Riak’s buckets do not underlie the user buckets in Riak CS, this feature cannot be used to create tiered services.
Riak is designed to maximize availability; the price paid for that is delayed consistency when the network is split and clients are writing to both sides of the cluster.
Creating user accounts in Riak CS, however, led to the need for a mechanism to maintain strong consistency. If two people attempt to create user accounts with the same username on either side of a network partition, both cannot be allowed to succeed, or else a conflict will occur that is very difficult to automatically recover from.
Furthermore, user buckets in S3 (and OpenStack APIs as implemented in Riak CS) reside in a global rather than a user-specific namespace, so bucket creation must also be handled carefully.
Riak CS introduced a service named Stanchion that is designed to handle these specific requests to avoid conflicts. Stanchion is a single process running on a single Riak server (thus introducing a single point of failure for user account and bucket creation requests).
While it is possible to deploy Stanchion using common system tools to make a daemon process run in a highly available manner, Basho recommends doing so carefully and testing it thoroughly. Since the only impact of failure is to prevent user and bucket creation, it may be preferable to monitor and alert on failure. If two copies of Stanchion are running due to a network partition, its strong consistency guarantees will be lost.
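To make the serialization idea concrete, here is a minimal sketch of a single process that funnels all creation requests through one lock. It is an illustration only, not Stanchion's actual design or API; the class and method names are hypothetical.

```python
import threading


class CreationSerializer:
    """Hypothetical stand-in for a single process that serializes creation requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._users = set()
        self._buckets = {}  # bucket name -> owner; one global namespace for all users

    def create_user(self, username: str) -> bool:
        """Return True if the user was created, False if the name is already taken."""
        with self._lock:
            if username in self._users:
                return False
            self._users.add(username)
            return True

    def create_bucket(self, bucket: str, owner: str) -> bool:
        """Return True only if the owner exists and no user already holds the name."""
        with self._lock:
            if owner not in self._users or bucket in self._buckets:
                return False
            self._buckets[bucket] = owner
            return True
```

The guarantee holds only while exactly one such process is handling requests. Two instances on opposite sides of a partition would each accept the same name, which is the failure mode described above.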
With strong consistency options targeted for Riak 2.0, expect to see some changes.
Basho offers multi-datacenter replication with its Enterprise software licenses, and Riak CS Enterprise takes full advantage of that feature. Data can be written to one or more clusters in multiple data centers and be synchronized automatically between them.
There are two types of synchronization: real-time, which occurs as objects are written, and full sync, which happens on a periodic basis to compare the full contents of each cluster for any changes to be merged.
One key difference is that Riak CS maintains manifest files to track the chunks it creates, and it is these manifests that are distributed between clusters during real-time sync. The individual chunks are not synchronized until a full sync replication occurs, or until someone requests the file from a remote cluster. The manifest is made active for someone to retrieve the chunks after the original upload to the source cluster is complete.
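The difference between the two synchronization types can be sketched as follows. This is a simplified model under stated assumptions, not the Riak CS Enterprise replication protocol: the `Cluster` class and all function names are hypothetical, and in-memory dictionaries stand in for real clusters.

```python
class Cluster:
    """Toy cluster holding manifests and chunks in memory."""

    def __init__(self, name):
        self.name = name
        self.manifests = {}  # object key -> ordered list of chunk ids
        self.chunks = {}     # chunk id -> bytes


def upload(source, sinks, key, chunk_list):
    """Store chunks on the source, then push only the manifest to remote clusters."""
    ids = []
    for index, chunk in enumerate(chunk_list):
        chunk_id = f"{key}/{index}"
        source.chunks[chunk_id] = chunk
        ids.append(chunk_id)
    source.manifests[key] = ids          # manifest becomes active after the upload
    for sink in sinks:                   # real-time sync: manifest only
        sink.manifests[key] = list(ids)


def full_sync(source, sink):
    """Copy whatever the sink is missing; return the number of chunks copied."""
    copied = 0
    for key, ids in source.manifests.items():
        sink.manifests.setdefault(key, list(ids))
    for chunk_id, chunk in source.chunks.items():
        if chunk_id not in sink.chunks:
            sink.chunks[chunk_id] = chunk
            copied += 1
    return copied


def read(cluster, source, key):
    """Read an object, pulling any missing chunks from the source on demand."""
    parts = []
    for chunk_id in cluster.manifests[key]:
        if chunk_id not in cluster.chunks:
            cluster.chunks[chunk_id] = source.chunks[chunk_id]
        parts.append(cluster.chunks[chunk_id])
    return b"".join(parts)
```

In this model a remote cluster learns that an object exists almost immediately, while the bulk data follows later, either on the next full sync or on the first read.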
A common mistake while installing Riak CS is to configure it using information specific to Riak rather than Riak CS. As an example, per the Riak CS installation instructions the relevant backend data store must be configured to riak_cs_kv_multi_backend, which is forked from Riak’s riak_kv_multi_backend. Using the latter will cause problems.
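A quick sanity check for this mistake might look like the sketch below. It assumes the backend is set in an Erlang-terms style configuration file with an entry of the form `{storage_backend, module_name}`; that setting name and the helper function are assumptions for illustration, so check them against the installation instructions for your version.

```python
import re

EXPECTED_BACKEND = "riak_cs_kv_multi_backend"

# Assumed setting shape: {storage_backend, some_module}
_BACKEND_RE = re.compile(r"\{\s*storage_backend\s*,\s*([a-z_0-9]+)\s*\}")


def check_storage_backend(config_text: str) -> str:
    """Report whether the first storage_backend entry names the Riak CS backend."""
    match = _BACKEND_RE.search(config_text)
    if match is None:
        return "storage_backend not set"
    backend = match.group(1)
    if backend == EXPECTED_BACKEND:
        return "ok"
    return f"unexpected backend: {backend}"
```

The check is purely textual and does not skip commented-out lines, so treat a surprising result as a prompt to read the file rather than as a verdict.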
Riak (CS) Control
Exposure to Internet
Exposing any database directly to the Internet is risky. Riak, currently lacking any concept of authentication, absolutely must not be accessible to untrusted networks.
Riak CS, however, is designed with Internet access in mind. It is still advisable to place a load balancer or proxy in front of a Riak CS cluster, for example to ease cluster maintenance/upgrades and to provide a central location to log and block potentially hostile access.
Riak CS servers will still have open Riak ports that must be protected from the Internet as you would any Riak servers.
Where to Next for Riak CS?
2013 has been a big year for Riak CS: it was released as open source in the spring, with OpenStack support added this summer. Still, there is much to do.
As mentioned above, improving or replacing Stanchion is a high priority.
We will continue to expand the API coverage for Riak CS. The next major targets are the copy object operations that Amazon S3 and OpenStack Swift offer.
Compression and more granular replication controls are also under consideration for future releases.
By building Riak CS atop the most robust open source distributed database in the world, we’ve created a very operationally friendly, powerful storage solution that can evolve to meet present and future needs. Feel free to give it a try if you aren’t already using it.
If you’re interested in hearing from the engineers who’ve made this software possible (and seeing just how far a highly available data storage solution can take you), join us October 29-30th for RICON West. RICON West is where Basho brings together industry and academia to discuss the rapidly expanding world of distributed systems, including Riak and Riak CS.
September 30, 2013
While the biggest event of October is Basho’s distributed systems conference, RICON West, we will still be traveling the world to attend many other events this month. Here’s a look at where you can find us during the weeks leading up to RICON.
Monktoberfest: Basho’s Director of Marketing, Tyler Hannan, will be speaking at Monktoberfest on “Medieval Art, Collective Intelligence, and Language Abuse – The Ethos of Distributed Systems.” Monktoberfest will take place in Portland, ME from Oct. 3-4.
Erlang Factory Lite: Basho will have speakers at both the Chicago event (Oct. 4th) and the Berlin event (Oct. 16th). Check out talks from Chris Meiklejohn and Steve Vinoski to learn more about Riak, Erlang, and distributed systems.
CloudConnect Chicago: Basho is a sponsor and exhibitor of CloudConnect Chicago, taking place Oct. 21-23. Basho engineer, John Burwell, will also be speaking about building private clouds with Apache CloudStack and Riak CS.
O’Reilly Strata: Basho will be exhibiting and speaking at the upcoming O’Reilly Strata conference in New York from Oct. 28-30. Stop by our booth and find out why we will all be using distributed systems in the future.
September 26, 2013
Big Data. eCommerce. Mobile. Suddenly, information technology has shifted from cost center to business opportunity. This opportunity favors fast movers with the ability to rapidly execute on emerging trends. Therefore, the length of traditional IT procurement cycles and provisioning processes has become a significant barrier to capitalizing on these opportunities. To increase their operational agility, some organizations are employing public infrastructure as a service (IaaS) or cloud providers (such as Amazon Web Services and Joyent) to rapidly provision compute and storage resources. However, technical incompatibilities, regulatory restrictions, cost at scale, and/or existing capital investments prevent many organizations from utilizing public cloud providers to achieve this operational agility. Private clouds allow these organizations to realize the value of public clouds with the flexibility to comply with their unique combination of business and technical requirements.
Fundamentally, a cloud (public or private) creates a composable infrastructure with the following capabilities:
- Resource Pooling: Presents compute, storage, and network resources through a unified set of vendor neutral abstractions and manages them based on service-level requirements.
- Rapid Elasticity: Optimizes resource allocation based on performance relative to service-level requirements.
- Self Service: Delegates management responsibilities for a subset of the infrastructure resources to end-users.
- Metering/Charge Back: Records resource utilization on a per customer basis to support usage billing.
Private clouds implement these characteristics by orchestrating infrastructure provisioning and management through the following services:
- Compute: Physical or virtual machines with a specified number of processing cores and RAM.
- Block Storage: Random access, read/write persistent storage capable of supporting disk partitioning and file systems.
- Object Storage: Write-once, read-many (WORM) oriented storage for large files (multiple gigabytes to terabytes in size) accessed through a key-value oriented interface.
- Network: Network topology definition and connectivity management between compute, block storage, and object storage services, as well as public networks such as the Internet.
Typically, these services are exposed via an HTTP API, as well as a web-based dashboard allowing end-users to simultaneously script complex workflows and visualize their infrastructure.
Superficially, private clouds appear to be traditional virtualization infrastructures with a web interface and HTTP API. While both models share a number of common components, cloud infrastructures achieve reliability by horizontally scaling commodity hardware instead of vertically scaling specialized hardware. The following table contrasts the storage strategies employed by the traditional virtualization and cloud models:
|Data Type|Traditional Virtualization|Cloud|
|---|---|---|
|Application Data|VM direct attached storage (e.g. NAS, SAN)|Elastic database service (e.g. Riak)|
|Static Content|VM direct attached storage|Object Storage (e.g. Riak CS)|
|Templates|VM direct attached storage|Object Storage (e.g. Riak CS)|
|Backups|VM direct attached storage|Object Storage (e.g. Riak CS)|
Static content, templates, and backups typically represent the majority of a system’s storage consumption. Employing object storage to manage this data brings the following benefits to private cloud infrastructures:
- Reduced Hardware Costs: By replicating multiple copies of data across a cluster of services, object storage systems such as Riak CS guarantee data durability through software rather than hardware. This approach allows users to employ cheaper commodity hardware using ubiquitous SATA/SAS storage subsystems without sacrificing reliability.
- Horizontal Scalability: Since storage coordination and data replication occurs in software, storage is expanded by simply adding new servers to the cluster.
- Operational Simplicity: Accessed via HTTP/HTTPS, object storage systems provide secure access to data using a simple, ubiquitous protocol. Unlike iSCSI and Fibre Channel solutions, this approach typically has little to no impact on network infrastructure designs.
The Apache CloudStack IaaS platform has supported Swift-based object storage since version 4.0.0 and S3-based object storage since version 4.1.0. With the 4.2.0 release, CloudStack supports S3 and Swift as native secondary storage devices, allowing the system to provision and back up VMs directly from an object store. When coupled with Riak CS Enterprise, Apache CloudStack-based clouds are able to replicate template and snapshot data across multiple data centers to meet off-site backup and disaster recovery requirements.
The OpenStack Object Storage API specifies the semantics of OpenStack’s object storage service, and Swift is its default implementation. With the 1.4.0 release, Riak CS implements the OpenStack Object Storage API, allowing it to serve as a drop-in Swift replacement.
As organizations work to understand the opportunities created by information technology, private clouds have emerged as a key component of their strategies to increase operational agility. While private clouds can be constructed using traditional virtualization approaches, such designs will simply mask core infrastructure brittleness and high infrastructure costs. By embracing design principles such as object storage that underpin cloud infrastructure platforms, organizations can realize the promise of increased operational agility and cost savings.
September 24, 2013
Basho would like to congratulate Citrix on their launch of CloudPlatform 4.2.
This release introduces the first integration of S3-compatible object storage into the CloudPlatform infrastructure, which provides the foundation necessary to enable a tighter integration between CloudPlatform and Riak CS. Cloud-era workloads that require object storage can easily run on CloudPlatform and have transparent access to storage across geographic and logically defined locations.
For mutual customers of Basho and Citrix, this integration can be leveraged either as secondary storage for Infrastructure-as-a-Service (IaaS) components or as a method of providing Storage-as-a-Service (StaaS) to their customers.
Basho remains committed to offering a holistic integration for CloudPlatform users who require the scalability, availability, and ease of operation offered by Riak CS.
For more information on the Basho and Citrix partnership (including a video recorded at a meetup), please review:
September 19, 2013
Strange Loop 2013 is currently taking over St. Louis, MO through September 20th. Strange Loop is a multi-disciplinary conference that brings together developers and thinkers to discuss technologies around emerging languages, concurrent and distributed systems, mobile development, and the web. Basho is a proud sponsor and many members of our team will be there to discuss Riak CS, our open source cloud storage software.
Garrett Eardley, Software Engineer at Riot Games, will also be presenting on how Riot Games is leveraging Riak for their next generation stats system. His talk, “Tracking Millions of Ganks in Near Real Time,” will discuss why they chose to use Riak (and move from their existing MySQL architecture), how they structure their data model and indexes, and their strategies for working with eventually consistent data. His talk will take place today, September 19th, at 9:50am.
Stop by the Basho table to grab some swag and to learn more about distributed systems.
September 18, 2013
The other day you heard about a cool new object storage solution, Riak CS, with an Amazon S3-compatible API. You starred the repository on GitHub so that you could easily find it on another day when there’s more time to play.
That day is today.
(If you haven’t heard about the cool new object storage solution called Riak CS, today is your lucky day.)
You download and install the Riak and Riak CS packages for your operating system and dig into the configuration files. For Riak CS, the configuration files live in a file named
As you skim through the default settings and the comments that surround them, something stands out. The default for cs_root_host is set to s3.amazonaws.com. Before reading the comments, your mind begins to speculate, “Does Riak CS talk to S3? I thought this was meant to replace Amazon S3!”
Good news: Riak CS doesn’t talk to S3.
Instead, this configuration item makes it possible to direct Amazon S3 clients to your Riak CS installation, even if they weren’t designed to support an S3-compatible alternative.
Proxy Configuration for S3 Clients
Ideally, your client does support alternatives to S3. If so, skip to the “Direct Configuration for S3 Clients” section below. However, if you’re not so lucky, read on.
A proxy configuration allows S3 clients to communicate with Riak CS as if it were Amazon S3. When configuring these clients, you’ll need:
- The host and port of your Riak CS cluster, configured under your client’s proxy settings
- The Riak CS user credentials (
When requests from this client hit Riak CS, they are processed and returned to the client as if they were serviced by S3.
Note that in this scenario, URLs returned from Riak CS will contain s3.amazonaws.com. Also, several S3 clients only allow you to set one proxy per client. Both of these issues make things difficult if you’re trying to link users to objects stored in Riak CS, or if you want to interact with Riak CS and S3 simultaneously from the same client.
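As a rough illustration of the proxy approach, the sketch below uses Python's standard `urllib.request` to route plain-HTTP requests through a Riak CS node. The address `127.0.0.1` and port `8080` are placeholders, and the helper names are hypothetical. A real S3 client would also sign each request with the Riak CS user credentials, which this sketch does not do.

```python
import urllib.request


def proxy_settings(host: str, port: int) -> dict:
    """Proxy map that routes plain-HTTP requests to a Riak CS node."""
    return {"http": f"http://{host}:{port}"}


def build_proxied_opener(host: str, port: int) -> urllib.request.OpenerDirector:
    """Build an opener whose HTTP requests go through the given proxy."""
    handler = urllib.request.ProxyHandler(proxy_settings(host, port))
    return urllib.request.build_opener(handler)


# Placeholder address; requests made with this opener to an
# http://<bucket>.s3.amazonaws.com/ URL would be sent to the proxy instead.
opener = build_proxied_opener("127.0.0.1", 8080)
```

The client keeps addressing s3.amazonaws.com, which is why URLs in responses still carry that host name.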
Direct Configuration for S3 Clients
A direct configuration requires that the client has support for interacting with an S3-compatible service. This boils down to a client that allows you to alter the endpoint of the storage service you want to use.
Examples of clients that allow you to do this:
There is no S3 trickery in this scenario. The client connects directly to Riak CS without any proxies. To make this work, the value for cs_root_host needs to change to the fully qualified domain name (FQDN) of your Riak CS cluster.
Also, since S3 uses a subdomain to identify buckets created within it, in the spirit of S3-compatibility, Riak CS does too. In order to make this work in your environment, you will need a wildcard DNS entry. This is typically hosted beneath a Riak CS-specific subdomain. If you use storage.example.com as your cluster name, you’ll need *.storage.example.com defined as a DNS entry with the appropriate IP address so the S3 buckets will resolve properly.
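The subdomain convention can be sketched in a few lines. The helper names are hypothetical and this is not Riak CS code; it only shows why every bucket name must resolve under the root host, which is what the wildcard DNS entry provides.

```python
from typing import Optional
from urllib.parse import quote


def bucket_url(root_host: str, bucket: str, key: str, scheme: str = "http") -> str:
    """Build a URL that addresses the bucket as a subdomain of the root host."""
    return f"{scheme}://{bucket}.{root_host}/{quote(key)}"


def bucket_from_host(host_header: str, root_host: str) -> Optional[str]:
    """Extract the bucket name from a Host header, or None if it does not match."""
    host = host_header.split(":", 1)[0].lower()  # drop any port
    suffix = "." + root_host.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    return None
```

For example, `bucket_url("storage.example.com", "photos", "cat.jpg")` produces a URL on the host photos.storage.example.com, which only resolves if the wildcard entry exists.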
There are pros and cons to each approach. Proxy is easier to set up initially and works with a wider variety of clients. Direct requires a bit more technical expertise and works with a smaller number of clients, but allows you to rid your application of references to
Choose the one that makes the most sense for your use case. We’re just glad you chose Riak CS.
- Riak CS docs
- Riak CS code on GitHub
- Riak CS download and installation
- Riak CS configuration
- Riak mailing list for questions
August 26, 2013
Earlier this month, we announced the availability of Riak CS 1.4, which added a number of performance improvements, OpenStack integration, and simpler user management. To provide more details about what was introduced with the latest release, we also hosted a “What’s New in Riak CS 1.4” webcast.
This short webcast provides an overview of both Riak CS and Riak, and discusses what’s new in Riak CS 1.4. It also looks at the fundamental features and architecture of Riak CS, talks about the key partnerships, and discusses Riak CS Enterprise – the commercial extension of Riak CS.
You can watch the complete recording below.
You can also view the slides from this webcast here.
To get started with Riak CS, visit docs.basho.com/riakcs/latest/riakcs-downloads/ to download the latest release.
August 19, 2013
The OpenStack Summit takes place in Hong Kong from November 5-8th. It is a conference for developers, users, and administrators of OpenStack Cloud Software. Basho is a big supporter of the open source community and, with the added OpenStack integration available with Riak CS 1.4, we aim to make our open source cloud storage software as accessible as possible.
This OpenStack integration adds a lot of exciting possibilities to Riak CS. A few Basho engineers have submitted speaking proposals to OpenStack Summit about how the two technologies can work together.
We need your help though! Part of the presentation selection process involves community voting. You can vote for your favorites now through August 25th.
Here’s a look at our submissions. Please vote for any or all of them.
“Riak CS: Coexisting with Swift” – Casey Rosenthal
Vote Here: www.openstack.org/rate/Presentation/riakcs-coexisting-with-swift
Riak CS is an open source, fault-tolerant, large object storage platform. With Keystone integration and Swift-API compatibility made available in version 1.4, Riak CS can now serve as a drop-in replacement to Swift in many deployments. When would you want to choose one versus the other?
Explore the architecture underlying Riak CS, the problems that Riak CS is trying to solve, and how these goals contrast with the architecture of Swift. OpenStack integration is a key driver for Riak CS adoption and is now part of the core commitment of the Riak CS team to open source and enterprise users alike. Learn how Riak CS is coexisting with Swift in the OpenStack ecosystem to solve large object storage and scaling problems.
“Highly Scalable Global Keystone Token Storage using Riak” – Dean Proctor
Vote Here: www.openstack.org/rate/Presentation/highly-scalable-global-keystone-token-storage-using-riak
Concurrent requests to Keystone scale with your OpenStack deployment; however, simple methods for linearly scaling Keystone request capacity do not currently exist. This issue is compounded when you attempt to unify authentication across multi-datacenter installations.
Learn how the Riak key value store can be used to provide an operationally simple, linearly scalable Keystone service with the ability to sync globally in real time.
“Using Riak CS as a Backend for Glance” – James Martin
Vote Here: www.openstack.org/rate/Presentation/using-riakcs-as-a-backend-for-glance
Glance can use a number of different methods to store VM images and snapshots, including object stores. The image object store’s availability is critical to the functionality of OpenStack’s Nova service, and as time goes by it’s going to grow massively in scale; let’s not forget to mention how complex it can be to manage such a system. And for those interested in consistency across their OpenStack deployments, maintaining and replicating images can be a painful process. Learn how to use Riak CS as the storage backend for Glance and gain all the benefits of Riak – horizontal scalability, ease of administration, and dead-simple multi-datacenter replication.