April 8, 2015
At the beginning of 2015, Adam Wray, our CEO and president, made a bold statement in a post entitled "Basho is Back! Record Year and a Strong Start to 2015," in which he claimed:
At Basho we are focused on establishing product value and trust, while projecting a vision that our customers and community can invest in long term. In 2014 we built a strong foundation for growth in 2015 and beyond. This year one of our core objectives is to be seen by the marketplace as the leader in unstructured data! With this team and our product vision, I fully believe we can become the #1 NoSQL provider in the space.
In my role as the VP of Product and Marketing, I have the opportunity to shape our product based on customer and partner feedback as well as research into market direction. We are committed to providing the best multi-model solution for Big Data applications that leverage unstructured data in their active workloads. In fact, Basho has led the industry in adoption of multi-model solutions since beginning to offer key/value and object storage in 2013.
Over the last week or so you have seen us release Riak CS 2.0 and updated Basho-supported client libraries for Node.js and .NET. We will also release Riak 2.1 in the next few days with key performance enhancements. Basho is presently the leader in high availability and scale for distributed, active workloads, and our increased focus on performance will result in performance enhancements for both Riak KV and Riak CS throughout 2015.
The updates to Riak 2.1 include numerous changes driven by our perspective on market trends and direction. Chief among these is the emphasis on performance and simplification for both developers and operations.
Enhancements to Riak 2.1 have increased write speeds by more than 2x for write-heavy workloads.
Riak 2.1 introduces the concept of "write once" buckets: buckets whose entries are intended to be written exactly once, and never updated or overwritten. These write-once buckets optimize Riak performance for immutable data, which is a key design pattern for many Big Data applications.
The write_once property is applied to a bucket type and may only be set at bucket creation time. Once a bucket type has been set with this property and activated, the write_once property may not be modified.
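For illustration, creating and then activating such a bucket type from the command line might look like this (the type name is ours):

```
riak-admin bucket-type create fast-writes '{"props":{"write_once":true}}'
riak-admin bucket-type activate fast-writes
```

Once activated, any bucket under the fast-writes type inherits the write_once behavior.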
This capability is extremely important for our customers, partners, and prospects who are writing and deploying IoT applications and whose data model includes immutable data workflows. We will continue to invest in performance in 2015 to drive speeds for write-heavy and other common workloads.
Basho Supported Clients
Basho has always maintained a series of supported client libraries for popular languages. With Riak 2.1, we have broadened that support to additional key languages used in the development of business applications. We are pleased to announce the inclusion of Basho-supported client libraries for Node.js and .NET. In addition, we have enhanced our support for PHP, enabling easier integration for those building real-time web applications.
New Monitoring Statistics & Integrations
Once a Big Data application has been built, it is necessary to ensure that the cluster can be actively monitored. The addition of more than 200 supplementary Riak statistics enables fine-grained monitoring of individual node and cluster health. For example, you can monitor statistics for each Riak Data Type (CRDT), measuring Get, Put, Update, and Merge times at multiple percentiles. You can also measure index and query latency alongside throughput for Riak Search (Solr). These statistics enable you to monitor the impact your application design has on the cluster. In addition, Basho has integrated these monitoring statistics with Nagios, New Relic, and Zabbix, further expanding integrations with both hosted and on-premise monitoring solutions.
OS X Installers
In addition to clients and monitoring, we have invested in several new and updated installation options for Riak. Many application developers use OS X as their primary development platform. Basho already provides a simple project, riak-dev-cluster, for quickly getting started with a five-node Riak cluster. Now we are making it even easier by offering an OS X installer that lets you locally deploy a single node of Riak, for development purposes, with a few simple clicks.
We continue our commitment to our community by working with the open-source contributors to our Chef, Puppet, and Ansible tools to ensure they are optimized for use with this release. In fact, improvements to the puppet-riak module make it one of the first to be built on Puppet 4.0, the latest release from Puppet Labs. To ensure clarity, and a broader commitment to open-source development, we have arranged repositories driven by community contribution into the Basho Labs organization on GitHub. While our core codebase remains in the Basho organization and undergoes a rigorous review process, Basho Labs invites community contribution and is actively monitored.
As if this wasn’t enough, we have also worked closely with Cloudsoft to release tested, optimized Riak blueprints. These blueprints enable faster, easier deployment of applications across a variety of cloud service providers, including AWS and SoftLayer. One-click, multiple providers.
Cloudsoft AMP blueprints are available to spin up a Riak cluster, a Riak cluster with an example application, and Riak clusters in a multi-datacenter configuration.
Riak CS 2.0
We are pleased to announce that Riak CS 2.0 is now generally available. This represents a major milestone in the lifecycle and development of Basho’s object storage offering. Riak provides the only true multi-model platform for the persistence and storage of a variety of unstructured data. With Riak CS 2.0, we have achieved seamless integration with the underlying Riak 2.0 codebase, so all the operational benefits of Riak 2.0 are now included in Riak CS.
It would be remiss not to highlight that Riak CS 2.0 now provides enhanced conflict resolution that simplifies development, making it easier to reduce the likelihood of data conflicts and sibling growth in an eventually consistent system. This is achieved by leveraging the dotted version vector system introduced in Riak 2.0, drastically simplifying operational effort. It is coupled with the simplified configuration management first presented in Riak 2.0, which allows for human-readable, machine-parseable configuration files that are easily integrated with the orchestration tools that the enterprise prefers.
Getting started with Riak is easier than ever before thanks to the effort in simplifying the installation process for OS X. Designing and implementing a system for active workloads, whether a new design or replacement for existing infrastructure, often begins with a conversation with a member of our Solution Architecture team. They are available for onsite or remote discussions to educate your team on the practical considerations of implementing Riak for unstructured workloads and Big Data applications.
Vice President, Product & Marketing
January 27, 2014
Client libraries are essential to using Riak, and we at Basho have always been proud to have a flourishing client library ecosystem surrounding Riak. The release of Riak 2.0 has brought a number of fundamental changes that client builders and maintainers should be aware of, as well as new features that clients should be equipped to utilize, such as security and Riak Data Types. Here, we’ll provide a list of some of those fundamental changes and suggest some approaches to addressing them, including examples from our official libraries.
Protocol Buffers API
While Riak continues to have a fully featured HTTP API for the sake of backwards compatibility, we do not recommend that you use it to build new client libraries. Instead, we encourage you to design clients to interact with Riak’s Protocol Buffers API, primarily because internal tests at Basho have shown performance gains of 25% or more when using Protocol Buffers.
The drawback of using Protocol Buffers is that it’s not as widely known as HTTP and has a bit of a learning curve for those who aren’t familiar with it. The good news is that the learning curve is worth it, and that Google offers official Protocol Buffers support for C++, Java, and Python, while many other languages have strong community support.
When you start developing your client library, you’ll need to find a Protocol Buffers message generator in the language of your choice and convert a series of .proto files. Once you’ve generated all the necessary messages, you’ll need to implement a transport layer to interface with Riak. A full list of Riak-specific PBC messages can be found here. The official Python client, for example, has a single RiakPbcTransport class that handles all message building, sending, and receiving, while the official Java client takes a more piecemeal approach to message building (as shown by the FetchOperation class, which handles reads from Riak). Once the transport layer is in place, you can start building higher-level abstractions on top.
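To make the wire format concrete, here is a minimal sketch (the function names are ours) of the framing a transport layer performs: each Riak PB message is a 4-byte big-endian length covering the message code and payload, then a 1-byte message code, then the Protocol Buffers-encoded payload.

```python
import struct

def frame(msg_code, payload=b""):
    # The length field counts the message code byte plus the payload
    return struct.pack(">IB", len(payload) + 1, msg_code) + payload

def unframe(data):
    # Inverse of frame(): returns (msg_code, payload)
    (length,) = struct.unpack(">I", data[:4])
    return data[4], data[5:4 + length]

# RpbPingReq has message code 1 and carries no payload
ping = frame(1)
assert ping == b"\x00\x00\x00\x01\x01"
assert unframe(ping) == (1, b"")
```

A real transport writes frames like these to a socket connected to a node's Protocol Buffers port and unframes the responses; payloads for non-trivial messages come from the classes generated from the .proto files.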
Nodes and clusters
Another thing to keep in mind when writing Riak clients is that Riak always functions as a clustered (and hence multi-node) system, and connecting clients need to be set up to interact with all nodes in a cluster on the basis of each node’s host and port.
While it’s certainly possible to build clients that are intended to interact only with a single node, this means that your client’s users will need to create their own cluster interaction logic. Life will be far easier for your client’s users if your client is able to do things like this:
- periodically ping nodes to make sure they’re still online
- recognize when nodes are no longer responding and stop sending requests to those nodes
- provide a load-balancing scheme (or multiple possible schemes) to spread interactions across nodes
In general, you should think of the cluster interaction level as a kind of stateful registry of healthy nodes. In some systems, it might also be necessary to have configurable parameters for connections to Riak, e.g. minimum and/or maximum concurrent connections.
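As a sketch of that stateful registry (class and method names are ours), a minimal round-robin scheme might look like this:

```python
class NodeRegistry:
    """Tracks healthy nodes and spreads requests across them
    round-robin. A real client would also ping nodes periodically
    and call mark_up() when a node recovers."""

    def __init__(self, nodes):
        self._healthy = list(nodes)  # (host, port) pairs
        self._next = 0

    def choose(self):
        if not self._healthy:
            raise RuntimeError("no healthy Riak nodes available")
        node = self._healthy[self._next % len(self._healthy)]
        self._next += 1
        return node

    def mark_down(self, node):
        # Stop routing requests to a node that failed to respond
        if node in self._healthy:
            self._healthy.remove(node)

    def mark_up(self, node):
        if node not in self._healthy:
            self._healthy.append(node)

registry = NodeRegistry([("riak1", 8087), ("riak2", 8087), ("riak3", 8087)])
assert registry.choose() == ("riak1", 8087)
registry.mark_down(("riak2", 8087))  # requests now rotate over riak1 and riak3
```

Connection pooling (minimum and maximum concurrent connections per node) would layer on top of a registry like this.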
Prior to 2.0, the location of objects in Riak was determined by bucket and key. In version 2.0, bucket types were introduced as a third namespacing layer in addition to buckets and keys. Connecting clients now need to either specify a bucket type or use the default type for all K/V operations. Although creating, listing, modifying, and activating bucket types can be accomplished only via the command line, your client should provide an interface for seeing which bucket properties are associated with a bucket type.
One of the changes to be aware of when building clients is that Riak has changed its querying structure to accommodate bucket types. When performing K/V operations, you now need to specify a bucket type in addition to a bucket and a key, which means that the structure of all K/V operations needs to be modified accordingly. We’d also recommend enabling users to perform K/V operations without specifying a bucket type, in which case the default type is used. In the official Python client, for example, a read from client.bucket('users') and a read from client.bucket_type('default').bucket('users') are equivalent.
Dealing with objects and content types
One of the tricky things about dealing with objects in Riak is that objects can be of any data type you choose (Riak Data Types are a different matter, and covered in the section below). You can store JSON, XML, raw binaries, strings, mp3s and MPEGs (though you should probably consider Riak CS for larger files like that), and so on. While this makes Riak an extremely flexible database, it means that clients need to be able to work with a wide variety of content types.
All objects stored in Riak must have a specified content type, e.g. application/json, text/plain, application/octet-stream, etc. While a Riak client doesn’t need to be able to handle all data types, a client intended for wide use should be able to handle at least the following:
- JSON
- XML
- plain text
- raw binary data
You should also strongly consider building automatic type handling into your client. When the official Ruby and Python clients, for example, read JSON from Riak, they automatically convert it to hashes and dicts (respectively). The Java client, to give another example, automatically converts POJOs to JSON by default and enables you to automatically convert stored JSON to custom POJO classes when fetching objects, which enables you to easily interact with Riak in a type-specific way. If you’re writing a client in a language with strong type safety, this would be a good thing to offer users.
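One way to sketch that kind of automatic handling (the codec table is illustrative) is a decoder registry keyed on content type:

```python
import json

# Decoders keyed on content type; unknown types fall back to raw bytes
DECODERS = {
    "application/json": lambda raw: json.loads(raw.decode("utf-8")),
    "text/plain": lambda raw: raw.decode("utf-8"),
    "application/octet-stream": lambda raw: raw,
}

def decode(content_type, raw_bytes):
    """Turn a fetched value into a native type based on its content type."""
    return DECODERS.get(content_type, lambda raw: raw)(raw_bytes)

assert decode("application/json", b'{"db": "riak"}') == {"db": "riak"}
assert decode("text/plain", "t\u00eate-\u00e0-t\u00eate".encode("utf-8")) == "t\u00eate-\u00e0-t\u00eate"
```

The symmetric encoder registry would set the content type on writes; a statically typed language would map content types to user-supplied classes instead of closures.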
Another important thing to bear in mind: all of your client interactions with Riak should be UTF-8 compliant, not just for the data stored in objects but also for things like bucket, key, and bucket type names. In other words, with your client it should be possible to store an object in the key Möbelträgerfüße in the bucket tête-à-tête.
If you’re using either Riak Data Types or Riak’s strong consistency subsystem, you don’t have to worry about siblings because those features by definition do not involve sibling creation or resolution. But many users of your client will want to use Riak as an eventually consistent system, which means that they will need to create their own conflict resolution logic.
In essence, your users’ applications need to make intelligent, use-case-specific decisions about what to do when the application is confronted with siblings. Most fundamentally, this means that your client needs to enable objects to have multiple sibling values. In the official Python client, for example, each object of class RiakObject has parameters that you’d expect, like content_type, bucket, and data, but it also has a siblings parameter that returns a list of sibling values.
In addition to enabling objects to have multiple values, we also strongly recommend providing some kind of helper logic that enables users to easily apply their own sibling resolution logic. What type of interface should be provided? That will depend heavily on the language. In a functional language, for example, that might mean enabling users to specify filtering functions that whittle the siblings down to a single “correct” value. To see conflict resolution in our official clients in action, see our tutorials for Java, Ruby, and Python.
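As a sketch of such a helper (the names are ours), the client exposes all sibling values and lets the application supply the merge policy:

```python
from functools import reduce

def resolve(siblings, merge):
    """Collapse a list of sibling values into one, using a
    caller-supplied, use-case-specific merge function."""
    if not siblings:
        return None
    return reduce(merge, siblings)

# Example policy: a shopping-cart app unions its item sets
carts = [{"book", "lamp"}, {"lamp", "mug"}]
assert resolve(carts, lambda a, b: a | b) == {"book", "lamp", "mug"}
```

The merge function is the use-case-specific part; a last-write-wins policy, a union, or a field-by-field merge all fit the same interface.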
Riak Data Types
In version 2.0, Riak added support for conflict-free replicated data types (aka CRDTs), which we call Riak Data Types. These five special Data Types—flags, registers, counters, sets, and maps—enable you to forgo things like application-side conflict resolution because Riak handles the resolution logic for you (provided that your data can be modeled as one of the five types). What separates Riak Data Types from other Riak objects is that you interact with them transactionally, meaning that changing Data Types involves sending messages to Riak about what changes should be made rather than fetching the whole object and modifying it on the client side.
This means that your client interface needs to enable users to modify the Data Types as much as they need to on the client side before committing those changes all at once to Riak. So if an application needs to add five counters to a map and remove items from three different sets within that map, it should be able to commit those changes with one message to Riak. The official Python client, for example, has a store() function that sends all client-side changes to Riak at once, plus a reload() function that fetches the current value of the type from Riak (with no regard to client-side changes).
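A sketch of that interface (class and method names are ours; send_to_riak stands in for the client's transport layer) buffers mutations locally and flushes them in a single call:

```python
class MapProxy:
    """Buffers changes to a Riak map client-side; store() commits
    all of them to Riak in one message."""

    def __init__(self, send_to_riak):
        self._send = send_to_riak
        self._pending = []

    def increment_counter(self, name, amount=1):
        self._pending.append(("counter", name, "increment", amount))

    def remove_from_set(self, name, item):
        self._pending.append(("set", name, "remove", item))

    def store(self):
        ops, self._pending = self._pending, []
        self._send(ops)  # one round trip, however many operations

messages = []
page = MapProxy(messages.append)
page.increment_counter("page_views", 5)
page.remove_from_set("tags", "draft")
page.store()
assert len(messages) == 1      # a single message to Riak...
assert len(messages[0]) == 2   # ...carrying both operations
```

A reload() counterpart would fetch the current server-side value, discarding nothing from the pending buffer.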
One of the most important features introduced in Riak 2.0 is security. When enabled, all clients connecting to Riak, regardless of which security source is chosen, must communicate with Riak over a secure SSL connection rooted in an x.509-certificate-based Public Key Infrastructure (PKI). If you want your client’s users to be able to take advantage of Riak security, you’ll need to create an SSL interface. Fortunately, there are OpenSSL (and other) libraries in all major languages. To see SSL in action in our official clients, see our tutorials for Java, Ruby, Python, and Erlang.
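As a minimal sketch in Python (the CA path and hostname are illustrative), the client builds a TLS context and wraps its TCP socket before speaking the Riak protocol:

```python
import ssl

# Verify the server certificate and check its hostname
context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
# context.load_verify_locations("/etc/riak/ca.crt")  # trust your cluster's CA

assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname

# In a real client, wrap the connected socket before sending any requests:
# secure_sock = context.wrap_socket(sock, server_hostname="riak1.example.com")
```

Certificate-based security sources would additionally load a client certificate and key into the same context.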
Features That Don’t Require Client Changes
The following features that became available in Riak 2.0 shouldn’t require any changes to client libraries:
- Strong consistency — While adding strong consistency has entailed a lot of changes within Riak itself, K/V operations involving strongly consistent data function just like their eventually consistent counterparts in most respects. The one small exception is that performing object updates without first fetching the object will necessarily fail, because the initial fetch obtains the object’s causal context, which is necessary for strongly consistent operations. It may be a good idea to add this requirement to your client documentation.
- New configuration system — Configuration has been drastically simplified in Riak 2.0, but these changes won’t have a direct impact on client interfaces.
- Dotted version vectors — While dotted version vectors (DVVs) are superior to the older vector clocks in preventing problems like sibling explosion, client libraries interact with DVVs just like they interact with vector clocks. In fact, our Protocol Buffers messages still use a vclock field for both vector clocks and DVVs, for the sake of backward compatibility.
How to Get Help
Building a 2.0-compliant Riak client has some non-trivial aspects but can be an exciting and rewarding project. Fortunately there are a variety of venues where you can get help, both from Basho engineers and from others in the Riak community.
For inspiration and education, the official Basho Riak clients in the GitHub repos are a good place to start. If you run into trouble, though, we highly recommend the Riak mailing list. There could very well be other client builders and maintainers working through a similar problem.
September 18, 2013
The other day you heard about a cool new object storage solution, Riak CS, with an Amazon S3-compatible API. You starred the repository on GitHub so that you could easily find it on another day when there’s more time to play.
That day is today.
(If you haven’t heard about the cool new object storage solution called Riak CS, today is your lucky day.)
You download and install the Riak and Riak CS packages for your operating system and dig into the Riak CS configuration files. As you skim through the default settings and the comments that surround them, something stands out: the default for cs_root_host is set to s3.amazonaws.com. Before reading the comments, your mind begins to speculate, “Does Riak CS talk to S3? I thought this was meant to replace Amazon S3!”
Good news: Riak CS doesn’t talk to S3.
Instead, this configuration item makes it possible to direct Amazon S3 clients to your Riak CS installation, even if they weren’t designed to support an S3-compatible alternative.
Proxy Configuration for S3 Clients
Ideally, your client does support alternatives to S3. If so, skip to the “Direct Configuration for S3 Clients” section below. However, if you’re not so lucky, read on.
A proxy configuration allows S3 clients to communicate with Riak CS as if it were Amazon S3. When configuring these clients, you’ll need:
- The host and port of your Riak CS cluster, configured under your client’s proxy settings
- The Riak CS user credentials
When requests from this client hit Riak CS, they are processed and returned to the client as if they were serviced by S3.
Note that in this scenario, URLs returned from Riak CS will contain s3.amazonaws.com. Also, several S3 clients only allow you to set one proxy per client. Both of these issues make things difficult if you’re trying to link users to objects stored in Riak CS, or if you want to interact with Riak CS and S3 simultaneously from the same client.
Direct Configuration for S3 Clients
A direct configuration requires that the client has support for interacting with an S3-compatible service. This boils down to a client that allows you to alter the endpoint of the storage service you want to use.
Examples of clients that allow you to do this:
There is no S3 trickery in this scenario. The client connects directly to Riak CS without any proxies. To make this work, the value for cs_root_host needs to change to the fully qualified domain name (FQDN) of your Riak CS cluster.
Also, since S3 uses a subdomain to identify buckets created within it, Riak CS does the same in the spirit of S3 compatibility. To make this work in your environment, you will need a wildcard DNS entry, typically hosted beneath a Riak CS-specific subdomain. If you use storage.example.com as your cluster name, you’ll need *.storage.example.com defined as a DNS entry with the appropriate IP address so the S3 buckets will resolve properly.
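As an illustrative zone-file fragment (the addresses are examples), the wildcard record sits alongside the cluster’s own record:

```
storage.example.com.    IN A 192.0.2.10
*.storage.example.com.  IN A 192.0.2.10
```

With that in place, a request for a bucket such as photos.storage.example.com resolves to the same cluster address.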
There are pros and cons to each approach. Proxy is easier to set up initially and works with a wider variety of clients. Direct requires a bit more technical expertise and works with a smaller number of clients, but allows you to rid your application of references to s3.amazonaws.com.
Choose the one that makes the most sense for your use case. We’re just glad you chose Riak CS.
- Riak CS docs
- Riak CS code on GitHub
- Riak CS download and installation
- Riak CS configuration
- Riak mailing list for questions