Tag Archives: Protocol Buffers

Riak 1.4: Client API Enhancements

July 22, 2013

With the release of Riak 1.4, we have made some important additions and changes to the Client API features, with a goal of strengthening the real-time, streaming, and timeout behaviors for clients. To take a deeper look at all of the changes in Riak 1.4, check out the release notes.

Protocol Buffers & Multiple Interface Binding

In previous versions of Riak, the protocol buffers interface was limited to a single endpoint, bound by default to 127.0.0.1 on port 8087. With Riak 1.4, a list of endpoints can be configured. This dramatically extends the options for firewall rules and other network-level security, giving operators more choice in which port ranges to close off or which IP ranges to use.
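Here's a sketch of what the multi-endpoint configuration might look like in app.config; the riak_api pb setting taking a list of {IP, Port} pairs is based on our reading of the 1.4 release notes, and the addresses themselves are just examples:

    %% app.config sketch: bind the protocol buffers interface to
    %% several address/port pairs instead of a single endpoint.
    {riak_api, [
        {pb, [ {"127.0.0.1", 8087},
               {"10.0.1.10",   80} ]}   %% e.g. PB on a web-friendly port
    ]}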

Clients can also connect to these new endpoints. This gives clients the ability to use protocol buffers on more web-friendly port ranges, even utilizing protocol buffers in parallel with HTTP on port 80 if necessary. With this update, Riak now has closer feature parity between HTTP and Protocol Buffers.

Client-Specified Timeouts

Clients can now specify a timeout value, in milliseconds, on a per-request basis. Client-specified timeouts can be used for object operations (fetch, store, and delete) as well as for listing buckets or keys. This addition will be useful for asynchronous requests and pivotal for synchronous ones. For more on client-specified timeouts, take a look at the relevant GitHub issue.
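As a sketch, a fetch with a 500 millisecond timeout from the Erlang client might look like the following; the {timeout, Ms} option name is an assumption about how riakc_pb_socket exposes the new setting:

    %% Fetch an object, giving the request 500 ms before it times
    %% out; match the timeout error to handle it explicitly.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    case riakc_pb_socket:get(Pid, <<"users">>, <<"adron">>,
                             [{timeout, 500}]) of
        {ok, Obj}        -> riakc_obj:get_value(Obj);
        {error, timeout} -> handle_timeout_here   %% placeholder
    end.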

To explore response times and where timeout conditions can occur, check out the Basho Bench docs. There are examples for testing various scenarios and identifying bottlenecks that may need custom timeouts or performance improvements.

Bucket Properties for Protocol Buffers

Bucket properties can now be reset to their default values, and all built-in properties can be configured via the protocol buffers API. This dramatically improves client usage of protocol buffers and, again, brings it a step closer to feature parity with HTTP. For more information on setting and using these bucket properties, check out the bucket properties documentation.
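For illustration, setting and then resetting properties from the Erlang client might look like this; reset_bucket/2 is an assumption based on the new reset-to-defaults capability:

    %% Set a couple of built-in bucket properties over protocol
    %% buffers, read them back, then reset the bucket to defaults.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    ok = riakc_pb_socket:set_bucket(Pid, <<"logs">>,
                                    [{n_val, 3}, {allow_mult, true}]),
    {ok, Props} = riakc_pb_socket:get_bucket(Pid, <<"logs">>),
    ok = riakc_pb_socket:reset_bucket(Pid, <<"logs">>).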

List-Buckets Streaming: Real-time

Listing buckets or keys via a streaming request now sends results to the client as they are received, rather than waiting for all nodes to respond to the request. This improves response times and reduces timeouts from the client's point of view. It also allows for the use of streaming features with Node.js, C#, Java, and other languages and frameworks that support real-time streaming data feeds.
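Here is a sketch of consuming such a stream from Erlang, inside one of your own modules; stream_list_buckets/1 and the exact message shapes are assumptions modeled on the client's streaming key-list API:

    %% Kick off a streaming bucket list and print each batch of
    %% bucket names as it arrives, until the server signals 'done'.
    list_buckets_streaming(Pid) ->
        {ok, ReqId} = riakc_pb_socket:stream_list_buckets(Pid),
        collect_buckets(ReqId).

    collect_buckets(ReqId) ->
        receive
            {ReqId, {buckets_stream, Buckets}} ->
                io:format("~p~n", [Buckets]),   %% act on each batch immediately
                collect_buckets(ReqId);
            {ReqId, done} ->
                ok
        end.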

To get started, download Riak 1.4 on our docs site. Also, be sure to grab a ticket to RICON West before they sell out.

Adron

Protobuffs in Riak 1.2

July 18, 2012

You might remember that back in April, we sent around a survey to get input about what features developers use and want in Riak clients. All in all, we had about 87 developers respond to the survey.

One of the questions in that survey — and the one that was the most interesting to me — asked the respondent to rank some potential features for the roadmap. At the top of that list in the results was to support Secondary Index (2I) and Riak Search queries natively from the Protocol Buffers (Protobuffs) interface. You could already query them by sending a MapReduce request, but the additional step was confusing for some, and slow for others. I set out to make these features happen for the Riak 1.2 release.

Coupling challenges

Originally, the Protobuffs interface was created in an effort to satisfy a customer’s specific performance issue with HTTP, back around Riak version 0.10 or so. It seemed to work well for others, too, and so it got merged into the mainline. From that point until 1.0, not much was done with it. In Riak 1.0, it got a slew of new options — especially enhancements to Key-Value operations like get, put, and delete — that brought it closer to feature-parity with the HTTP interface.

Now, simply adding 2I queries to the existing system would have been straightforward, but search queries would not have been so. Why?

  • While the HTTP interface of Riak has always been built atop Webmachine, making it easy to add new resources as needed, the Protobuffs components were part of riak_kv. In fact, the Protobuffs interface was created while riak_search was still in its infancy, and when we had little idea what its interface would look like. Adding a coupling back the other direction (from riak_kv to riak_search) might just make the problem worse.
  • The riak-erlang-client was a dependency of riak_kv so that they could share the riakclient.proto file that contained all of the protocol message definitions. This made the Riak codebase potentially brittle to changes in the client library and made it necessary to copy the riakclient.proto file to our other clients that generate code from it.
  • We were using an antiquated version of the erlang_protobuffs library that we had forked and not kept up-to-date. The new maintainer had added features like extensions that we would like to use in the future. If I recall correctly, our version didn’t even properly support enumerations.

Refactoring

With those problems in mind and with the help of a few of my fellow Engineers, I set out to refactor the entire thing. Here’s what we came up with.

First, we separated the connection management from the message processing. This is a bit like how Webmachine works, where the accepting (mochiweb) and dispatching (webmachine) of an incoming HTTP message is separate from processing the message (your resource module and the decision graph). The result of our refactoring is the new riak_api OTP application. It consists of a TCP listener, server processes that are spawned for each connection, and a registration utility for defining your own message handlers which are called “services”. Here’s how riak_kv registers its services:

    riak_api_pb_service:register([{riak_kv_pb_object,  3,  6},  %% ClientID stuff
                                  {riak_kv_pb_object,  9, 14},  %% Object requests
                                  {riak_kv_pb_bucket, 15, 22},  %% Bucket requests
                                  {riak_kv_pb_mapred, 23, 24},  %% MapReduce requests
                                  {riak_kv_pb_index,  25, 26}   %% Secondary index requests
                                 ]).

Each service, represented as a module that implements the riak_api_pb_service behaviour, specifies a range of message codes it can handle. When an incoming message with a registered message code is received, it is dispatched to the corresponding service module, which can then do some processing and decide what messages to send back to the client.
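To make that concrete, here is a sketch of what a service module might look like; the callback set (init/0, decode/2, encode/1, process/2, process_stream/3) is an assumption based on the description above, and the encoder/decoder bodies are placeholders:

    -module(my_pb_service).
    -behaviour(riak_api_pb_service).
    -export([init/0, decode/2, encode/1, process/2, process_stream/3]).

    init() ->
        no_state.                          %% per-connection service state

    decode(_Code, Bin) ->                  %% wire bytes -> request term
        {ok, binary_to_term(Bin)}.         %% placeholder decoder

    encode(Reply) ->                       %% reply term -> wire bytes
        {ok, term_to_binary(Reply)}.       %% placeholder encoder

    process(_Request, State) ->            %% handle a single request
        {reply, pong, State}.              %% send one message back

    process_stream(_Msg, _ReqId, State) -> %% used for streaming responses
        {ignore, State}.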

Second, we separated the Protobuffs message definitions from the Erlang client library. We put the .proto file in a new library application called riak_pb, and actually split it out into several files, grouped by the component of the server they represent; this means there’s a riak.proto, a riak_kv.proto, and a riak_search.proto. In addition to removing the coupling between the Erlang client and the server, we now have a project whose only responsibility is to describe the messages of the protocol. It’s like the equivalent of an RFC, but in code! In the near future we will have build targets in the project that let us generate Java or Python shims from the included messages and that we can distribute as standalone .jar and .egg files.

Third, we merged upstream changes from the new erlang_protobuffs maintainer and made some updates of our own. In addition to the features like extensions, the newer version has a more complete test suite. Our own updates fixed some bugs and edge cases in the library so that we could improve the overall experience for users. For example, when encountering an unknown message field, the TCP connection will no longer close because of a decoding error; instead, the unknown field will just be ignored.

New features

Whew, that was a lot of work just to get to the good stuff! With the updated code structure and a plan for how to move forward, we added two new services, one in riak_kv (supporting native 2I) and one in riak_search (supporting native search-index queries), and four new messages to riak_pb to support those services. We decided not to expose the “add to index” or “delete from index” features in riak_search because we want to take it in a direction that focuses on indexing KV data rather than maintaining a separate index-management interface. If you’re already using the “search KV hook” to index your data, you’ll be fine.
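From the Erlang client, the two new queries look roughly like the sketch below; the function names and argument shapes are assumptions based on riakc_pb_socket’s API of that era:

    %% Native secondary index (2I) equality query on a binary index.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    {ok, Keys} = riakc_pb_socket:get_index(Pid, <<"users">>,
                                           <<"state_bin">>, <<"WA">>),

    %% Native Riak Search query against the users index.
    {ok, Results} = riakc_pb_socket:search(Pid, <<"users">>,
                                           <<"name:Sean*">>).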

Client-side support for these new requests and responses has already landed in the Ruby client and will soon be landing in Java, Erlang, and Python. You can track support for the new features on our updated Client Libraries wiki page.

Roadmap

Those two new client-facing features are great, but the survey showed us a lot more about what you want and need from Riak’s interfaces. For future releases we’ll be investigating how to improve the Protobuffs interface’s error messages and support for bucket properties, how to expose bulk or asynchronous operations, and much more.

Keep using Riak and sending us great feedback!

Sean

Riak 0.10 is full of great stuff

April 23, 2010

give the people what they want

We’ve received a lot of feedback in the past few months about the ways that Riak already serves people well, and the ways that they wish it could do more for them. Our latest release is an example of our response to that feedback.

Protocol Buffers

Riak has always been accessible via a clean and easy-to-use HTTP interface. We made that choice because HTTP is unquestionably the most well-understood and well-deployed protocol for data transfer. This has paid off well by making it simple for people to use many languages to interact with Riak, to get good caching behavior, and so on. However, that interface is not optimized for maximum throughput. Each request needs to parse several unknown-length headers, for example, which imposes a bit of load when you’re pointing a firehose of data into your cluster.

For those who would rather give up some of the niceties of HTTP to get a bit more speed, we have added a new client-facing interface to Riak. That interface uses the “protocol buffers” encoding scheme originally created by Google. We are beginning to roll out some client libraries with support for that new interface, starting with Python and Erlang but soon to encompass several other languages. You can expect them to trickle out over the next couple of weeks. Initial tests show a nice multiple of increased throughput on some workloads when switching to the new interface. We are likely to release some benchmarks to demonstrate this sometime soon. Give it a spin and let us know what you think.
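As a quick sketch, talking to the new interface from the Erlang client looks something like this; the riakc_pb_socket and riakc_obj names follow the current riak-erlang-client and are an assumption for the 0.10-era API:

    %% Connect over protocol buffers, store a value, and read it back.
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    Obj = riakc_obj:new(<<"bucket">>, <<"key">>, <<"value">>),
    ok = riakc_pb_socket:put(Pid, Obj),
    {ok, Fetched} = riakc_pb_socket:get(Pid, <<"bucket">>, <<"key">>).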

Commit Hooks

A number of users (and a few key potential customers) have asked us how to either verify some aspects of their data (schemas, etc) on the way in to Riak, or else how to take some action (on a secondary object or otherwise) as a result of having stored it. Basically, people seem to want stored procedures.

Okay, you can have them.

Much like with our map/reduce functionality, your own functions can be expressed in either Erlang or JavaScript. As with any database’s stored procedures, you should keep them as simple as possible, or else you might place an undue load on the cluster when trying to perform a lot of writes.
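For example, a pre-commit hook that rejects writes whose values are not valid JSON might look like this sketch; the return contract (return the object to allow the write, or {fail, Reason} to reject it) follows Riak’s pre-commit hooks, while the mochijson2 decoder is an assumption about what’s available on the code path:

    -module(validate_json).
    -export([precommit/1]).

    %% Pre-commit hook: allow the write only if the object's value
    %% parses as JSON; otherwise reject it with a reason.
    precommit(Object) ->
        try
            mochijson2:decode(riak_object:get_value(Object)),
            Object                            %% valid JSON: allow the write
        catch
            _:_ ->
                {fail, <<"invalid JSON">>}    %% reject the write
        end.

The hook would then be enabled per bucket via the precommit bucket property.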

Faster Key Listings

Listing all of the keys in a Riak bucket is fundamentally more of a pain than any per-document operation, as it has to collect and coordinate responses from many nodes. However, it doesn’t need to be all that bad.

The behavior of list_keys in Riak 0.10 is much faster than in previous releases, due both to more efficient tracking of vnode coverage and also to a much faster bloom filter. The vnode coverage aspect also makes it much more tolerant of node outages than before.

If you do use bucket key listings in your application, you should always do so in streaming mode (“keys=stream” query param if via HTTP) as doing otherwise necessitates building the entire list in memory before sending it to the client.

Cleanliness and Modularity

A lot of other work went into this release as well. The internal tracking of dependencies is much cleaner, for those of you building from source (instead of just grabbing a pre-built release). We have also broken apart the core Erlang application into two pieces. There will be more written on the reasons and benefits of that later, but for now the impact is that you probably need to make some minor changes to your configuration files when you upgrade.

All in all, we’re excited about this release and hope that you enjoy using it.

- Justin