January 27, 2014
Client libraries are essential to using Riak, and we at Basho have always been proud to have a flourishing client library ecosystem surrounding Riak. The release of Riak 2.0 has brought a number of fundamental changes that client builders and maintainers should be aware of, as well as new features that clients should be equipped to utilize, such as security and Riak Data Types. Here, we’ll provide a list of some of those fundamental changes and suggest some approaches to addressing them, including examples from our official libraries.
Protocol Buffers API
While Riak continues to have a fully featured HTTP API for the sake of backwards compatibility, we do not recommend that you use it to build new client libraries. Instead, we encourage you to design clients to interact with Riak’s Protocol Buffers API, primarily because internal tests at Basho have shown performance gains of 25% or more when using Protocol Buffers.
The drawback of using Protocol Buffers is that it’s not as widely known as HTTP and has a bit of a learning curve for those who aren’t familiar with it. The good news is that the learning curve is worth it, and that Google offers official Protocol Buffers support for C++, Java, and Python, while many other languages have strong community support.
When you start developing your client library, you’ll need to find a Protocol Buffers message generator in the language of your choice and convert a series of .proto files. Once you’ve generated all the necessary messages, you’ll need to implement a transport layer to interface with Riak. A full list of Riak-specific PBC messages can be found here. The official Python client, for example, has a single RiakPbcTransport class that handles all message building, sending, and receiving, while the official Java client takes a more piecemeal approach to message building (as shown by the FetchOperation class, which handles reads from Riak). Once the transport layer is in place, you can start building higher-level abstractions on top.
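As a sketch of what that transport layer handles, Riak’s Protocol Buffers interface frames each message as a 4-byte big-endian length (covering the message code and payload), a 1-byte message code, and the Protocol Buffers payload. The helper names below are illustrative, not part of any official client; the message codes (e.g. RpbPingReq = 1) follow the riak_pb listing.

```python
import struct

def encode_frame(msg_code: int, payload: bytes = b"") -> bytes:
    """Wrap a serialized protobuf payload in Riak's PB wire framing:
    4-byte big-endian length (code + payload), then 1-byte message code."""
    return struct.pack(">IB", len(payload) + 1, msg_code) + payload

def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Split a received frame back into (message code, payload)."""
    (length,) = struct.unpack(">I", frame[:4])
    msg_code = frame[4]
    return msg_code, frame[5:5 + length - 1]

# A ping request (RpbPingReq, code 1) has an empty payload:
ping = encode_frame(1)
assert ping == b"\x00\x00\x00\x01\x01"
```

Everything above this framing (message construction, response parsing) is where the generated .proto classes come in; everything below it is a plain TCP socket.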
Nodes and clusters
Another thing to keep in mind when writing Riak clients is that Riak always functions as a clustered (and hence multi-node) system, and connecting clients need to be set up to interact with all nodes in a cluster on the basis of each node’s host and port.
While it’s certainly possible to build clients that are intended to interact only with a single node, this means that your client’s users will need to create their own cluster interaction logic. Life will be far easier for your client’s users if your client is able to do things like this:
- periodically ping nodes to make sure they’re still online
- recognize when nodes are no longer responding and stop sending requests to those nodes
- provide a load-balancing scheme (or multiple possible schemes) to spread interactions across nodes
In general, you should think of the cluster interaction level as a kind of stateful registry of healthy nodes. In some systems, it might also be necessary to have configurable parameters for connections to Riak, e.g. minimum and/or maximum concurrent connections.
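The stateful-registry idea above can be sketched in a few lines. This is a minimal model, with illustrative node names and a simplistic retry policy; real clients layer pings, connection pooling, and richer load-balancing schemes on top of it.

```python
import itertools
import time

class NodeRegistry:
    """Minimal sketch of a stateful registry of healthy Riak nodes."""

    def __init__(self, nodes, retry_after=30.0):
        self.nodes = list(nodes)        # e.g. [("riak1", 8087), ...]
        self.down = {}                  # node -> time it was marked down
        self.retry_after = retry_after  # seconds before retrying a down node
        self._rr = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.down[node] = time.monotonic()

    def mark_up(self, node):
        self.down.pop(node, None)

    def next_node(self):
        """Round-robin over nodes, skipping recently failed ones."""
        for _ in range(len(self.nodes)):
            node = next(self._rr)
            failed_at = self.down.get(node)
            if failed_at is None or time.monotonic() - failed_at > self.retry_after:
                return node
        raise RuntimeError("no healthy Riak nodes available")

registry = NodeRegistry([("riak1", 8087), ("riak2", 8087), ("riak3", 8087)])
registry.mark_down(("riak2", 8087))
assert registry.next_node() != ("riak2", 8087)
```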
Prior to 2.0, the location of objects in Riak was determined by bucket and key. In version 2.0, bucket types were introduced as a third namespacing layer in addition to buckets and keys. Connecting clients now need to either specify a bucket type or use the default type for all K/V operations. Although creating, listing, modifying, and activating bucket types can be accomplished only via the command line, your client should provide an interface for seeing which bucket properties are associated with a bucket type.
One of the changes to be aware of when building clients is that Riak has changed its querying structure to accommodate bucket types. When performing K/V operations, you now need to specify a bucket type in addition to a bucket and a key, which means that the structure of all K/V operations needs to be modified accordingly. We’d also recommend enabling users to perform K/V operations without specifying a bucket type, in which case the default type is used; in the official Python client, for example, a read with no type specified is equivalent to the same read against the default bucket type.
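A toy model makes the addressing change concrete. In a real client the store is a cluster, but the routing logic is the same: an untyped request goes to the "default" bucket type. The class and method names here are illustrative, not the official client API.

```python
class KeyValueStore:
    """Toy model of Riak 2.0's three-level namespace."""

    def __init__(self):
        self._data = {}  # (bucket_type, bucket, key) -> value

    def put(self, bucket, key, value, bucket_type="default"):
        self._data[(bucket_type, bucket, key)] = value

    def get(self, bucket, key, bucket_type="default"):
        return self._data[(bucket_type, bucket, key)]

store = KeyValueStore()
store.put("users", "alice", b"...")

# These two reads are equivalent: no type means the default type.
assert store.get("users", "alice") == store.get("users", "alice", bucket_type="default")

# The same bucket/key under a different type addresses a different object.
store.put("users", "alice", b"other", bucket_type="archived")
assert store.get("users", "alice") != store.get("users", "alice", bucket_type="archived")
```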
Dealing with objects and content types
One of the tricky things about dealing with objects in Riak is that objects can be of any data type you choose (Riak Data Types are a different matter, and covered in the section below). You can store JSON, XML, raw binaries, strings, mp3s and MPEGs (though you should probably consider Riak CS for larger files like that), and so on. While this makes Riak an extremely flexible database, it means that clients need to be able to work with a wide variety of content types.
All objects stored in Riak must have a specified content type, e.g. application/json, text/plain, application/octet-stream, etc. While a Riak client doesn’t need to be able to handle all data types, a client intended for wide use should be able to handle at least the following:
- JSON (application/json)
- plain text (text/plain)
- raw binary data (application/octet-stream)
You should also strongly consider building automatic type handling into your client. When the official Ruby and Python clients, for example, read JSON from Riak, they automatically convert it to hashes and dicts (respectively). The Java client, to give another example, automatically converts POJOs to JSON by default and enables you to automatically convert stored JSON to custom POJO classes when fetching objects, which enables you to easily interact with Riak in a type-specific way. If you’re writing a client in a language with strong type safety, this would be a good thing to offer users.
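One simple way to structure automatic type handling, sketched below under illustrative names, is a registry mapping each content type to an encode/decode pair, in the spirit of the official Python client’s behavior of decoding application/json into dicts.

```python
import json

# Content type -> (encoder to bytes, decoder from bytes)
SERIALIZERS = {
    "application/json": (lambda v: json.dumps(v).encode("utf-8"),
                         lambda b: json.loads(b.decode("utf-8"))),
    "text/plain": (lambda v: v.encode("utf-8"),
                   lambda b: b.decode("utf-8")),
    "application/octet-stream": (lambda v: v, lambda b: b),
}

def encode(value, content_type):
    return SERIALIZERS[content_type][0](value)

def decode(raw, content_type):
    return SERIALIZERS[content_type][1](raw)

stored = encode({"name": "Ada"}, "application/json")
assert decode(stored, "application/json") == {"name": "Ada"}
```

A registry like this also gives users a natural extension point for registering handlers for their own content types.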
Another important thing to bear in mind: all of your client interactions with Riak should be UTF-8 compliant, not just for the data stored in objects but also for things like bucket, key, and bucket type names. In other words, with your client it should be possible to store an object in the key Möbelträgerfüße in the bucket tête-à-tête.
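Since bucket, key, and bucket type names travel as bytes fields in Protocol Buffers messages, a sketch of compliant handling is simply to encode with UTF-8 on the way out and decode on the way in:

```python
# Names like these must round-trip intact through the client.
bucket = "tête-à-tête".encode("utf-8")
key = "Möbelträgerfüße".encode("utf-8")

assert key.decode("utf-8") == "Möbelträgerfüße"
assert len(key) > len("Möbelträgerfüße")  # multi-byte characters expand
```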
If you’re using either Riak Data Types or Riak’s strong consistency subsystem, you don’t have to worry about siblings because those features by definition do not involve sibling creation or resolution. But many users of your client will want to use Riak as an eventually consistent system, which means that they will need to create their own conflict resolution logic.
In essence, your users’ applications need to make intelligent, use-case-specific decisions about what to do when the application is confronted with siblings. Most fundamentally, this means that your client needs to enable objects to have multiple sibling values. In the official Python client, for example, each object of class RiakObject has parameters that you’d expect, like content_type, bucket, and data, but it also has a siblings parameter that returns a list of sibling values.
In addition to enabling objects to have multiple values, we also strongly recommend providing some kind of helper logic that enables users to easily apply their own sibling resolution logic. What type of interface should be provided? That will depend heavily on the language. In a functional language, for example, that might mean enabling users to specify filtering functions that whittle the siblings down to a single “correct” value. To see conflict resolution in our official clients in action, see our tutorials for Java, Ruby, and Python.
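One shape such helper logic can take, sketched here with illustrative names, is a resolver hook: the object exposes its siblings, and a user-supplied function collapses them to one value. The timestamp-based strategy shown is just one possible policy (last-write-wins), not a recommendation.

```python
class RiakObjectSketch:
    """Toy object exposing siblings and a resolver hook."""

    def __init__(self, siblings):
        self.siblings = siblings  # list of (metadata, value) pairs

    def resolve(self, strategy):
        winner = strategy(self.siblings)
        self.siblings = [winner]  # collapse to a single value
        return winner[1]

def last_write_wins(siblings):
    # Pick the sibling with the newest last-modified metadata.
    return max(siblings, key=lambda s: s[0]["last_modified"])

obj = RiakObjectSketch([({"last_modified": 100}, "draft"),
                        ({"last_modified": 200}, "final")])
assert obj.resolve(last_write_wins) == "final"
assert len(obj.siblings) == 1
```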
Riak Data Types
In version 2.0, Riak added support for conflict-free replicated data types (aka CRDTs), which we call Riak Data Types. These five special Data Types—flags, registers, counters, sets, and maps—enable you to forgo things like application-side conflict resolution because Riak handles the resolution logic for you (provided that your data can be modeled as one of the five types). What separates Riak Data Types from other Riak objects is that you interact with them transactionally, meaning that changing Data Types involves sending messages to Riak about what changes should be made rather than fetching the whole object and modifying it on the client side.
This means that your client interface needs to enable users to modify the Data Types as much as they need to on the client side before committing those changes all at once to Riak. So if an application needs to add five counters to a map and remove items from three different sets within that map, it should be able to commit those changes with one message to Riak. The official Python client, for example, has a store() function that sends all client-side changes to Riak at once, plus a reload() function that fetches the current value of the type from Riak (with no regard to client-side changes).
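The accumulate-then-commit pattern can be sketched as follows. This is a toy stand-in for a map Data Type proxy, with hypothetical operation names, not the official client API; the point is that nothing is sent until store().

```python
class MapProxy:
    """Queue Data Type operations client-side; commit them in one message."""

    def __init__(self):
        self._pending = []  # queued operations, not yet sent

    def increment_counter(self, name, amount=1):
        self._pending.append(("counter_incr", name, amount))

    def remove_from_set(self, name, item):
        self._pending.append(("set_remove", name, item))

    def store(self, send):
        """Commit all queued changes with one message to Riak."""
        ops, self._pending = self._pending, []
        send(ops)

sent = []
m = MapProxy()
m.increment_counter("visits", 5)
m.remove_from_set("tags", "stale")
m.store(sent.append)  # one message carrying both operations

assert sent == [[("counter_incr", "visits", 5), ("set_remove", "tags", "stale")]]
assert m._pending == []
```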
One of the most important features introduced in Riak 2.0 is security. When enabled, all clients connecting to Riak, regardless of which security source is chosen, must communicate with Riak over a secure SSL connection rooted in an x.509-certificate-based Public Key Infrastructure (PKI). If you want your client’s users to be able to take advantage of Riak security, you’ll need to create an SSL interface. Fortunately, there are OpenSSL (and other) libraries in all major languages. To see SSL in action in our official clients, see our tutorials for Java, Ruby, Python, and Erlang.
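In Python, for instance, the standard library’s ssl module covers the client side: build a TLS context that verifies the server against the cluster’s CA certificate, then wrap the Protocol Buffers socket with it. The function name and certificate path below are illustrative.

```python
import ssl

def make_tls_context(ca_cert_path=None):
    """Build a client TLS context that verifies the Riak server's cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.verify_mode = ssl.CERT_REQUIRED   # reject unverified servers
    ctx.check_hostname = True
    if ca_cert_path is not None:
        # Trust the cluster's CA certificate (path is a placeholder).
        ctx.load_verify_locations(cafile=ca_cert_path)
    return ctx

ctx = make_tls_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
```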
Features That Don’t Require Client Changes
The following features that became available in Riak 2.0 shouldn’t require any changes to client libraries:
- Strong consistency — While adding strong consistency has entailed a lot of changes within Riak itself, K/V operations involving strongly consistent data function just like their eventually consistent counterparts in most respects. The one small exception is that performing object updates without first fetching the object will necessarily fail, because the initial fetch obtains the object’s causal context, which is necessary for strongly consistent operations. It may be a good idea to add this requirement to your client documentation.
- New configuration system — Configuration has been drastically simplified in Riak 2.0, but these changes won’t have a direct impact on client interfaces.
- Dotted version vectors — While dotted version vectors (DVVs) are superior to the older vector clocks in preventing problems like sibling explosion, client libraries interact with DVVs just like they interact with vector clocks. In fact, our Protocol Buffers messages still use a vclock field for both vector clocks and DVVs, for the sake of backward compatibility.
How to Get Help
Building a 2.0-compliant Riak client has some non-trivial aspects but can be an exciting and rewarding project. Fortunately there are a variety of venues where you can get help, both from Basho engineers and from others in the Riak community.
For inspiration and education, the official Basho Riak clients in the GitHub repos are a good place to start. If you run into trouble, though, we highly recommend the Riak mailing list. There could very well be other client builders and maintainers working through a similar problem.
New executive team secured additional funding and drove strong bookings growth
BELLEVUE, Wash. – January 13, 2015 – Basho Technologies, the creator and developer of Riak®, the industry leading distributed NoSQL database, today announced record 2014 sales growth along with closure of a $25 million Series G funding round led by existing investor Georgetown Partners. The financing is being used to expand development and marketing activities.
The company achieved several critical milestones in 2014:
- Grew bookings 62 percent sequentially in Q3 and 116 percent sequentially in Q4
- Grew bookings 88 percent from second half 2013 to second half 2014
- Ended 2014 with 87 percent licensing, 13 percent professional services revenues
- Closed numerous multi-million dollar enterprise deals
- Shipped Riak 2.0
- Shipped Riak CS 1.5
- Replaced Oracle at the United Kingdom’s National Health Service
“The new Basho management team has made strong progress in positioning the company to capitalize on growth opportunities for solutions that enable enterprises to extract value from the massive amounts of data they generate,” said Chester Davenport, chairman of Basho Technologies and managing director of Georgetown Partners. “Riak and Riak CS software have extremely strong product roadmaps for 2015 and sales momentum is impressive. With Series G funding secured, I have confidence Basho will establish itself as a leading unstructured data solutions provider in 2015.”
In March, Basho announced Adam Wray, formerly CEO of Tier 3, as CEO and Dave McCrory, formerly of Warner Music Group and VMware, as CTO. The company also added executive leadership for product, engineering, finance and EMEA management.
Basho has been widely recognized for innovation in distributed systems since being founded in 2008 and Riak has been deployed by more than 30 percent of the Fortune 50. The company experienced a significant increase in enterprise adoption in 2014 in a variety of industries, including advertising, financial services, gaming, retail and healthcare, replacing Oracle at the United Kingdom’s National Health Service.
The Weather Company, which oversees popular brands such as The Weather Channel, weather.com, Weather Underground, Weather Central and WSI, initially selected Riak Enterprise for its Multi-Datacenter Replication capabilities and for being extremely lightweight, easy to use, and simple to operate.
“The amount of data we collect from satellites, radars, forecast models, users and weather stations worldwide is over 20TB each day and growing quickly. This data helps us deliver the world’s most accurate weather forecast as well as deliver more severe weather alerts than anyone else, so it is absolutely mission critical and has to be available all of the time,” said Bryson Koehler, executive vice president and CITO for The Weather Company. “Riak Enterprise Software gives us the flexibility and reliability that we depend on to enable over 100,000 transactions a second with sub 20ms latency on a global basis.”
Tapjoy first deployed Riak software to guarantee performance and uptime, even with peak traffic. It found that Riak helped them keep costs down, decrease engineering complexity, and reduce operational effort due to its ease of use and general stability.
“Two years ago, we implemented Riak Enterprise Software due to its high availability, operational simplicity, and ability to scale,” said Wes Jossey, head of operations at Tapjoy. “When we began, our clusters typically moved around 40,000 operations per second at peak. Today, we now see peaks well over 250,000 operations per second, all while sustaining sub-millisecond response times and rock solid stability. Despite this massive change in growth, we still do not employ any full-time engineers to work on our Riak cluster. It’s really that easy to use.”
OpenX became a Basho customer in 2012 to address multi-datacenter replication and to consolidate the number of databases supporting its ad trafficking system. Riak Enterprise Software meets OpenX’s high-availability and scale objectives, with its multi-datacenter replication serving over a billion real-time ad requests daily from a global audience.
“Basho has established themselves as a key OpenX partner,” said Matt Davis, site reliability engineer at OpenX. “They have worked with us in true partnership fashion to keep up with our rapidly scaling business and have always addressed our concerns in a timely manner. As a supporter of both Riak software and the greater Erlang community, OpenX appreciates the strong engineering prowess at Basho.”
“Worldwide demand for NoSQL technologies is driving our growth and greatly expanding our large enterprise deployments,” said Wray. “As NoSQL moves into primetime, we’re seeing more enterprises seek solutions that address a broad range of unstructured data requirements and we expect this trend to increase rapidly. We set aggressive product and sales goals for 2014 and I couldn’t be happier with our achievements this year. We look forward to continuing this acceleration into 2015 and beyond.”
- Basho Website (http://basho.com)
- Basho Blog (http://basho.com/blog/)
- Riak® software (http://basho.com/riak/)
- Riak® CS software (http://basho.com/riak-cloud-storage/)
- Additional Resources (http://basho.com/resources/)
- Twitter: @Basho (https://twitter.com/basho)
- LinkedIn (https://www.linkedin.com/company/basho-technologies-inc)
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak®, the industry leading distributed NoSQL database, and Basho’s cloud storage software, Riak® CS, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications, and their public and private cloud platforms.
Riak is a registered trademark of Basho Technologies, Inc.
Riak software and Riak CS software are available open source. Riak Enterprise Software and Riak CS Enterprise Software offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com.
# # #
January 6, 2015
If you have read about Riak, or seen a member of the Basho team present, you have probably heard the phrase “Your data is opaque to Riak.” While this is not, strictly, true with the inclusion of distributed Data Types in Riak 2.0, it was a phrase that hinted at the core structure of Riak itself.
Riak is a Key Value data store.
In a relational database, data is organized by tables that are separate and unique structures. Within these tables exist rows of data organized into columns. As such, interaction with the database is by retrieving or updating entire tables, individual rows, or a group of columns within a set of rows.
In contrast, Riak has a much simpler data model. An Object is both the largest and smallest element of data. As such, interaction with the database is by retrieving or modifying the entire object. There is no partial fetch or update of the data.
Keys in Riak are simply binary values (or strings) used to identify Objects. The Key/Value pair (or Object) is stored in a higher level namespace called a Bucket. And, with Riak 2.0, there is an extra layer of abstraction known as Bucket Types.
This Key/Value/Bucket model enables broad flexibility in modeling an application’s data domain with Riak as the data store for persistence.
Another NoSQL model that many are familiar with is the document store. Unlike the Key/Value model, the data store is aware of the structure of the objects stored. These objects, or documents, are grouped into “collections” (analogous to relational “tables”), and the datastore provides a query mechanism to search collections for objects with particular attributes. When the data that is being persisted is easily rendered as a JSON document, a document store can seem a natural fit. Some common use cases include product catalog data and content management systems.
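The contrast can be made concrete with a toy model: a document store can query a collection by attribute, while a pure Key/Value store can only fetch whole objects by key. The collection contents and helper below are purely illustrative.

```python
# A "collection" the document store can see into.
products = [
    {"sku": "A1", "category": "books", "price": 12},
    {"sku": "B2", "category": "music", "price": 9},
]

def find(collection, **attrs):
    """Return documents whose attributes match all the given values."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in attrs.items())]

assert find(products, category="books") == [products[0]]
```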
The Basho Docs have a lengthy tutorial entitled Using Riak as a Document Store that walks you through the process of leveraging Riak as a document store for a CMS. There are many approaches to modeling, but the tutorial demonstrates the power of Riak 2.0 features by combining the maps data type and indexing that data with Riak Search.
When the data you are persisting can be represented as JSON, and you require the ability to query the data, Riak 2.0 is an excellent solution for persisting and modeling document data. The flexibility of the Key/Value model, combined with the power of Riak Search and Riak Data Types, provides you with a highly scalable, highly available document store with rich, full-text query capabilities. In addition, the inclusion of the maps data type means that you don’t have to write complex client-side resolution logic when faced with network partitions. Riak Data Types handle that conflict resolution automatically.
A scalable, available document store that is operationally simple may seem compelling enough to use Riak. But when you combine the characteristics of Riak with the multi-datacenter replication capabilities of Riak Enterprise, now you have a solution that enables you to bring your data operations closer to the end user.
Scalable, available, operationally simple, and replicated. That’s the power of using Riak as a document store.
December 30, 2014
At Basho, we are proud of our documentation. All design, updates, and edits are done with our community top of mind, and we encourage community participation. Given the pace at which our resident documentarian, Luc Perkins, is updating the content, it can be easy to fall behind on new and updated materials. So we have a holiday gift to help you out.
Below is our Top 10 suggested New Year’s reading list.
#10 – The Migrating from an SQL Database to Riak tutorial can help prepare you as you embrace a new style of development and persistence.
#7 – Strong consistency has gone from having light documentation to being one of our best-documented open-source features. Strong Consistency docs are spread across the following:
#6 – We now have client-side security docs! There’s an introductory doc that walks you a bit through how client security works in Riak as well as client-specific docs for Java, Ruby, Python, and Erlang.
#5 – A new Erlang VM Tuning doc. This is still a work in progress. As we said at the beginning, we really encourage community involvement. What tuning have you done to optimize your Erlang environment?
In addition to the above, there is new documentation on the topics below.
Drum roll please….
#1 – Riak 2.0 – if you missed this you missed a lot.
We want to thank everyone in the community who participates in making the Basho documentation the most useful set of materials possible. Remember: to submit issues is human, to submit PRs is divine.
Happy New Year!
December 23, 2014
This is a continuation of our blog series covering Riak 2.0. Below are links to the previous posts in the series; here we continue with a discussion of Values, Keys, Buckets, and Bucket Types.
- Riak 2.0 – New Capabilities, New Use Cases, Available for Download
- Write it like Riak; Query it Like Solr
- Riak Security 2.0: Not Just a Firewall Anymore
- Distributed Data Types – Riak 2.0
- Strong Consistency in Riak 2.0
If you have tested Riak 2.0, or used it in production, you will also be aware of a new feature called Riak Bucket Types.
The Using Bucket Types documentation covers the implementation, usage, and configuration of Bucket Types in great detail. Throughout the documentation there are code samples (e.g. Using Data Types), including code for creating the bucket types associated with each of the Riak Data Types.
Bucket types are a major improvement over the older system of bucket configuration. The ability to define a bucket configuration, and then change it if necessary for an entire group of buckets, is a powerful new way to approach data modeling. In addition, bucket types are more reliable: buckets with a given type only have their properties change when the type itself is changed. Previously, bucket properties could be changed only through client requests.
In prior versions of Riak, bucket properties were altered by clients interacting with Riak. In contrast, bucket types are an operational concept: the riak-admin bucket-type interface enables Riak users to manage bucket configurations at an operational level, without recourse to the Riak clients.
In versions of Riak prior to 2.0, all queries were made to a bucket/key pair.
Now in Riak 2.0 with the addition of bucket types, there is an additional namespace on top of buckets and keys. The same bucket name can be associated with completely different data if it is used in accordance with a different bucket type.
If a request is made to a bucket/key pair without a specified bucket type, the default type is used in its place, so the typed and untyped forms of the request are identical.
Bucket types allow groups of buckets to share configuration details. This allows Riak users, and administrators, to manage bucket properties more efficiently than in the older configuration systems that were based on bucket properties.
For a broader discussion of application data modeling, and using Riak as a data store, the Building Applications with Riak documentation covers these concepts, and more, in great detail.
Riak 2.0 was the culmination of substantial effort by the Basho team. A particular focus was adding functionality (like Riak Search) that had been requested by customers. In addition, this release furthers the Basho commitment to providing a scalable and available database with the operational simplicity for which Riak is known.
It just works.
New, enhanced database and growing number of customers highlight strong year for the company
LONDON, UK. – November 20, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, has seen a surge in deployment and a growing customer-base in EMEA as a result of the launch of Riak 2.0, the significantly enhanced version of its flagship platform.
2014 has seen significant successes for Basho, from the release of Riak 2.0 to news that Basho technology is powering Spine 2, the electronic backbone of the NHS. Basho has also seen strong growth in its EMEA customer-base, with the company working with businesses such as bet365, one of the world’s leading online gambling groups; StatPro, the cloud-based portfolio analysis service; and EE, the largest mobile operator in the United Kingdom, to address their critical unstructured data needs. Basho has increased its number of customers in EMEA by 38 percent year on year, and these customer wins have contributed to revenue growth of 90 percent from Q2 to Q3 2014.
“Our decision to implement Riak was purely strategic. After a stringent evaluation process we decided that Basho’s flexible, scalable database was best-suited to our needs,” said Martin Davies, Chief Executive Officer, Technology at bet365. “Given the huge amount of data we process on a daily basis – from customer details to betting odds – it was imperative that we had a platform to support this. We selected Riak, and have not been disappointed with the results.”
The gaming industry is becoming increasingly complex, with customers no longer satisfied with betting on a limited selection of outcomes. Gaming companies must now offer more than traditional betting options. During football matches, for example, it is no longer enough to offer odds on the scorer or the full-time result; consumers are eager to bet on everything from the number of yellow cards to corners and the amount of injury time. Offering and processing these options requires a huge amount of data-crunching. On top of the vast number of metrics processed when accounting for betting odds, bets placed, and the final action on each account, such businesses require a lightning-fast database to support the deluge and prevent system crashes.
Basho’s growing stature in the gaming sector has been matched by its recent success in the telecommunications space. An increasing number of telco companies like EE are using Riak to replace existing systems and provide fault-tolerance and scalability for the future. Riak’s strength in the industry is further highlighted by the market trend towards reducing the burden of managing complex hardware environments by providing a consolidated virtualized orchestration platform to replace much of the traditional hardware.
These recent deals highlight a strong year for Basho, while the reseller partnership with Nordicmind and its upcoming Riak Nordic Roadshow demonstrate its growing success in EMEA. Success in the region is further reflected in the appointment of Emmanuel Marchal as Managing Director EMEA, who will be leading enterprise focus in EMEA, as well as the continued work with companies such as Deutsche Vermögensberatung (DVAG), Germany’s largest stand-alone financial services distributor. The financial advisors of DVAG support over 6 million customers in all questions concerning financial planning, insurance and finances.
“We knew that with the release of Riak 2.0, 2014 would be a massive year for the company,” said Adam Wray, President and Chief Executive Officer at Basho. “However, the growth in deployment and the continued success of Riak was more significant than we expected – with customers responding in kind. This year alone we have made strides in several sectors, including telco, financial, gaming and healthcare, where we have helped complete a project with the NHS that could potentially save lives. Couple this with our growing number of partners, and we can happily say that Basho is going from strength-to-strength.”
November 10, 2014
Many data needs are better served by data stores that are optimized for maximum availability and scalability — rather than optimized for consistency. For certain use cases, there are elements to the data that require strong consistency. With Riak 2.0, in addition to eventual consistency, there is now a way to enforce strong consistency when needed.
NOTE: Riak’s strong consistency feature is currently an open-source-only feature and is not yet commercially supported.
Behavioral Changes with Strong Consistency
Strongly consistent operations in Riak function much like eventually consistent operations at the application level. The core difference lies in the types of errors Riak will report to the client.
Each request to update an object (except for the initial creation) must include a context value reflecting the last time the application read it. This is the same behavior that Riak clients have always followed with version vectors, and strong consistency also mandates its use. Similarly, reading data from a strongly consistent Riak bucket functions exactly like eventually consistent reads.
If that value is not provided for an update operation to an existing object, Riak will reject it. This is because the database assumes that you have not seen the current value and may not know what you’re doing.
Similarly, if that context value is out of date, Riak will also reject update operations. The client must re-read the latest value and supply an update based on that new value, with the new context.
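The accept/reject behavior amounts to a compare-and-set on the context value, sketched below with illustrative names; in Riak the context is an opaque causal token rather than a counter, but the decision logic is the same.

```python
class ConsistentCell:
    """Toy model of a strongly consistent object with a context check."""

    def __init__(self, value):
        self.value, self.context = value, 0

    def read(self):
        return self.value, self.context

    def update(self, new_value, context=None):
        if context is None or context != self.context:
            raise ValueError("stale or missing context: re-read and retry")
        self.value = new_value
        self.context += 1  # every accepted write invalidates old contexts

cell = ConsistentCell("v1")
value, ctx = cell.read()
cell.update("v2", context=ctx)      # accepted: context is current

try:
    cell.update("v3", context=ctx)  # rejected: context is now stale
except ValueError:
    pass                            # client must re-read, then retry

assert cell.value == "v2"
```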
If Riak cannot contact a majority of the servers responsible for the key, the request will fail. Ordinarily, Riak is happy to accept all operations in the interest of high availability and never dropping a write – even in the extreme case of only one server surviving data center outages.
Strong consistency also eliminates object siblings, as it is effectively impossible for the cluster to disagree on the value of an object.
When considering consistency models in an application, it is easy for the logic to quickly become daunting. This is especially true when designing a workflow that leverages both eventually and strongly consistent models. It is, therefore, easiest to begin with a simple use case.
Consider the workflow involved in storing and updating username and password data. In the case of a password update, it is necessary that — at any given time — there be exactly ONE result for a user’s password. Relatedly, it is important to ensure that an update of this value is fully atomic or user experience is substantially degraded. It would be possible to leverage Riak for all the eventually consistent elements of the application and leverage strong consistency for the username and password.
To see how eventual and strong consistency can be combined to solve business problems, let’s take a not-so-hypothetical example from the energy industry.
Imagine you’re collecting massive amounts of geological data for analysis. Each batch of data must be processed by a single instance of your application. Since this processing can take hours, days, or even weeks to complete, it’s expensive if two applications handle the same batch.
Let’s walk through the sequence of events.
- Batch of data arrives for processing.
- The batch is stored in a large object store (like Riak CS) under a batch ID.
- The batch ID is added to a pending job list in Riak and stored as a set (one of the new Riak Data Types).
This is a classic example of eventual consistency and an illustration of the value of the new Riak Data Types introduced with Riak 2.0. Storing a new batch ID in your database should never fail, even if servers are offline. If multiple applications are adding batch IDs to the pending list at the same time, it’s perfectly reasonable for those lists to temporarily diverge, as long as they can be trivially merged later.
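The "trivially merged later" property is just set union: no matter which order the divergent replicas merge, the result is the same. A two-line sketch (replica names and batch IDs are invented for illustration):

```python
# Illustrative sketch: two replicas accept batch-ID adds independently,
# then converge by set union, which is order-independent.

replica_a = {"batch-17", "batch-18"}
replica_b = {"batch-17", "batch-19"}  # a different client added batch-19 here

merged = replica_a | replica_b
print(sorted(merged))  # ['batch-17', 'batch-18', 'batch-19']
```

This is exactly the behavior the set Data Type provides server-side, so the application never sees the divergent copies.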
Let’s continue to see where strong consistency comes into play.
- A compute node becomes available to process the data.
- The compute node retrieves the pending job list and picks a batch ID.
- The compute node attempts to create a lock for that batch ID.
This is where strong consistency is required. This lock object should be created in a bucket that is managed by the new strong consistency subsystem in Riak 2.0. If someone else also grabs that batch ID and tries to create another lock object, Riak’s strong consistency logic will reject this second attempt. This compute node will just start over by grabbing a new ID.
To detect crashed jobs, the lock object should be created with basic job data, such as which compute node owns the processing job and what time it was started.
- The compute node asks Riak to add the batch ID to a different set, a running job list.
- The compute node asks Riak to remove the batch ID from the pending list.
- The job runs.
- When completed, the compute node asks Riak to add the batch ID to a completed job list.
- Riak is asked to remove the batch ID from the running list.
- The compute node deletes the lock object (or updates it to reflect the completion of the processing job).
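The whole lock-then-move workflow above can be sketched in memory. Everything here is hypothetical scaffolding: plain Python sets stand in for Riak sets, a dict stands in for the strongly consistent lock bucket, and `try_lock`/`process` are invented helper names.

```python
# Sketch of the job workflow above. Python sets stand in for Riak sets,
# and the `locks` dict stands in for a strongly consistent bucket.
import time

pending, running, completed = {"batch-42"}, set(), set()
locks = {}  # key -> basic job data, as described above

def try_lock(batch_id, node):
    if batch_id in locks:  # the SC bucket would reject this second create
        return False
    locks[batch_id] = {"owner": node, "started": time.time()}
    return True

def process(node):
    batch_id = next(iter(pending))       # pick a batch ID from the pending set
    if not try_lock(batch_id, node):
        return None                      # lost the race: pick another ID later
    running.add(batch_id)                # add to the running job list
    pending.discard(batch_id)            # remove from the pending list
    # ... the long-running job happens here ...
    completed.add(batch_id)              # add to the completed job list
    running.discard(batch_id)            # remove from the running list
    del locks[batch_id]                  # or update it to record completion
    return batch_id

done = process("compute-node-1")
```

Note that only the lock creation needs strong consistency; all the set operations are safe under eventual consistency because concurrent adds merge cleanly.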
Tradeoffs When Using Strong Consistency
- Blind updates will be rejected, so the client must read the existing value before supplying a new one (except in the case of entirely new keys).
- Write requests may be slightly slower due to coordination overhead.
- If a majority of the servers responsible for a piece of data are unavailable, write requests will fail. Read operations may fail depending on the freshness of the data that is still accessible.
- Secondary indexes (2i) are not yet supported.
- Multi-datacenter replication in Riak Enterprise is not yet supported.
For more information on strong consistency, check out the following resources:
- Using Strong Consistency (for developers)
- Managing Strong Consistency (for operators)
- Strong Consistency (theory & concepts)
Strong Consistency is now available with Riak 2.0. Download Riak 2.0 on our Docs Page.
In a previous post we briefly introduced Riak 2.0 data types. The addition of these distributed Data Types simplifies application development by automatically handling sibling resolution. This means developers can spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, let Data Types support their applications’ data access patterns.
Understanding these data types requires a brief trip through history…
Riak 1.4 Counters
Riak 1.4 introduced counters as the first data type. Prior to 1.4, we had always said: "Your data is opaque to Riak" (and it still can be), but with the addition of counters that is no longer the case. Riak knows what is stored in a counter key and how to increment and decrement it through the counter API. It isn't necessary to fetch, mutate, and put a counter; instead, you simply increment it by 5 or decrement it by 100. And because Riak knew how to merge concurrent writes to a counter, siblings were never created, unlike ordinary objects governed by vector clocks (as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems).
Counters are very valuable, but you cannot build many applications on counters alone. Now, in Riak 2.0, we've added more data types. We believe that, with these additions, you can model many applications' data storage needs with greater simplicity, and never have to write a sibling merge function again.
What are CRDTs?
You may have heard a Basho presentation, or blog post, reference "CRDTs". CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key repeated phrase is "Replicated Data Type".
Replication is inherent in Riak. It is what the n-value defines, and it is part of what gives Riak its availability and fault-tolerance characteristics. Data Types are a common construct in computing: sets, bags, lists, registers, maps, counters, and so on.
That leaves us to consider the “C”.
Conflict Free, or “Opaque No More”
Riak is an eventually consistent system. It leans, very much, toward the AP end of the CAP spectrum. (For more reading on the topic, the Practical Tradeoffs section of A Little Riak Book is particularly illuminating.) This availability is achieved with mechanisms like sloppy quorum writes to fallback nodes. However, even without partitions, interleaved or concurrent writes across many nodes can lead to conflicts. Traditionally, Riak keeps all conflicting values and presents them to the user to resolve. The client application must have a deterministic way to resolve these conflicts. It might pick the highest timestamp, or union all the values into a list, or do something more complex. Whatever approach is chosen, it is ad-hoc, created specifically for the data model and application at hand.
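To make the "ad-hoc merge function" concrete, here is what such client-side sibling resolution might look like before Data Types. Both strategies and the sibling structure are hypothetical examples, not code from any Riak client:

```python
# Hypothetical ad-hoc sibling resolution, as client applications had to
# write before Riak 2.0 Data Types. Each strategy is application-specific.

def resolve_by_timestamp(siblings):
    """Pick the sibling with the highest timestamp (last-write-wins)."""
    return max(siblings, key=lambda s: s["ts"])

def resolve_by_union(siblings):
    """Union all list values: reasonable for grow-only collections."""
    merged = set()
    for s in siblings:
        merged |= set(s["value"])
    return sorted(merged)

siblings = [
    {"ts": 100, "value": ["a", "b"]},
    {"ts": 105, "value": ["a", "c"]},
]
print(resolve_by_timestamp(siblings)["value"])  # ['a', 'c']
print(resolve_by_union(siblings))               # ['a', 'b', 'c']
```

Notice the two strategies give different answers from the same siblings: last-write-wins silently drops "b", while union keeps everything. Choosing correctly requires knowing the application's data model, which is exactly the burden Data Types remove.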
With Riak data types, there is still “conflict”. However, the resolution for that conflict is inherent and part of the data type’s design. The data types for Riak 2.0 converge automatically, at write and read time, on the server. If a client application can model its data using the data types provided, no sibling values will be seen and there is no longer a need to write ad-hoc, custom merge functions.
When modeling an application's data domain in a programming language, developers are accustomed to composing state from a few primitive data types. Riak Data Types give developers that power and expressivity back, and relieve them of the burden of designing and testing deterministic merge functions. The key is that the data is no longer opaque to Riak: when the Data Types API is leveraged, Riak "knows" what type of thing is being stored and is able to perform the merge automatically.
When reading a Data Type from Riak, you will only ever see a single value. That value is still eventually consistent, but it will be as correct as it can be given the amount of entropy in the database. When the system is stable, all values will converge on a single, deterministic, correct value.
What Data Types Are Available?
Riak 2.0 includes the following Data Types:
- Counters: as in Riak 1.4
- Flags: enabled/disabled
- Sets: collections of binary values
- Registers: named binary values that each store a single binary value
- Maps: a collection of fields that supports the nesting of multiple Data Types
The conflict resolution, as discussed above, is intrinsic to the Data Type itself. This table provides greater detail.
|Data Type|Conflict Resolution Rule|
|---|---|
|Counters|Each actor keeps an independent count of increments and decrements. Upon merge, the pairwise maximum for each actor wins (e.g., if one replica holds 172 for an actor and the other holds 173, 173 wins upon merge)|
|Flags|Enable wins over disable|
|Sets|If an element is concurrently added and removed, the add wins|
|Registers|The most chronologically recent value wins, based on timestamps|
|Maps|If a field is concurrently added, or updated and removed, the add/update wins|
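The counter rule in the table is worth seeing in miniature. In the sketch below (illustrative only, not Riak's implementation), each replica tracks a per-actor tally of increments, and merging takes the per-actor maximum, so no increment is ever lost or double-counted:

```python
# Sketch of the counter merge rule from the table above: each actor keeps
# an independent tally, and merging takes the pairwise per-actor maximum.

def merge(a, b):
    """Merge two {actor: count} maps by pairwise maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in set(a) | set(b)}

def value(incs, decs):
    """A counter's value is total increments minus total decrements."""
    return sum(incs.values()) - sum(decs.values())

# Replica 1 saw actor "n1" at 172 increments; replica 2 saw it at 173.
incs_r1 = {"n1": 172, "n2": 10}
incs_r2 = {"n1": 173}
merged = merge(incs_r1, incs_r2)
print(sorted(merged.items()))  # [('n1', 173), ('n2', 10)]
```

Because each actor only ever grows its own tally, taking the maximum per actor is always safe, and the merge is commutative, which is what lets replicas converge in any order.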
Riak 2.0 is a new version of Riak with new Data Types, allowing you to model your application in more expansive ways. Take these Data Types for a spin and be sure to let us know how you use them in your applications.
September features developer conferences, Chicago Erlang, and even an “unconference.” Take a look at where Basho will be around the U.S. this month.
Strangeloop (September 17-19 in St. Louis, MO): Strangeloop is a great opportunity to learn about emerging languages, concurrent and distributed systems, and new database technologies. Basho is attending, so tweet us @basho if you’re interested in meeting.
Analytics and Big Data Summit (September 18 in San Jose, CA at 3:05 p.m. PT): Produced by the Storage Networking Industry Association (SNIA), the Analytics and Big Data Summit brings together IT professionals to discuss how to leverage analytics, and big data applications and systems. Seema Jethani from Basho will be presenting on Optimizing Cloud Storage to Manage Big Data, which will explore different data types and storage solutions. Attendees will gain an understanding of the needs of big data storage and the current cloud storage options available to organizations.
2014 High Performance Computing for Wall Street (September 22 in New York, NY at 2:30 p.m. ET): The 11th annual HPC networking opportunity is focused on high-throughput, low-latency networks, data centers, and lowering the cost of operations. Our director of technical marketing, Tyler Hannan, will be presenting a Code Writing Session – Architecting for Global Scale.
Chicago Erlang (September 22 in Chicago, IL): Chicago Erlang is a one-day event focused on real world applications of Erlang. At 10:40 a.m. CT, Basho’s Reid Draper will present on Building Fault Tolerant Teams at Basho during which he will explain how Basho coordinates the activities of more than 25 Erlang programmers to build Riak. Then, at 3:20 p.m. CT, Steve Vinoski from Basho will discuss Optimizing Native Code for Erlang.
REST Fest 2014 (September 25-27 in Greenville, SC): REST Fest is an “unconference” with the objective of bringing together people interested in REST, hypermedia APIs, web service APIs and related topics to share ideas, trade stories and show examples of current work. Sean Cribbs from Basho will be the opening keynote! His keynote, HTTP: The Good Parts, will explore interesting and powerful ways to enhance interaction and efficiency when developing applications. Sean will leverage his 10 years of experience as a developer to provide insight into HTTP features and how you can tap into them more declaratively.
Surge 2014 (September 24-26 in National Harbor, MD): We will be attending and sponsoring OmniTI’s scalability and performance conference, Surge. We’d love to meet and chat, so tweet us @basho if you’re attending.
Lastly, RICON 2014 is just one month away, October 28-29. Early bird prices are good through September 22. Register here.
February 5, 2014
At the recent meetup for the New York Linux Users Group (NYLUG), Basho Technical Evangelist, Tom Santero, presented “An Introduction to Basho’s Riak.” In this talk, Tom explains how Riak addresses the challenges of concurrent data storage at scale. He discusses the various design decisions, tradeoffs made, and theories at work within Riak. He also provides guidance as to how you might deploy Riak in production and why.
In addition to introducing the basics of Riak and its key/value data model, Tom presents some of the exciting features being introduced with Riak 2.0. Riak Data Types add counters, sets, and maps to Riak, allowing for better conflict resolution. They enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, to focus on using familiar, distributed data types to support their applications' data access patterns.
You can watch Tom’s full talk below:
For more information about Riak and how it differs from traditional databases, check out the whitepaper, “From Relational to Riak.”
To see where Basho will be presenting next, visit the Events Page.