February 23, 2015
Over the last week, for a variety of reasons, security in the NoSQL space has become a prominent news item. Chief among these reasons was the announcement that multiple instances of a popular NoSQL database were exposed to the public internet. From the headlines you might think that NoSQL solutions have inherent security problems. In fact, in some cases, the discussion is positioned intentionally as a relational vs. NoSQL issue. The reality is that NoSQL is no more or less secure than a traditional RDBMS.
The security of any component of the technology stack is the responsibility of both the vendor providing the technology and those who deploy it. How many routers are running with the default administrative password still set? Similarly, exposing any database, regardless of type, to the public internet without taking appropriate security precautions, including user authentication and authorization, is a “bad idea.” A base level of network security is an absolute requirement when deploying any data persistence utility. For Riak this can include:
- Appropriate physical security (including policies about root access)
- Securing the epmd listener port, handoff_port listener port, and the range ports specified in the riak.conf
- Defining users and optionally, groups (using Riak Security in Riak 2.0)
- Defining an authentication source for each user
- Granting necessary permissions to each user (and/or group)
- Checking Erlang MapReduce code for invocations of Riak modules other than riak_kv_mapreduce
- Ensuring your client software passes authentication information with each request and supports HTTPS or encrypted Protocol Buffers traffic
If you enable Riak security without an established, functioning SSL connection, all requests to Riak will fail because Riak security (when enabled) requires a secure SSL connection. You will need to generate SSL certificates, enable SSL, and establish a certificate configuration on each node.
The security discussion does not, however, end at the network. For those familiar with the Open Systems Interconnection (OSI) model (ISO 7498-1), a seven-layer conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers, there is a corresponding security architecture reference (ISO 7498-2)…and that is just for the network. It is necessary to adopt a comprehensive approach to security at every layer of the application stack…including the database.
The process of securing a database, which is only a component of the application stack, requires striking a fine balance. Basho has worked with large enterprise customers to ensure that Riak’s security architecture meets the needs of their application deployments and balances the effort required with the security, or compliance, requirements demanded by some of the world’s largest deployments.
NoSQL vs. Relational Security
As enterprises continue to adopt NoSQL more broadly, the question of security will continue to be raised. The reality is simple: it is necessary to evaluate the security of the database you are exploring in the same way that you would evaluate its scalability or availability characteristics. There is nothing inherent to the NoSQL market that makes it less, or more, secure than relational databases. It is true that some relational databases, by virtue of their age and maturation, have more expansive security tooling available. However, when adopting a holistic, risk-based approach to security, NoSQL solutions like Riak are as secure as required.
Security and Compliance
A compliance checklist (be it HIPAA or PCI) details, in varying specificity, the security requirements to achieve compliance. This checklist is subsequently verified through an audit by an independent entity…as well as ongoing internal audits.
So can I use NoSQL in compliant environments?
Without question, yes. The difficulty of achieving compliance will depend on how the database is configured, what controls it provides for authentication and authorization, and many other elements of your application stack (including physical security of the datacenter, etc.). Basho customers have deployed Riak in highly regulated environments and achieved their compliance requirements.
I would encourage you, however, to realize that compliance is an event. The process of securing your application, database, datacenter, etc. is an ongoing exercise. Many, particularly those in the payments industry, refer to this as a “risk-based” approach to security vs. a “compliance-based” approach.
Security and Riak
In nearly all commercial deployments, Riak runs on a trusted network and unauthorized access is restricted by firewall routing rules. This is expected and necessary, and it is sufficient for many use cases (when included as part of a holistic security posture including locking down ports, reasonable policies regarding root access, etc.). Some applications need an additional layer of security to meet business or regulatory compliance requirements.
To that end, in Riak 2.0, the security story changed substantially. While you should, without question, apply network layer security on top of Riak and the systems that Riak runs upon, there are now security features built into Riak that protect Riak itself, not just its network. This includes authentication (the process of identifying a user) and authorization (verifying whether the authenticated user has access to perform the requested operation). Riak’s new security features were explicitly modeled after user- and role-based systems like PostgreSQL. This means that the basic architecture of Riak Security should be familiar to most.
In Riak, administrators can selectively control access to a wide variety of Riak functionality. Riak Security allows you to both authorize users to perform specific tasks (from standard read/write/delete operations to search queries to managing bucket types and more) and to authenticate users and clients using a variety of security mechanisms. In other words, Riak operators can now verify who a connecting client is and determine what that client is allowed to do (if anything). In addition, Riak Security in 2.0 provides four options for security sources:
- trust — Any user accessing Riak from a specified IP may perform the permitted operations
- password — Authenticate with username and password (works essentially like basic auth)
- pam — Authenticate using a pluggable authentication module (PAM)
- certificate – Authenticate using client-side certificates
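To make the split between authentication and authorization concrete, here is a minimal conceptual sketch in Python. It is not Riak's implementation: the `Source` and `SecurityModel` classes are hypothetical, only the trust and password sources are modeled, passwords are kept in plain text purely for illustration, and the first matching source decides the outcome. The permission names are used only as example strings.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical record tying a user and a network range to a source type."""
    user: str
    cidr: str
    kind: str  # "trust" or "password" in this sketch


@dataclass
class SecurityModel:
    """Toy model: authenticate first (who is this?), then authorize (what may they do?)."""
    sources: list = field(default_factory=list)
    passwords: dict = field(default_factory=dict)  # user -> password (illustration only)
    grants: dict = field(default_factory=dict)     # user -> set of permission names

    def authenticate(self, user, ip, password=None):
        addr = ipaddress.ip_address(ip)
        for src in self.sources:
            if src.user == user and addr in ipaddress.ip_network(src.cidr):
                if src.kind == "trust":
                    # Trust: the originating address alone is enough.
                    return True
                if src.kind == "password":
                    return password is not None and self.passwords.get(user) == password
        # No source matches this user from this address.
        return False

    def authorize(self, user, permission):
        return permission in self.grants.get(user, set())
```

In this sketch a request is served only when `authenticate` and then `authorize` both return `True`, which mirrors the "who is connecting" and "what are they allowed to do" questions described above.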
More detail on the Riak 2.0 Security capabilities is presented in the Security section of the documentation, in particular the section entitled Authentication and Authorization.
With a NoSQL system that provides authentication and authorization, and a properly secured network, you have progressed a long way in reducing the risk profile of your system. The application layer, of course, must still be considered.
Relational databases are still a part of the technology stack for many companies; others are innovating and incorporating NoSQL solutions either as a replacement for or alongside existing relational databases. As a result they have simplified their deployments, enhanced their availability, and reduced their costs.
Join us for this webinar where we will look at the differences between relational databases and NoSQL databases like Riak. We will look at why companies choose Riak over a relational database. We will analyze the decision points you should consider when choosing between relational and NoSQL databases, look at specific use cases, and review data modeling and query options.
This webinar is being held in two time slots:
- Wednesday, March 4, 2015 8:00-9:00 AM PST (4:00-5:00 PM GMT)
- Wednesday, March 4, 2015 12:00-1:00 PM PST (3:00-4:00 PM EST)
January 27, 2014
Client libraries are essential to using Riak, and we at Basho have always been proud to have a flourishing client library ecosystem surrounding Riak. The release of Riak 2.0 has brought a variety of fundamental changes that client builders and maintainers should be aware of, including a variety of new features that clients should be equipped to utilize, such as security and Riak Data Types. Here, we’ll provide a list of some of those fundamental changes and suggest some approaches to addressing them, including examples from our official libraries.
Protocol Buffers API
While Riak continues to have a fully featured HTTP API for the sake of backwards compatibility, we do not recommend that you use it to build new client libraries. Instead, we encourage you to design clients to interact with Riak’s Protocol Buffers API, primarily because internal tests at Basho have shown performance gains of 25% or more when using Protocol Buffers.
The drawback of using Protocol Buffers is that it’s not as widely known as HTTP and has a bit of a learning curve for those who aren’t familiar with it. The good news is that the learning curve is worth it, and that Google offers official Protocol Buffers support for C++, Java, and Python, while many other languages have strong community support.
When you start developing your client library, you’ll need to find a Protocol Buffers message generator in the language of your choice and convert a series of .proto files. Once you’ve generated all the necessary messages, you’ll need to implement a transport layer to interface with Riak. A full list of Riak-specific PBC messages can be found here. The official Python client, for example, has a single RiakPbcTransport class that handles all message building, sending, and receiving, while the official Java client takes a more piecemeal approach to message building (as shown by the FetchOperation class, which handles reads from Riak). Once the transport layer is in place, you can start building higher-level abstractions on top.
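As a rough sketch of the lowest level of such a transport layer, the Python below frames and unframes messages the way Riak's Protocol Buffers API lays them out on the wire: a 4-byte big-endian length (covering the message code and the payload), a 1-byte message code, then the serialized message. The function names `encode_frame` and `decode_frame` are illustrative and not taken from any official client, and the payload is treated as opaque bytes that your generated message classes would produce.

```python
import struct

PING_REQ = 1   # message code for RpbPingReq
PING_RESP = 2  # message code for RpbPingResp


def encode_frame(msg_code, payload=b""):
    """Build one frame: 4-byte length, 1-byte message code, then the payload."""
    # The length counts the message-code byte plus the payload bytes.
    return struct.pack("!IB", len(payload) + 1, msg_code) + payload


def decode_frame(data):
    """Split one complete frame into (message code, payload bytes)."""
    (length,) = struct.unpack("!I", data[:4])
    msg_code = data[4]
    payload = data[5:4 + length]
    return msg_code, payload
```

A ping, which carries no payload, is then just `encode_frame(PING_REQ)`. A real transport would also need to read from the socket until a full frame has arrived, which this sketch leaves out.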
Nodes and clusters
Another thing to keep in mind when writing Riak clients is that Riak always functions as a clustered (and hence multi-node) system, and connecting clients need to be set up to interact with all nodes in a cluster on the basis of each node’s host and port.
While it’s certainly possible to build clients that are intended to interact only with a single node, this means that your client’s users will need to create their own cluster interaction logic. Life will be far easier for your client’s users if your client is able to do things like this:
- periodically ping nodes to make sure they’re still online
- recognize when nodes are no longer responding and stop sending requests to those nodes
- provide a load-balancing scheme (or multiple possible schemes) to spread interactions across nodes
In general, you should think of the cluster interaction level as a kind of stateful registry of healthy nodes. In some systems, it might also be necessary to have configurable parameters for connections to Riak, e.g. minimum and/or maximum concurrent connections.
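One way to picture that registry is the small Python sketch below. It is an assumption-laden illustration, not code from an official client: the `NodeRegistry` class, its method names, and the `ping` callable passed to `check_all` are all hypothetical, and round-robin is just one of many possible load-balancing schemes.

```python
import itertools
import threading


class NodeRegistry:
    """Hypothetical stateful registry of healthy nodes with round-robin selection."""

    def __init__(self, nodes):
        self._nodes = list(nodes)             # (host, port) tuples
        self._healthy = set(self._nodes)      # start by assuming every node is up
        self._cycle = itertools.cycle(self._nodes)
        self._lock = threading.Lock()

    def mark_down(self, node):
        with self._lock:
            self._healthy.discard(node)

    def mark_up(self, node):
        with self._lock:
            if node in self._nodes:
                self._healthy.add(node)

    def next_node(self):
        """Return the next healthy node, skipping nodes marked as down."""
        with self._lock:
            for _ in range(len(self._nodes)):
                node = next(self._cycle)
                if node in self._healthy:
                    return node
        raise RuntimeError("no healthy nodes available")

    def check_all(self, ping):
        """Run a caller-supplied ping(node) -> bool against every node and update state."""
        for node in self._nodes:
            try:
                alive = bool(ping(node))
            except OSError:
                alive = False
            if alive:
                self.mark_up(node)
            else:
                self.mark_down(node)
```

A client could call `check_all` from a background timer and `next_node` before every request, so that users never have to write their own cluster interaction logic.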
Prior to 2.0, the location of objects in Riak was determined by bucket and key. In version 2.0, bucket types were introduced as a third namespacing layer in addition to buckets and keys. Connecting clients now need to either specify a bucket type or use the default type for all K/V operations. Although creating, listing, modifying, and activating bucket types can be accomplished only via the command line, your client should provide an interface for seeing which bucket properties are associated with a bucket type.
One of the changes to be aware of when building clients is that Riak has changed its querying structure to accommodate bucket types. When performing K/V operations, you now need to specify a bucket type in addition to a bucket and a key. This means that the structure of all K/V operations needs to be modified to allow for this. We’d also recommend enabling users to perform K/V operations without specifying a bucket type, in which case the default type is used. In the official Python client, for example, the following two reads are equivalent:
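The idea can be sketched with a small in-memory stand-in. The code below is not the official Python client: `SketchClient`, `SketchBucketType`, and `SketchBucket` are hypothetical classes backed by a plain dictionary, written only to show how a client might route a bucket lookup without an explicit type through the `default` bucket type.

```python
DEFAULT_BUCKET_TYPE = "default"


class SketchBucket:
    """Hypothetical bucket handle; objects are addressed by (type, bucket, key)."""

    def __init__(self, store, bucket_type, name):
        self._store = store
        self.bucket_type = bucket_type
        self.name = name

    def put(self, key, value):
        self._store[(self.bucket_type, self.name, key)] = value

    def get(self, key):
        return self._store.get((self.bucket_type, self.name, key))


class SketchBucketType:
    def __init__(self, store, name):
        self._store = store
        self.name = name

    def bucket(self, name):
        return SketchBucket(self._store, self.name, name)


class SketchClient:
    """In-memory stand-in for a client; a real one would talk to Riak instead."""

    def __init__(self):
        self._store = {}

    def bucket_type(self, name):
        return SketchBucketType(self._store, name)

    def bucket(self, name):
        # No bucket type given: fall back to the default type.
        return self.bucket_type(DEFAULT_BUCKET_TYPE).bucket(name)
```

With this shape, `client.bucket("fruits").get("apple")` and `client.bucket_type("default").bucket("fruits").get("apple")` address the same object, while the same bucket name under any other type is a separate namespace.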
Dealing with objects and content types
One of the tricky things about dealing with objects in Riak is that objects can be of any data type you choose (Riak Data Types are a different matter, and covered in the section below). You can store JSON, XML, raw binaries, strings, mp3s and MPEGs (though you should probably consider Riak CS for larger files like that), and so on. While this makes Riak an extremely flexible database, it means that clients need to be able to work with a wide variety of content types.
All objects stored in Riak must have a specified content type, e.g. application/json, text/plain, application/octet-stream, etc. While a Riak client doesn’t need to be able to handle all data types, a client intended for wide use should be able to handle at least the following:
- plain text
You should also strongly consider building automatic type handling into your client. When the official Ruby and Python clients, for example, read JSON from Riak, they automatically convert it to hashes and dicts (respectively). The Java client, to give another example, automatically converts POJOs to JSON by default and enables you to automatically convert stored JSON to custom POJO classes when fetching objects, which enables you to easily interact with Riak in a type-specific way. If you’re writing a client in a language with strong type safety, this would be a good thing to offer users.
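A minimal sketch of automatic type handling, assuming a hypothetical `CodecRegistry` class keyed by content type (this is not how any official client is structured), might look like this in Python:

```python
import json


class CodecRegistry:
    """Hypothetical mapping from content type to encode/decode functions."""

    def __init__(self):
        self._encoders = {}
        self._decoders = {}
        self.register(
            "application/json",
            lambda value: json.dumps(value).encode("utf-8"),
            lambda raw: json.loads(raw.decode("utf-8")),
        )
        self.register(
            "text/plain",
            lambda value: str(value).encode("utf-8"),
            lambda raw: raw.decode("utf-8"),
        )

    def register(self, content_type, encoder, decoder):
        """Let users plug in handling for additional content types."""
        self._encoders[content_type] = encoder
        self._decoders[content_type] = decoder

    def encode(self, content_type, value):
        encoder = self._encoders.get(content_type)
        if encoder is not None:
            return encoder(value)
        if isinstance(value, (bytes, bytearray)):
            # Unknown content type: pass raw bytes through untouched.
            return bytes(value)
        raise TypeError("no encoder registered for " + content_type)

    def decode(self, content_type, raw):
        decoder = self._decoders.get(content_type)
        # Unknown content types are handed back as raw bytes.
        return decoder(raw) if decoder is not None else raw
```

Keeping the registry open through `register` lets users add XML or any application-specific format without changes to the client itself.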
Another important thing to bear in mind: all of your client interactions with Riak should be UTF-8 compliant, not just for the data stored in objects but also for things like bucket, key, and bucket type names. In other words, with your client it should be possible to store an object in the key Möbelträgerfüße in the bucket tête-à-tête.
If you’re using either Riak Data Types or Riak’s strong consistency subsystem, you don’t have to worry about siblings because those features by definition do not involve sibling creation or resolution. But many users of your client will want to use Riak as an eventually consistent system, which means that they will need to create their own conflict resolution logic.
In essence, your users’ applications need to make intelligent, use-case-specific decisions about what to do when the application is confronted with siblings. Most fundamentally, this means that your client needs to enable objects to have multiple sibling values. In the official Python client, for example, each object of class RiakObject has parameters that you’d expect, like content_type, bucket, and data, but it also has a siblings parameter that returns a list of sibling values.
In addition to enabling objects to have multiple values, we also strongly recommend providing some kind of helper logic that enables users to easily apply their own sibling resolution logic. What type of interface should be provided? That will depend heavily on the language. In a functional language, for example, that might mean enabling users to specify filtering functions that whittle the siblings down to a single “correct” value. To see conflict resolution in our official clients in action, see our tutorials for Java, Ruby, and Python.
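As one possible shape for such helper logic, the Python sketch below lets the user pass a resolver function that reduces a list of siblings to a single value. The `Sibling` class, the `resolve` helper, and both example resolvers are hypothetical and are not drawn from the official clients. Which strategy is correct depends entirely on the use case.

```python
from dataclasses import dataclass


@dataclass
class Sibling:
    """Hypothetical sibling value with the metadata a resolver might need."""
    data: object
    last_modified: float


def resolve(siblings, resolver):
    """Apply a user-supplied resolver only when there is an actual conflict."""
    if not siblings:
        raise ValueError("object has no values")
    if len(siblings) == 1:
        return siblings[0]
    return resolver(siblings)


def latest_wins(siblings):
    """Example resolver: keep the most recently modified sibling."""
    return max(siblings, key=lambda s: s.last_modified)


def union_of_sets(siblings):
    """Example resolver: merge set-like values so that no element is lost."""
    merged = set()
    for sibling in siblings:
        merged |= set(sibling.data)
    return Sibling(data=merged, last_modified=max(s.last_modified for s in siblings))
```

A shopping cart might use `union_of_sets` so that items added on either side of a partition survive, while a user profile might accept `latest_wins`.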
Riak Data Types
In version 2.0, Riak added support for conflict-free replicated data types (aka CRDTs), which we call Riak Data Types. These five special Data Types—flags, registers, counters, sets, and maps—enable you to forgo things like application-side conflict resolution because Riak handles the resolution logic for you (provided that your data can be modeled as one of the five types). What separates Riak Data Types from other Riak objects is that you interact with them transactionally, meaning that changing Data Types involves sending messages to Riak about what changes should be made rather than fetching the whole object and modifying it on the client side.
This means that your client interface needs to enable users to modify the Data Types as much as they need to on the client side before committing those changes all at once to Riak. So if an application needs to add five counters to a map and remove items from three different sets within that map, it should be able to commit those changes with one message to Riak. The official Python client, for example, has a store() function that sends all client-side changes to Riak at once, plus a reload() function that fetches the current value of the type from Riak (with no regard to client-side changes).
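The batching behavior described above could be sketched as follows. This is a hypothetical `MapSketch` class, not the official Python client's implementation: the operation tuples are an invented format, and `send` stands in for whatever function actually delivers one update message to Riak.

```python
class MapSketch:
    """Hypothetical client-side map that queues changes until store() is called."""

    def __init__(self, send):
        self._send = send      # callable that delivers one batch of operations
        self._pending = []     # operations recorded but not yet sent

    def increment_counter(self, field, amount=1):
        self._pending.append(("counter", field, "increment", amount))

    def add_to_set(self, field, element):
        self._pending.append(("set", field, "add", element))

    def remove_from_set(self, field, element):
        self._pending.append(("set", field, "remove", element))

    def store(self):
        """Send every pending operation as a single batch; return how many were sent."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._send(batch)
        # Clear only after a successful send so a failed send can be retried.
        self._pending.clear()
        return len(batch)
```

Nothing reaches `send` until `store()` is called, so any number of counter and set changes inside the map travel to Riak as one message.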
One of the most important features introduced in Riak 2.0 is security. When enabled, all clients connecting to Riak, regardless of which security source is chosen, must communicate with Riak over a secure SSL connection rooted in an X.509-certificate-based Public Key Infrastructure (PKI). If you want your client’s users to be able to take advantage of Riak security, you’ll need to create an SSL interface. Fortunately, there are OpenSSL (and other) libraries in all major languages. To see SSL in action in our official clients, see our tutorials for Java, Ruby, Python, and Erlang.
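As a starting point for such an SSL interface, here is a minimal sketch using Python's standard `ssl` module. The helper names `make_tls_context` and `open_secure_connection` are illustrative. The sketch covers only certificate verification and wrapping a socket, so any protocol-level steps Riak requires before or after the TLS handshake (such as sending credentials) are out of scope.

```python
import socket
import ssl


def make_tls_context(cafile=None, certfile=None, keyfile=None):
    """Create a context that verifies the server certificate and hostname.

    cafile is the CA bundle that signed the node certificates. If it is None,
    the system's default trust store is used. certfile and keyfile are only
    needed when authenticating with a client-side certificate.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    if certfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context


def open_secure_connection(host, port, context, timeout=5.0):
    """Open a TCP connection and wrap it in TLS, closing the socket on failure."""
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

Exposing the CA file and the optional client certificate as parameters lets users point the client at their own PKI rather than relying on defaults.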
Features That Don’t Require Client Changes
The following features that became available in Riak 2.0 shouldn’t require any changes to client libraries:
- Strong consistency: While adding strong consistency has entailed a lot of changes within Riak itself, K/V operations involving strongly consistent data function just like their eventually consistent counterparts in most respects. The one small exception is that performing object updates without first fetching the object will necessarily fail because the initial fetch obtains the object’s causal context, which is necessary for strongly consistent operations. It may be a good idea to add this requirement to your client documentation.
- New configuration system: Configuration has been drastically simplified in Riak 2.0, but these changes won’t have a direct impact on client interfaces.
- Dotted version vectors: While dotted version vectors (DVVs) are superior to the older vector clocks in preventing problems like sibling explosion, client libraries interact with DVVs just like they interact with vector clocks. In fact, our Protocol Buffers messages still use a vclock field for both vector clocks and DVVs, for the sake of backward compatibility.
How to Get Help
Building a 2.0-compliant Riak client has some non-trivial aspects but can be an exciting and rewarding project. Fortunately there are a variety of venues where you can get help, both from Basho engineers and from others in the Riak community.
For inspiration and education, the official Basho Riak clients in the GitHub repos are a good place to start. If you run into trouble, though, we highly recommend the Riak mailing list. There could very well be other client builders and maintainers working through a similar problem.
July 29, 2013
For those of you who are up on your RICON history, you’ll remember that last year, Basho Hackers Russell Brown and Sean Cribbs gave a talk called “Data Structures in Riak” (video can be viewed here). Russell and Sean detailed the approach that Basho was taking to add eventually consistent data structures to Riak. The highlight of the presentation was a demonstration of incrementing and decrementing a counter using a sample app built with riak_dt. A simple counter was incremented. During this, nodes were killed, network partitions were introduced, and despite all that, counts converged once the cluster healed.
It was one of the more memorable moments of the entire conference.
We believe developers can build robust applications utilizing a simple key/value API. GETs, PUTs, and DELETEs can work wonders when utilized correctly. But this doesn’t let you build everything on Riak, and we’ve seen a fair number of applications that outsource things (like data type operations) to other storage or caching systems. Especially when porting apps from Redis to Riak, we often hear that counters are one feature that Riak lacks. Basho is firmly in the “right-technology-for-the-right-job” camp but we’re aggressively adding functionality that doesn’t break Riak’s design goals of availability and fault-tolerance.
As of the Riak 1.4 release, counters are officially supported. Specifically, Riak now supports a data type known as a PN-Counter, which can be both incremented and decremented. This is the first of a suite of data types we’re planning to add (more on this in a moment) that give developers the ability to build more complex functionality on top of data stored as keys and values.
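For readers new to the idea, the Python sketch below shows the textbook PN-Counter from the CRDT literature. It is not the riak_dt implementation, and the `PNCounter` class with its actor naming is purely illustrative. Each actor tracks its own increment and decrement totals, and replicas merge by taking the per-actor maximum, which is why counts converge regardless of the order in which replicas exchange state.

```python
class PNCounter:
    """Textbook PN-Counter: one grow-only tally of increments, one of decrements."""

    def __init__(self, actor):
        self.actor = actor
        self.p = {}  # actor -> total increments seen from that actor
        self.n = {}  # actor -> total decrements seen from that actor

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.p[self.actor] = self.p.get(self.actor, 0) + amount

    def decrement(self, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.n[self.actor] = self.n.get(self.actor, 0) + amount

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Fold in another replica's state by taking the per-actor maximum."""
        for actor, count in other.p.items():
            self.p[actor] = max(self.p.get(actor, 0), count)
        for actor, count in other.n.items():
            self.n[actor] = max(self.n.get(actor, 0), count)
```

Because merging is commutative, associative, and idempotent, two replicas that were updated independently during a partition arrive at the same value once they have merged each other's state.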
Using counters, you can increment and decrement a count associated with a named object in a given bucket. This sounds easy, but in a system like Riak where writes aren’t serialized and all updates are asynchronous, determining the last actor in a series of updates to an object is non-trivial. Riak’s counters should be used (in their current state) to count things that can tolerate eventual consistency. With that in mind, here are a few apps and types of functionality that could be implemented with Riak’s Counters:
- Facebook Like Button
- YouTube Views and Likes
- Hacker News Votes
- Twitter Followers and Favorites
The thing to remember here is that these counts can tolerate slight, brief imprecision. When your follower count fluctuates between 1000 and 1010 before finally settling on 1009, Twitter continues to work as expected. Riak 2.0 will feature work that enables you to enforce consistency around designated buckets which will solve this problem (with the necessary tradeoffs). Until then, use counters in Riak for things that can tolerate eventual consistency.
Even with this caveat, counters are a huge addition to Riak and we’re excited to see the new suite of applications and functionality they make possible.
Usage & Getting Started
To make use of counters we’ve introduced new endpoints and request types for the HTTP and Protocol Buffers APIs, respectively.
The complete documentation for the HTTP interface is here. Here are the basics using CURL:
Usage documentation for this is still in the works, but here’s the relevant message (as seen in riak_pb):
We’re working on implementing these in all of the Basho-supported client libraries. Keep an eye on these for details and timelines around availability. We currently support counters in the following libraries across the following protocols:
- Python – HTTP and PB
- Java – HTTP and PB
- Erlang – PB
In addition to the docs and code, Basho Hacker Sam Elliot has started a Riak CRDT cookbook. The first section walks you through using counters in a few different ways, and even shows you how to simulate failure events. Take it for a spin and send Sam some feedback.
Future Data Types
In addition to counters, we have big plans for more data types in Riak. Sets and maps are on the short list, and the goal is to have these ready for Riak 2.0. Russell posted an extensive RFC on the Riak GitHub repo for those interested. Comments, critiques, and contributions are all encouraged.
Related Work and Additional Reading
- A simple way to store a PN-Counter in a riak_object
- Release Notes on Counters
- CRDT paper from Shapiro et al. at INRIA
Enjoy and see you at RICON West in October.