February 23, 2015
Over the last week, for a variety of reasons, the topic of security in the NoSQL space has become a prominent news item. Chief among these reasons was the announcement of a popular NoSQL database having multiple instances exposed to the public internet. From the headlines you might think that NoSQL solutions have inherent security problems. In fact, in some cases, the discussion is positioned intentionally as a relational vs. NoSQL issue. The reality is that NoSQL is not more or less secure than a traditional RDBMS.
The Security of any component of the technology stack is both the responsibility of the vendor providing the technology and those that are deploying it. How many routers are running with the default administrative password still set? Similarly, exposing any database, regardless of type, to the public internet without taking appropriate security precautions, including user authentication and authorization, is a “bad idea.” A base level of network security is an absolute requirement when deploying any data persistence utility. For Riak this can include:
- Appropriate physical security (including policies about root access)
- Securing the epmd listener port, handoff_port listener port, and the range ports specified in the riak.conf
- Defining users and optionally, groups (using Riak Security in Riak 2.0)
- Defining an authentication source for each user
- Granting necessary permissions to each user (and/or group)
- Checking Erlang MapReduce code for invocations of Riak modules other than riak_kv_mapreduce
- Ensuring your client software passes authentication information with each request, supports HTTPS or encrypted Protocol Buffers traffic
If you enable Riak security without having an established functioning SSL connection, all request to Riak will fail because Riak security (when enabled) requires a secure SSL connection. You will need to generate SSL certificates, enable SSL, and establish a certification configuration on each node.
The security discussion does not, however, end at the network. In fact, for those who are familiar with the Open Systems Interconnection model (OSI), a 7 layer conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers, (ISO 7498-1) there is a corresponding security architecture reference (ISO 7498-2)…and that is just for the network. It is necessary to take adopt a comprehensive approach to security at every layer of the application stack…including the database.
The process of securing a database, which is only a component of the application stack, requires striking a fine balance. Basho has worked with large enterprise customers to ensure that Riak’s security architecture meets the needs of their application deployments and balances the effort required with the security, or compliance, requirements demanded by some of the worlds largest deployments.
NoSQL vs. Relational Security
As enterprises continue to adopt NoSQL more broadly, the question of security will continue to be raised. The reality is simple, it is necessary to evaluate the security of the database you are exploring in the same way that you would evaluate its scalability or availability characteristics. There is nothing inherent to the NoSQL market that makes it less, or more, secure that relational databases. It is true that some relational database, by aegis of their age and maturation, have more expansive security tooling available. However, when adopting a holistic, risk-based approach to security NoSQL solutions — like Riak — are as secure as required.
Security and Compliance
A compliance checklist (be it HIPAA or PCI) details, in varying specificity, the security requirements to achieve compliance. This checklist is subsequently verified through an audit by an independent entity…as well as ongoing internal audits.
So can I use NoSQL in compliant environments?
Without question, Yes. The difficulty of achieving compliance will depend on how the database is configured, what controls it provides for authentication and authorization, and many other elements of your application stack (including physical security of the datacenter, etc). Basho customers have deployed Riak in highly regulated environments and achieved their compliance requirements.
I would encourage you, however, to realize that compliance is an event. The process of securing your application, database, datacenter, etc. is an ongoing exercise. Many, particularly those in the payments industry, refer to this as a “risk-based” approach to security vs. a “compliance-based” approach.
Security and Riak
In nearly all commercial deployments of Riak, Riak is deployed on a trusted network and unauthorized access is restricted by firewall routing rules. This is expected, this is necessary and is sufficient for many use cases (when included as part of a holistic security posture including locking down ports, reasonable policies regarding root access, etc.). Some applications need an additional layer of security to meet business or regulatory compliance requirements.
To that end, in Riak 2.0, the security store changed substantially. While you should — without question — apply network layer security on top of Riak and the systems that Riak runs upon, there are now security features built into Riak that protect Riak itself, not just its network. This includes authentication (the process of identifying a user) and authorization (verifying whether the authenticated user has access to perform the requested operation). Riak’s new security features were explicitly modeled after user- and role-based systems like PostgreSQL. This means that the basic architecture of Riak Security should be familiar to most.
In Riak, administrators can selectively control access to a wide variety of Riak functionality. Riak Security allows you to both authorize users to perform specific tasks (from standard read/write/delete operations to search queries to managing bucket types and more) and to authenticate users and clients using a variety of security mechanisms. In other words, Riak operators can now verify who a connecting client is and determine what that client is allowed to do (if anything). In addition, Riak Security in 2.0 provides four options for security sources:
- trust — Any user accessing Riak from a specified IP may perform the permitted operations
- password — Authenticate with username and password (works essentially like basic auth)
- pam — Authenticate using a pluggable authentication module (PAM)
- certificate – Authenticate using client-side certificates
More detail on the Riak 2.0 Security capabilities are presented in the Security section of the documentation, in particular the section entitled Authentication and Authorization.
With a NoSQL system that provides authentication and authorization, and a properly secured network, you have progressed a long way in reducing the risk profile of your system. The application layer, of course, must still be considered.
Relational databases are still a part of the technology stack for many companies; others are innovating and incorporating NoSQL solutions either as a replacement for or alongside existing relational databases. As a result they have simplified their deployments, enhanced their availability, and reduced their costs.
Join us for this webinar where we will look at the differences between relational databases and NoSQL databases like Riak. We will look at why companies choose Riak over a relational database. We will analyze the decision points you should consider when choosing between relational and NoSQL databases and we will look at specific use cases, review data modeling and query options.
This Webinar is being held in two time slots:
- Wednesday, March 4, 2015 8:00-9:00 AM PST (4:00-5:00 PM GMT)
- Wednesday, March 4, 2015 12:00-1:00 PM PST (3:00-4:00 PM EST)
November 14, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series was Relational to Riak – High Availability.
Riak is designed for scalability, which truly separates it from relational systems. As described in the previous post, relational databases run best on a single server. If the dataset grows beyond the capacity of this single machine, it can become prohibitively expensive (or even impossible) to simply upgrade to a bigger machine. In such a scenario, the only option may be to add more machines and divide the dataset across them using a technique called sharding.
Sharding divides data into logical parts (such as alphabetical, by customer, or by geographic region) that can be distributed across multiple machines – often manually. If data continues to grow, this process may need to be repeated at great expense.
Sharding is not only difficult, it also will typically lead to hot spots – meaning certain machines are responsible for storing and serving a disproportionately high amount of both data and requests. Hot spots can cause unpredictable latency and degraded performance.
(And remember all the ways in which availability is a challenge? Combine sharding with a master/slave architecture for maximal expense and general unpleasantness.)
Instead of sharding, Riak evenly distributes data across a cluster using consistent hashing. In a Riak cluster, the data space is divided into partitions which are claimed by the servers. When new data is written to the database, these objects are evenly placed around the ring and replicated 3 times (by default). This ensures that your data will always be available, even when nodes fail.
When nodes are added or removed, data is rebalanced automatically. New machines assume ownership of some of the partitions and existing machines hand off relevant partitions and associated data until data ownership is equal amongst nodes.
By eliminating the manual requirements of sharding and making hot spots highly unlikely, Riak makes it significantly easier for companies to scale, whether it’s just for a few months to handle peak loads or to support long-term growth strategies.
November 13, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.”
One of the biggest differences between Riak and relational systems is our focus on availability. Riak is designed to be deployed to, and runs best on, multiple servers. It can continue to function normally in the presence of hardware and network failures. Relational databases, conversely, are simplest to set up on a single server.
Most relational databases offer a master/slave architecture for availability, in which only the master server is available for data updates. If the master fails, the slave is (hopefully) able to step in and take over.
However, even with this simple model, coping with failure (or even properly defining it) is non-trivial. What happens if the master and slave server cannot talk to each other? How do you recover from a split brain scenario, where both servers think they’re the master and accept updates? What happens if the slave is slow to respond to updates sent from the master database? Can clients read from a slave? If so, does the master need to verify that the slave has received all updates before it commits them locally and responds to the client that requested the updates?
Conversely, Riak is explicitly designed to expect server and network failure. Riak is a masterless system, meaning any server can respond to read or write requests. If one fails, others will continue to service client requests. Once this server becomes available again, the cluster will feed it any updates that it missed through a process we call hinted handoff.
Because Riak’s system allows for reads and writes when multiple servers are offline or otherwise unreachable, data may not always be consistent across the environment (usually only for a few milliseconds). However, through self-healing mechanisms like read repair and Active Anti-Entropy, all updates will propagate to all servers making data eventually consistent.
For many use cases, high availability is more important than strict consistency. Data unavailability can negatively impact revenue, damage user trust, lead to poor user experience, and cause lost critical data. Industries like gaming, mobile, retail, and advertising require always-on availability. Visit our Users Page to see how companies in various industries use Riak.
August 20, 2013
NoSQL is a misleading name. SQL was never the problem. However, this poorly named industry term does represent a response to changing business priorities and new challenges that require different kinds of database architectures.
Traditional database architectures were first developed in the late 60s and early 70s. They were the default option for many pre-Internet use cases and remain useful today for certain use cases requiring a relational data model. However, their limits are painfully apparent to many companies. Despite what traditional database vendors might have us believe, very little data generated today actually requires a SQL architecture. Businesses face many new challenges today that traditional databases simply are not designed to handle reliably or efficiently. These include:
- Global Users. It is no longer enough to provide a fast experience in one country. Users from all over the globe expect a low-latency experience, making geo-data locality more important than ever.
- Zero Downtime. Planned and unplanned. Both are bad for business. There is now an expectation for always-on availability. Operations teams emphasize must resiliency over recovery.
- Scale Matters. Businesses need to scale up quickly to meet peak loads during the holidays or product launches, and then they need to scale back down. They need an architecture that makes scaling the least of their worries.
- Flexible Data. From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Businesses need flexibility to handle all the data generated and flowing.
- Omnichannel. Whether users are on a tablet, laptop, or smartphone, they require a device agnostic experience and low-latency.
- Amazon Economics. Every business wants Amazon Economics. With the nature of data growth today, businesses can’t afford expensive machines at every juncture. They need commodity machines to scale horizontally, not vertically.
Attempts to address these challenges with traditional databases result in an inflexible architecture with super high costs. “NoSQL” databases represent a fresh approach towards building flexible, resilient architectures. “NoSQL” goes where no database has ever gone before — into the wild space of the Internet and the massive scale requirements it represents.
Which brings us to NoSQL Now! Basho is sponsoring because the movement is more important than any single industry term. Andy Gross will also be on-hand to further discuss the larger trend of distributed systems:
Dealing with Systems in a New Distributed World
Chief Architect and Co-creator of Riak
Thursday, August 22, 2013
Please join us in San Jose for a look at the future of database technology.