February 23, 2015
Over the last week, for a variety of reasons, the topic of security in the NoSQL space has become a prominent news item. Chief among these reasons was the announcement of a popular NoSQL database having multiple instances exposed to the public internet. From the headlines you might think that NoSQL solutions have inherent security problems. In fact, in some cases, the discussion is positioned intentionally as a relational vs. NoSQL issue. The reality is that NoSQL is not more or less secure than a traditional RDBMS.
The Security of any component of the technology stack is both the responsibility of the vendor providing the technology and those that are deploying it. How many routers are running with the default administrative password still set? Similarly, exposing any database, regardless of type, to the public internet without taking appropriate security precautions, including user authentication and authorization, is a “bad idea.” A base level of network security is an absolute requirement when deploying any data persistence utility. For Riak this can include:
- Appropriate physical security (including policies about root access)
- Securing the epmd listener port, handoff_port listener port, and the range ports specified in the riak.conf
- Defining users and optionally, groups (using Riak Security in Riak 2.0)
- Defining an authentication source for each user
- Granting necessary permissions to each user (and/or group)
- Checking Erlang MapReduce code for invocations of Riak modules other than riak_kv_mapreduce
- Ensuring your client software passes authentication information with each request, supports HTTPS or encrypted Protocol Buffers traffic
If you enable Riak security without having an established functioning SSL connection, all request to Riak will fail because Riak security (when enabled) requires a secure SSL connection. You will need to generate SSL certificates, enable SSL, and establish a certification configuration on each node.
The security discussion does not, however, end at the network. In fact, for those who are familiar with the Open Systems Interconnection model (OSI), a 7 layer conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers, (ISO 7498-1) there is a corresponding security architecture reference (ISO 7498-2)…and that is just for the network. It is necessary to take adopt a comprehensive approach to security at every layer of the application stack…including the database.
The process of securing a database, which is only a component of the application stack, requires striking a fine balance. Basho has worked with large enterprise customers to ensure that Riak’s security architecture meets the needs of their application deployments and balances the effort required with the security, or compliance, requirements demanded by some of the worlds largest deployments.
NoSQL vs. Relational Security
As enterprises continue to adopt NoSQL more broadly, the question of security will continue to be raised. The reality is simple, it is necessary to evaluate the security of the database you are exploring in the same way that you would evaluate its scalability or availability characteristics. There is nothing inherent to the NoSQL market that makes it less, or more, secure that relational databases. It is true that some relational database, by aegis of their age and maturation, have more expansive security tooling available. However, when adopting a holistic, risk-based approach to security NoSQL solutions — like Riak — are as secure as required.
Security and Compliance
A compliance checklist (be it HIPAA or PCI) details, in varying specificity, the security requirements to achieve compliance. This checklist is subsequently verified through an audit by an independent entity…as well as ongoing internal audits.
So can I use NoSQL in compliant environments?
Without question, Yes. The difficulty of achieving compliance will depend on how the database is configured, what controls it provides for authentication and authorization, and many other elements of your application stack (including physical security of the datacenter, etc). Basho customers have deployed Riak in highly regulated environments and achieved their compliance requirements.
I would encourage you, however, to realize that compliance is an event. The process of securing your application, database, datacenter, etc. is an ongoing exercise. Many, particularly those in the payments industry, refer to this as a “risk-based” approach to security vs. a “compliance-based” approach.
Security and Riak
In nearly all commercial deployments of Riak, Riak is deployed on a trusted network and unauthorized access is restricted by firewall routing rules. This is expected, this is necessary and is sufficient for many use cases (when included as part of a holistic security posture including locking down ports, reasonable policies regarding root access, etc.). Some applications need an additional layer of security to meet business or regulatory compliance requirements.
To that end, in Riak 2.0, the security store changed substantially. While you should — without question — apply network layer security on top of Riak and the systems that Riak runs upon, there are now security features built into Riak that protect Riak itself, not just its network. This includes authentication (the process of identifying a user) and authorization (verifying whether the authenticated user has access to perform the requested operation). Riak’s new security features were explicitly modeled after user- and role-based systems like PostgreSQL. This means that the basic architecture of Riak Security should be familiar to most.
In Riak, administrators can selectively control access to a wide variety of Riak functionality. Riak Security allows you to both authorize users to perform specific tasks (from standard read/write/delete operations to search queries to managing bucket types and more) and to authenticate users and clients using a variety of security mechanisms. In other words, Riak operators can now verify who a connecting client is and determine what that client is allowed to do (if anything). In addition, Riak Security in 2.0 provides four options for security sources:
- trust — Any user accessing Riak from a specified IP may perform the permitted operations
- password — Authenticate with username and password (works essentially like basic auth)
- pam — Authenticate using a pluggable authentication module (PAM)
- certificate – Authenticate using client-side certificates
More detail on the Riak 2.0 Security capabilities are presented in the Security section of the documentation, in particular the section entitled Authentication and Authorization.
With a NoSQL system that provides authentication and authorization, and a properly secured network, you have progressed a long way in reducing the risk profile of your system. The application layer, of course, must still be considered.
Relational databases are still a part of the technology stack for many companies; others are innovating and incorporating NoSQL solutions either as a replacement for or alongside existing relational databases. As a result they have simplified their deployments, enhanced their availability, and reduced their costs.
Join us for this webinar where we will look at the differences between relational databases and NoSQL databases like Riak. We will look at why companies choose Riak over a relational database. We will analyze the decision points you should consider when choosing between relational and NoSQL databases and we will look at specific use cases, review data modeling and query options.
This Webinar is being held in two time slots:
- Wednesday, March 4, 2015 8:00-9:00 AM PST (4:00-5:00 PM GMT)
- Wednesday, March 4, 2015 12:00-1:00 PM PST (3:00-4:00 PM EST)
April 23, 2014
Traditional database architectures were the default option for many pre-Internet use cases and architectures, such as MySQL, remain common today. However, these traditional solutions have limits that quickly become apparent as companies (and data) grow. Modern companies have changing priorities: downtime (planned or unplanned) is never acceptable; customers require a fast and unified experience; and data of all types is growing at unimaginable rates. Solutions such as Riak are designed to handle these shifting priorities.
Top Reasons to Move to Riak
- Zero Downtime: Distributed NoSQL solutions like Riak are designed for always-on availability. This means data is always read/write accessible and the system never goes down. Downtime, planned or unplanned, can make or break a customer experience.
- Ease-of-Scale: Traffic can be unpredictable. Businesses need to scale up quickly to handle peak loads during holidays or major releases, but then need to scale back down to save money. Riak makes it easy to add and remove any number of nodes as needed and automatically redistributes data across the cluster. Scaling up or down never needs to be a burden again.
- Flexible Data Model: From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Riak can store any type of data easily with its simple key/value architecture.
- Global Data Locality: Every company is a global company and needs to provide consistent, low-latency experiences to everyone, regardless of physical location. Riak’s multi-datacenter replication makes it easy to set up datacenters wherever users are, for both geo-data locality and maintaining active backups.
Users That Switched to Riak
Many top companies have already moved from relational architectures to Riak. Here’s a look at a few that have made the switch.
Bump (acquired by Google)
Bump, acquired by Google in 2013, allows users to share contact information and photos by bumping two phones together. Bump uses Riak to store almost all of its user data: contacts, communications sent and received, handset information, social network OAuth tokens, etc. Bump moved from MySQL to Riak due to its operational qualities: “No longer will we have to do any master/slave song and dance, nor will we fret about performance, capacity, or scalability; if we need more, we’ll just add nodes to the cluster.” Learn more about their move in their case study.
Alert Logic helps companies defend against security threats and address compliance mandates, such as PCI and HIPAA. Alert Logic switched from MySQl to Riak to collect and process machine data and to perform real-time analytics, detect anomalies, ensure compliance, and proactively respond to threats. Alert Logic processes nearly 5TB/day in Riak and has achieved performance results of up to 35k operations/second. Learn more about how Alert Logic improved performance through Riak in our blog post.
The Weather Company
The Weather Company provides millions of people every day with the world’s best weather forecasts, content and data, connecting with them through television, online, mobile and tablet screens. Riak is central to The Weather Company’s weather data services platform that delivers real-time weather services to aerospace, insurance, energy, retail, media, government, and hospitality industries. Check out our blog to see why The Weather Company selected Riak over MySQL to support their massive big data needs.
Dell uses Riak as the core distributed database technology underlying its customer cloud management solutions. Riak is used to collect and manage data associated with customer application provisioning and scaling, application configuration management, usage governance, and cloud utilization monitoring. In 2012, Enstratius (acquired by Dell) switched to Riak from MySQL in order to provide cross-datacenter redundancy, high write availability, and fault tolerance. Check out the full Enstratius case study.
Data Modeling in Riak
Riak has a “schemaless” design. Objects are comprised of key/value pairs, which are stored in flat namespaces called buckets. Below is a chart with some simple approaches to building common application types with a key/value model.
|Session||User/Session ID||Session Data|
|Advertising||Campaign ID||Ad Content|
|Sensor||Date, Date/Time||Sensor Updates|
|User Data||Login, eMail, UUID||User Attributes|
|Content||Title, Integer||Text, JSON/XML/HTML Document, Images, etc.|
January 22, 2013
Best Buy is North America’s top specialty retailer of consumer electronics, personal computers, entertainment software, and appliances. Riak has been an integral part in the transformation push to re-platform Best Buy’s eCommerce platform. Riak’s architecture has helped Best Buy build and operate its new platform with Riak playing a key role.
At our developer conference, Ricon, we were lucky to have Joel Crabb, Director of Web Architecture at BestBuy.com, talk about how Riak fits into their pattern of innovation and moving from a traditional relational architecture.
To learn more about how retailers can use Riak for their eCommerce needs, check out the whitepaper, “Retail on Riak: A Technical Introduction.” For more information on moving from a relational database to Riak, sign up for our webcast this Thursday, covering advantages, tradeoffs and development considerations.
January 3, 2013
Most teams considering using Riak come from a relational database background. From our webcast on moving from relational to Riak, the below slide deck covers an overview of Riak, how the architecture differs from a relational approach, the advantages for scaling and development, and what’s different about application building and database operating in a non-relational world. We also include a few stories of Riak users who replaced MySQL or added Riak to the mix.
Interested in learning more? Check out our overview, From Relational to Riak.
December 19, 2012
We work with many people building platforms and applications on Riak that have traditionally lived on relational systems. The switch to Riak can be driven by the needs of greenfield applications and growing data volumes, business requirements around scale and availability, or the desire for operational ease and multi-site replication.
Some of the most common questions we get are from teams with a primary background in MySQL, Oracle Database or other relational systems wondering about the advantages and tradeoffs of moving to Riak. To address these common questions, we’ve written up an introductory guide on moving from relational to Riak. In it, we cover:
- Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
- A look at the operational aspects of Riak and where they differ from relational approaches
- Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
- Migration considerations, including where to start when migrating existing applications to Riak
- Riak’s eventually consistent design, how it differs from a strongly consistent design, and things you need to know about handling data conflicts in Riak
- Multi-site replication options in Riak
November 13, 2012
Legacy RDBMS systems offered mature monitoring capabilities that usually gave operators a clear view of how their databases were (or weren’t) performing. Emerging distributed systems introduce new levels of complexity, presenting new problems in monitoring and diagnosis. In this blog we highlight two talks given at last month’s RICON which shed light on this problem and offer some interesting solutions.
Next Generation Monitoring of Large Scale Riak Applications
In this talk, Theo Schlossnage, founder of OmniTI, talks about moving beyond standard monitoring metrics (average, mean, 95th percentile, 99th percentile, etc.), and advocates for more sophisticated methods, namely histograms and new visualization techniques. He illustrates this with some interesting real world examples in which metrics such as average response time have little meaning in the face of real world distributions which are often multi-modal and rapidly evolving.
Modern Radiology for Distributed Systems
In this talk, Boundary engineer Dietrich Featherston uses radiological imaging as a metaphor to explore the challenges of monitoring distributed systems –Boundary uses Riak to store high-resolution network data for its analysis engine. In this metaphor, if we just look at metrics pulled from individual hosts (CPU usage, memory usage, etc.), we can see diseased “cells”, but ignore the whole organism. We react to problems, instead of preventing them. To illustrate, Dietrich walks through a series of case studies highlighting new, “context aware”, non-invasive monitoring techniques.
For more RICON videos on a range of distributed systems topics, visit our RICON aftermath site.
**January 02, 2013**
New to Riak? Thinking about using Riak instead of a relational database? Join Basho chief architect Andy Gross and director of product management Shanley Kane for an intro this Thursday (11am PT/2pm ET). In about 30 minutes, we’ll cover the basics of:
* Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
* A look at the operational aspects of Riak and where they differ from relational approaches
* Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
* Migration considerations, including where to start when migrating existing apps
* Riak’s eventually consistent design
* Multi-site replication options in Riak
Register for the webcast [here](http://info.basho.com/RelationalToRiakJan3.html).
A Conversation about Variety and Making Choices
Benjamin Black stopped by the Basho offices recently and we had the chance to sit him down and discuss the collection of things that is “NoSQL.”
In this, the fifth installment of the Basho Riak Podcast, Benjamin and Basho’s CTO Justin Sheehy discuss the factors that they think should play the largest part in your evaluation of any database, NoSQL or otherwise. (Hint: it’s not name recognition.)
Highlights include why and when you might be best served improving your relational database architecture and when it might be better to use a NoSQL system like Cassandra or Riak to solve part of your problem, as well as why you probably don’t want to figure out which one of the NoSQL systems solves all of your problems.
A Muddle That Slows Adoption
Basho released Riak as an open source project seven months ago and began commercial service shortly thereafter. As new entrants into the loose collection of database projects we observed two things:
- Widespread Confusion — the NoSQL rubric, and the decisions of most projects to self-identify under it, has created a false perception of overlap and similarity between projects differing not just technically but in approaches to licensing and distribution, leading to…
- Needless Competition — driving the confusion, many projects (us included, for sure) competed passionately (even acrimoniously) for attention as putative leaders of NoSQL, a fool’s errand as it turns out. One might as well claim leadership of all tools called wrenches.
The optimal use cases, architectures, and methods of software development differ so starkly even among superficially similar projects that to compete is to demonstrate a (likely pathological) lack of both user needs and self-knowledge.
This confusion and wasted energy — in other words, the market inefficiency — has been the fault of anyone who has laid claim to, or professed designs on, the NoSQL crown.
- Adoption suffers — Users either make decisions based on muddled information or, worse, do not make any decision whatsoever.
- Energy is wasted — At Basho we spent too much time from September to December answering the question posed without fail by investors and prospective users and clients: “Why will you ‘win’ NoSQL?”
With the vigor of fools, we answered this question, even though we rarely if ever encountered another project’s software in a “head-to-head” competition. (In fact, in the few cases where we have been pitted head-to-head against another project, we have won or lost so quickly that we cannot help but conclude the evaluation could have been avoided altogether.)
The investors and users merely behaved as rational (though often bemused) market actors. Having accepted the general claim that NoSQL was a monolithic category, both sought to make a bet.
Clearly what is needed is objective information presented in an environment of cooperation driven by mutual self-interest.
This information, shaped not by any one person’s necessarily imperfect understanding of the growing collection of data storage projects but rather by all the participants themselves, would go a long way to remedying the inefficiencies discussed above.
Demystification through data, not marketing claims
We have spoken to representatives of many of the emerging database projects. They have enthusiastically agreed to participate in a project to disclose data about each project. Disclosure will start with the following: a common benchmark framework and benchmarks/load tests modeling common use cases.
- A Common Benchmark Framework — For this collaboration to succeed, no single aspect will impact success or failure more than arriving at a common benchmark framework.
At Basho we have observed the proliferation of “microbenchmarks,” or benchmarks that do not reflect the conditions of a production environment. Benchmarks that use a small data set, that do not store to disk, that run for short (less than 12 hours) durations, do more to confuse the issue for end users than any single other factor. Participants will agree on benchmark methods, tools, applicability to use cases, and to make all benchmarks reproducible.
Compounding the confusion is when benchmarks are used for comparison of different use cases or was run on different hardware and yet compared head-to-head as if the tests or systems were identical. We will seek to help participants run equivalent on the various databases and we will not publish benchmark results that do not profile the hardware and configuration of the systems.
- Benchmarks That Support Use Cases — participants agree to benchmark their software under the conditions and with load tests reflective of use cases they commonly see in their user base or for which they think their software is best suited.
- Dissemination to third-parties — providing easy-to-find data to any party interested in posting results.
- Honestly exposing disagreement — Where agreement cannot be reached on any of the elements of the common benchmarking efforts, participants will respectfully expose the rationales for their divergent positions, thus still providing users with information upon which to base decisions.
There is more work to be done but all participants should begin to see the benefits: faster, better decisions by users.
We invite others to join, once we are underway. We, and our counterparts at other projects, believe this approach will go a long way to removing the inefficiencies hindering adoption of our software.