December 18, 2014
One of the interesting things about attending industry events, like AWS re:Invent, is identifying common trends that arise in conversations. Recent conversations point to a renewed interest in “enterprise ready replication” for NoSQL databases.
As business data continues to grow, an entirely new set of challenges emerges around availability, scalability, and fault tolerance. While most NoSQL databases work well at small scale, availability is often compromised as applications reach full production or peak capacity. Having the right replication functionality is key to ensuring that availability requirements are not compromised as your system grows.
“Replication” may mean different things based on context. In this case, we are referring to the movement of data within a database cluster — or across datacenters — for the purpose of redundancy or data locality. If your database experience began in an RDBMS context, then replication implies a specific contextual understanding of multi-master transactional deployment and, perhaps, shipping transaction logs between incremental backups in a hot/warm database configuration. In contrast, for those who began in the NoSQL era, the term may evoke images of replica sets on a sharded infrastructure and the operational overhead that comes with them.
In a distributed NoSQL database, like Riak, the term replication is used to encompass two distinct concepts. First, intra-cluster replication for high availability and fault tolerance within the datacenter; and second, multi-datacenter replication for data locality and global availability. There is none of the complexity of log shipping or dealing with a sharded infrastructure.
Data replication is a core feature of Riak’s basic architecture. Riak was designed to operate as a clustered system containing multiple nodes (commodity servers or cloud instances). The replication implementation allows data to live on multiple machines at once, with a single write request, in case a node in the cluster goes down or is unavailable due to issues like network partitioning.
Intra-cluster replication is fundamental and automatic in Riak, so that your data is always available. All data stored in Riak is replicated to a number of nodes in the cluster according to a configurable parameter (n_val) set in a bucket’s bucket type. With the default n_val setting of 3, there are always three copies of all data, and those copies live on three different partitions/vnodes. A detailed explanation and analysis of this replication capability is available in the Riak documentation – Understanding replication by example.
In the case of intra-cluster replication, or what we would refer to simply as “replication”, data distribution ensures redundant data such that high availability is maintained in a failure state.
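To make the replica-placement idea concrete, here is a minimal sketch of how a key can map to n_val consecutive partitions on a consistent-hash ring. This is illustrative only — the ring size, the bucket/key hashing, and the function names are assumptions for the example, not Riak's actual claim algorithm:

```python
import hashlib

RING_SIZE = 64   # number of partitions (vnodes) on the ring; illustrative
N_VAL = 3        # replicas per key, mirroring Riak's default n_val of 3

def partition_of(bucket: str, key: str) -> int:
    """Hash the bucket/key pair onto the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

def preference_list(bucket: str, key: str, n_val: int = N_VAL) -> list[int]:
    """The n_val partitions holding replicas: the home partition
    plus the next n_val - 1 partitions clockwise around the ring."""
    home = partition_of(bucket, key)
    return [(home + i) % RING_SIZE for i in range(n_val)]

replicas = preference_list("users", "alice")
print(replicas)  # three distinct, adjacent partitions on the ring
```

Because each partition in the list lives on a different node (in a properly claimed ring), a single write lands on three machines at once, which is what keeps the key readable when one node is down.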
In contrast to intra-cluster replication, multi-datacenter replication (a feature of Riak Enterprise) is a critical part of modern application infrastructures. Riak Enterprise offers multi-datacenter replication features so that data stored in Riak can be replicated to multiple sites (vs. multiple servers in the same site).
As we are all aware, understanding application latency (for an end user) begins with the realization that data can’t travel faster than the speed of light. Inherently, the further source information sits from its point of consumption, the more latency is introduced. There is, therefore, a fixed minimum latency for a customer in San Francisco connecting to your application hosted in New York. This latency profile grows, and becomes more complex, as the geographic distribution of your customer base increases.
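The physics floor on that latency is easy to estimate. The figures below are rough assumptions (great-circle distance, light at about two-thirds of c in fiber); real network paths are longer and add routing and server time on top:

```python
# Rough lower bound on round-trip latency imposed by physics alone.
DISTANCE_KM = 4_130            # approx. New York to San Francisco, great circle
LIGHT_IN_FIBER_KM_S = 200_000  # light travels at roughly 2/3 c in optical fiber

rtt_ms = 2 * DISTANCE_KM / LIGHT_IN_FIBER_KM_S * 1000
print(f"best-case round trip: {rtt_ms:.1f} ms")  # ~41 ms before any server time
```

No amount of server tuning removes that floor — only moving a replica of the data closer to the user does, which is the core argument for multi-datacenter replication.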
With multi-datacenter replication in Riak Enterprise, data can be replicated across locations and geographic areas, providing disaster recovery, data locality, compliance with regulatory requirements, and the ability to “burst” peak loads into public cloud infrastructure, among other benefits.
Riak’s multi-datacenter replication is masterless. One cluster acts as a primary, or source, cluster. The primary cluster handles replication requests from one or more secondary, or sink, clusters (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can automatically take over as the primary cluster.
More architectural strategies for multi-datacenter implementations are covered in the Basho whitepaper entitled Riak Enterprise: Multi-Datacenter Replication – A Technical Overview & Use Cases and in the Basho documentation section Multi-Datacenter Replication: v3 Architecture.
Replication inside a cluster is a core design tenet of Riak. It provides the availability and fault-tolerance characteristics — with low operational overhead — that many unstructured data workloads demand.
Multi-datacenter replication, while related, is an entirely different approach and architecture, enabling the geographic distribution of data for redundancy, data locality, and other needs.
Replication is an important scalability feature of any database deployment. Ensuring that your NoSQL database replicates data in a way that is scalable and operationally simple, and that achieves your business objectives, is key to your success.
January 27, 2014
On the official Basho docs, we compare Riak to multiple other databases. We are currently working on updating these comparisons, but, in the meantime, we wanted to provide a more up-to-date comparison for one of the more common questions we’re asked: How does Riak compare to Cassandra?
Of all widely deployed data storage technologies, Cassandra looks the most like Riak. Both have architectural roots in Amazon’s Dynamo, the system Amazon engineered to handle its highly available shopping cart service. Both Riak and Cassandra are masterless, highly available stores that persist replicas and handle failure scenarios through concepts such as hinted handoff and read repair. However, there are certain key differences between the two that should be considered when evaluating them.
Amazon’s Dynamo utilized a Key/Value data model. Early in Cassandra’s development, a decision was made to diverge from keys and values toward a wide-row data model (similar to Google’s BigTable). This means Cassandra is a Key/Key/Value store, which includes the concept of Column Families that contain columns. With this model, Cassandra is able to handle high write volumes, typically by appending new data to a row. This also allows Cassandra to perform very efficient range queries, with the tradeoff being a more rigid data model since rows are “fixed” and non-sequential read operations often require several disk seeks.
On the other hand, Riak is a straight Key/Value store. We believe this offers the most flexibility for data storage. Riak’s schemaless design has zero restrictions on data type, so an object can be a JSON document at one moment and a JPEG at the next.
Like Cassandra, Riak also excels at high write volumes. Range queries can be a little more costly, though still achievable through Secondary Indexes. In addition, there are a number of data modeling tips and tricks for Riak that make it easy to expose access to data in ways that aren’t always obvious at first glance.
In Riak, multi-datacenter replication is achieved by connecting independent clusters, each of which own their own hash ring. Operators have the ability to manage each cluster and select all or part of the data to replicate across a WAN. Multi-datacenter replication in Riak features two primary modes of operation: full sync and real-time. Data transmitted between clusters can be encrypted via OpenSSL out-of-the-box. Riak also allows for per-bucket replication for more granular control.
Cassandra achieves replication across WANs by splitting the hash ring across two or more clusters, which requires operators to manually define a NetworkTopologyStrategy, Replication Factor, a Replication Placement Strategy, and a Consistency Level for both local and cross data center requests.
Conflict Resolution and Object Versioning
Cassandra uses wall-clock timestamps to establish ordering. The resolution strategy in this case is Last Write Wins (LWW), which means that data may be overwritten when there is write contention. The odds of data loss are magnified by (inevitable) server clock drift. More details can be found in the blog post “Clocks are Hard, or, Welcome to the Wonderful World of Distributed Systems.”
Riak uses a data structure called vector clocks to track the causal ordering of updates. This per-object ancestry allows Riak to identify and isolate conflicts without using system clocks.
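The essential property of a vector clock is that it can tell “happened after” apart from “happened concurrently.” A minimal sketch, assuming a simplified representation of clocks as actor-to-counter dicts (Riak's actual encoding differs, and the function names here are illustrative):

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def concurrent(a: dict, b: dict) -> bool:
    """Neither clock descends the other: the writes genuinely conflict."""
    return not descends(a, b) and not descends(b, a)

# Two clients update the same key starting from a common ancestor {"x": 1}:
clock_a = {"x": 1, "client_a": 1}
clock_b = {"x": 1, "client_b": 1}
print(concurrent(clock_a, clock_b))  # True: a conflict, detected without any wall clock
```

Because the comparison is purely structural, no server clock is consulted, so clock drift cannot silently discard a write the way a timestamp-based strategy can.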
In the event of a concurrent update to a single key, or a network partition that leaves application servers writing to Riak on both sides of the split, Riak can be configured to keep all writes and expose them to the next reader of that key. In this case, choosing the right value happens at the application level, allowing developers to either apply business logic or some common function (e.g. merge union of values) to resolve the conflict. From there, that value can be written back to Riak for its key. This ensures that Riak never loses writes.
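As a sketch of what that application-level resolution can look like, here is the classic merge-union approach for a shopping cart (hypothetical helper; real resolvers often need richer logic, e.g. to handle removals):

```python
def merge_siblings(siblings: list[set]) -> set:
    """Resolve divergent sibling values by taking the union of all carts,
    so no added item is ever dropped. (Removals need a richer strategy,
    such as tombstones or an OR-set.)"""
    merged = set()
    for cart in siblings:
        merged |= cart
    return merged

# Two siblings produced by concurrent writes to the same key:
siblings = [{"book", "mug"}, {"book", "pen"}]
print(merge_siblings(siblings))  # the resolved value, written back to Riak
```

The merged value is then written back under the key with the causal context from the read, which collapses the siblings into a single new version.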
Riak Data Types, first introduced in Riak 1.4 and expanded in the upcoming Riak 2.0, are designed to converge automatically. This means Riak will transparently manage the conflict resolution logic for concurrent writes to objects.
In the event of server failures and network problems, Riak is designed to always accept read and write requests, even if the servers that are ordinarily responsible for that data are unavailable.
Cassandra will allow writes to (optionally) be stored on alternative servers, but will not allow that data to be retrieved. Only after the cluster is repaired and those writes are handed off to an appropriate replica server (with the potential data loss that timestamp-based conflict resolution implies, as discussed earlier) will the data that was written be available to readers.
Imagine a user working with a shopping cart when the application is unable to connect to the primary replicas. The user can re-add missing items to the cart but will never actually see the items show up in the cart (unless the application performs its own caching, which introduces more layers of complexity and points of failure).
When handling missing or divergent/stale data, Riak and Cassandra have many similarities. Both employ a passive mechanism in which read operations trigger the repair of inconsistent replicas (known as read repair). Both also use Active Anti-Entropy, which builds a modified Merkle tree to track changes and new inserts on a per hash-ring-partition basis. Since replicated partitions contain overlapping keys, their trees can be compared, and any divergent or missing data is automatically repaired in the background. This can be incredibly effective at combating problems such as bitrot, since Active Anti-Entropy does not need to wait for a read operation.
The key difference in implementation is that Cassandra uses short-lived, in-memory hash trees that are built per Column Family and generated as snapshots during major data compactions. Riak’s trees are on-disk and persistent. Persistent trees are safer and more conducive to ensuring data integrity across much larger datasets (e.g. 1 billion keys could easily cost 8-16GB of RAM in Cassandra versus 8-16GB of disk in Riak).
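To illustrate the hash-tree idea behind Active Anti-Entropy, here is a deliberately flattened sketch: comparing a single root hash to decide whether two replicas agree, then finding which keys diverge. A real Merkle tree adds interior levels so only differing subtrees are walked; the function names and data layout here are invented for the example:

```python
import hashlib

def leaf_hash(key: str, value: bytes) -> bytes:
    return hashlib.sha1(key.encode() + value).digest()

def root_hash(replica: dict) -> bytes:
    """Hash of all leaf hashes in key order — a stand-in for a real
    Merkle root, which would let us descend only into differing subtrees."""
    h = hashlib.sha1()
    for key in sorted(replica):
        h.update(leaf_hash(key, replica[key]))
    return h.digest()

def divergent_keys(a: dict, b: dict) -> set:
    """Cheap root comparison first; only on mismatch, find keys to repair."""
    if root_hash(a) == root_hash(b):
        return set()
    return {k for k in set(a) | set(b) if a.get(k) != b.get(k)}

primary = {"k1": b"v1", "k2": b"v2"}
stale   = {"k1": b"v1", "k2": b"old"}
print(divergent_keys(primary, stale))  # {'k2'}: repaired in the background
```

The payoff of the tree structure is that agreement is confirmed with a single hash comparison, and disagreement is localized without scanning every key — which is what makes background anti-entropy affordable at billions of keys.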
Both Cassandra and Riak are eventually consistent, scalable databases that have strengths for specific use cases. Each has hundreds of thousands of hours of engineering invested and the commercial backing and support of their respective companies, DataStax and Basho. At Basho, we have labored to make Riak very robust and easy to operate at both large and small scale. For more information on how Riak is being used, visit our Riak Users page. For a look at what’s to come, download the Technical Preview of Riak 2.0.
The Weather Company VP of Enterprise Data to Unveil New IT Platform at Amazon AWS re:Invent 2013
LAS VEGAS, NV – November 13, 2013 – Amazon AWS re:Invent 2013 – During the past year, The Weather Company has undergone a remarkable transformation to its technology infrastructure as a data-driven media and technology company. As a part of the transformation, The Weather Company, which oversees popular brands such as The Weather Channel, weather.com, Weather Underground, Intellicast and WSI, has selected next generation technologies to underpin its new IT platform, including Basho’s Riak, Apache Hadoop, and Dasein.
“Weather is the original big data. However, weather changes, and so must IT,” commented Bryson Koehler, executive vice president and CIO, The Weather Company. “A massive data explosion is at the center of our growth strategy. The Weather Company requires an architecture that is both flexible and reliable, allowing us to deliver higher accuracy through superior data. Basho has been a valuable partner in our transformation and Riak has proven to be a critical component as the NoSQL distributed database powering our new platform.”
Since joining The Weather Company in July 2012 as EVP and CIO, Bryson Koehler has renewed the company’s vision for technology and executed a global IT transformation to propel their business growth strategy.
The Weather Company deployed Riak across multiple Amazon Web Services (AWS) availability zones using Riak Enterprise’s Multi-Datacenter Replication for ultra high-availability. Riak is used to store a variety of data from satellites, radars, forecast models, users, and weather stations worldwide. The next-generation data services platform enables The Weather Company to expand its ability to serve superior professional weather services in major weather-influenced markets, including media, hospitality, insurance, energy, and retail.
“The Weather Company typifies the customer that Riak was designed for,” said Greg Collins, CEO of Basho Technologies. “Aligning data services with customer consumption requires a flexible, available, and operationally simple database architecture. Riak Enterprise solves these needs while also working with The Weather Company to ensure it is able to scale to meet the demand of an increasing set of data points and be well positioned for future initiatives.”
During Amazon re:Invent 2013, Sathish Gaddipati, vice president of enterprise data at The Weather Company, will unveil their newest Weather Forecast and Data Services Platform and TWC’s overall enterprise IT transformation. Sathish’s presentation, titled How the Weather Company Monetizes Weather, the Original Big Data Problem, will be Thursday, November 14 at 5:30pm at The Venetian Las Vegas.
February 27, 2013
Multi-datacenter replication is a critical part of modern infrastructure, providing essential business benefits for enterprise applications, platforms and services. Riak Enterprise offers multi-datacenter replication so that data stored in Riak can be replicated to multiple sites. Over the next two posts, we will look at some common implementations, starting with configurations for backups and data locality. For more information on Riak Enterprise architecture and configuration, download the complete whitepaper.
Primary Cluster with Failover
One of the most common architectural patterns in multi-datacenter replication is maintaining a primary cluster that serves traffic and a backup cluster for emergency failover. This configuration can be an important component of compliance with regulatory requirements, ensuring business continuity and access to data even in serious failure modes.
In this configuration, a primary cluster serves as the production cluster from which all read and write operations are served. The backup cluster(s) is maintained in another datacenter. In the event of a datacenter outage or critical failure at the primary site, requests can be directed to the backup cluster either by changing DNS configuration or rules for routing via a load balancer.
For this use case, we recommend that your failover strategy be tested periodically so any potential issues can be resolved in advance of a crisis. It’s also beneficial to have your failover strategy fully defined upfront – know the conditions in which a failover mode will be invoked, decide how traffic will be directed to the backup, and document and test the failover strategy to ensure success.
Active-Active Cluster Configuration
To achieve data locality, in which clients are served at low latency by whichever datacenter is nearest to them, you can maintain two (or more) active, synced clusters that are both responsible for serving data to clients. An added benefit of this approach is that, in the event of a failure at the datacenter where one of the clusters is hosted, all traffic can be served from the other, still-functional site for business continuity.
For data locality, requests can be load balanced across geographies, with geo-based client requests directed to the appropriate datacenter. For example, US-based requests can be served out of a US-based datacenter while EU-based requests can be served out of a regional site. For situations where not all data needs to be shared across all datacenters (or if certain data, such as user data, must only be stored in a specific geographic region to meet privacy regulations), Riak Enterprise’s multi-datacenter replication can be configured on a per-bucket basis so only shared assets, popular assets, etc. are replicated.
February 26, 2013
Mobile platforms and applications pose unique infrastructure challenges for today’s companies. These applications require low-latency, always-available small object storage that can scale to millions or more users, and support highly concurrent access and traffic spikes.
Riak provides a number of benefits for these platforms, including:
- Low-Latency Data Storage: Riak is designed to serve predictable, low-latency requests to provide a fast, available experience to all users.
- Straightforward Data Model: Riak uses a simple key-value data model, which is ideal for storing and serving mobile content, user information, events, and session data. Riak is content agnostic, so there are no restrictions on content type.
- Accommodates Peak Loads Gracefully: To handle increasing user data and accommodate peak loads during events, Riak makes it easy to add additional capacity and scale out quickly. Riak automatically rebalances data when new nodes are added, while its consistent hashing methodology prevents hot spots in the database.
- Multi-Datacenter Replication: Riak Enterprise’s multi-datacenter replication allows mobile platforms to serve low-latency content to users all over the world by maintaining a global data footprint.
For a full overview, download our new whitepaper on building mobile services with Riak.
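The hot-spot and rebalancing claims above rest on consistent hashing: when a node is added, only a fraction of keys change owners. Here is a generic sketch of that property — this is not Riak's actual claim algorithm (Riak uses a fixed number of equal-sized partitions); node names, vnode counts, and helpers are illustrative:

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(nodes: list[str], vnodes: int = 32) -> list[tuple[int, str]]:
    """Each node claims several points on the ring (its virtual nodes),
    which spreads ownership evenly and avoids hot spots."""
    ring = [(_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)]
    return sorted(ring)

def owner(ring: list[tuple[int, str]], key: str) -> str:
    """A key belongs to the first node point clockwise from its hash."""
    points = [p for p, _ in ring]
    idx = bisect.bisect(points, _hash(key)) % len(ring)
    return ring[idx][1]

keys = [f"user:{i}" for i in range(1000)]
before = build_ring(["node1", "node2", "node3"])
after = build_ring(["node1", "node2", "node3", "node4"])
moved = sum(owner(before, k) != owner(after, k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly a quarter, not all
```

Going from three nodes to four moves only about a quarter of the keys, which is why scale-out rebalancing is incremental rather than a full reshuffle.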
Bump is a popular mobile app that makes it easy for users to share their contact information, photos, and other objects by simply “bumping” their smartphones. They use Riak to store user data and currently run 25 nodes of Riak storing about 3TB of data.
For more details about how Bump uses Riak and how they designed their application, check out Bump’s presentation at RICON2012, Basho’s 2012 developer conference. You can also read the complete case study for more information about why Bump chose Riak.
Voxer is a popular Walkie Talkie application for smartphones that allows users to send instant voice messages to one or more friends. They switched to Riak due to its fault-tolerance and ability to scale quickly and easily. They currently run Riak on more than 50 machines to support their huge growth and user base. For more details about how Voxer uses Riak, check out the complete case study and watch Matt Ranney’s talk at a Riak Meetup in San Francisco.
To learn more about how mobile platforms can use Riak for their data needs, check out the complete overview, “Mobile on Riak: A Technical Introduction.”
February 25, 2013
This post takes an in-depth look at Riak Enterprise’s new multi-datacenter replication capabilities, available in the recent 1.3 release. Riak Enterprise’s multi-datacenter replication now ships with “advanced mode,” which features some performance and resiliency improvements that we’ve developed by working with production customers:
- Instead of a single TCP connection over which data is streamed from one cluster to another, this new version uses multiple concurrent TCP connections (approximately one per physical node) and processes between sites. This prevents performance bottlenecks, which are especially common on nodes constrained by per-instance bandwidth limits (such as in a cloud environment).
- Easier configuration of multi-datacenter replication. Simply use a shell command to name your clusters, then connect both clusters using an ip:port combination.
- Better per-connection statistics for both full-sync and real-time modes.
- New ability to tweak full-sync workers per node and per cluster, which allows customers to dial in performance.
Details of the advanced mode capabilities are below. For more about the multi-datacenter replication upgrades and our 1.3 release, check out this recent article from GigaOM. For full technical details, check out our documentation on multi-datacenter replication. For an examination of common architectures and use cases for Riak Enterprise, including datacenter failover, active-active cluster configurations, availability zones, and cloud bursting, download our technical overview.
The new advanced mode of Riak Enterprise’s multi-datacenter replication takes the best features of the previous single-channel communications model and scales them up to one-to-one connections between peer nodes of the clusters. With concurrent channels and the ability to constrain the maximum connections per node and per cluster, the new multi-datacenter replication lets the full capacity of the network and the size of the clusters scale performance to available resources.
It begins with a much easier configuration command language and environment, with natural objects as sources, sinks, and cluster names. For example, the real-time and full-sync enable/disable, start/stop, and status commands all use these human-friendly names. All connections go through a single port, reducing network administration to a single firewall port-forwarding rule; Riak then manages the different protocols on this port. Connections are persistent, resilient to outages, and adapt to changes in cluster names and IP addresses automatically.
Two Sync Modes
Real-time synchronization between clusters uses a new queueing mechanism that ensures maximum performance and graceful shutdown of nodes. This guarantees that there is no loss of replication data during upgrades or scheduled maintenance. It also automatically balances the load across all nodes of both the source and sink clusters and requires no configuration.
Full-synchronization between clusters uses a scheduling algorithm to maximize the transfer rate of data between peer nodes of the two clusters. Partitions are synchronized in parallel so that the maximum number of keys can be updated concurrently with the minimum overlap, which minimizes load and contention on both the source and sink clusters. Full-sync supports concurrent syncs between multiple clusters and optimizes the load dynamically, ensuring nodes never exceed their configured connectivity. This allows clusters to synchronize at maximum efficiency, without impacting their runtime performance for connected clients as they make PUT and GET requests.
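The scheduling idea — sync as many partition pairs in parallel as possible without any node exceeding its worker cap — can be sketched as a simple greedy round-based scheduler. This is a hypothetical illustration of the constraint, not Riak's actual scheduling algorithm; all names and the per-node cap are assumptions:

```python
def schedule(pairs: list[tuple[str, str]], max_per_node: int = 1) -> list[list[tuple[str, str]]]:
    """Greedy rounds: in each round, admit as many partition-sync pairs as
    possible without any source or sink node exceeding its worker cap."""
    rounds, pending = [], list(pairs)
    while pending:
        load, this_round, remaining = {}, [], []
        for src, dst in pending:
            if load.get(src, 0) < max_per_node and load.get(dst, 0) < max_per_node:
                this_round.append((src, dst))
                load[src] = load.get(src, 0) + 1
                load[dst] = load.get(dst, 0) + 1
            else:
                remaining.append((src, dst))
        rounds.append(this_round)
        pending = remaining
    return rounds

# Four partition syncs across two source and two sink nodes:
pairs = [("src1", "sink1"), ("src1", "sink2"), ("src2", "sink1"), ("src2", "sink2")]
for i, r in enumerate(schedule(pairs)):
    print(f"round {i}: {r}")  # two rounds of two non-overlapping syncs each
```

With a cap of one worker per node, the four syncs complete in two rounds of two, and no node ever serves more than one transfer at a time — the same shape of guarantee the text describes for minimizing load and contention on both clusters.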
We are also planning to include Secure Sockets Layer and Network Address Translation support in this advanced mode of multi-datacenter replication – these are currently only available in default mode. Additionally, future improvements will take advantage of the Active Anti-Entropy feature (introduced in Riak 1.3) to make full-sync differencing even faster. Stay tuned for more updates!
To learn more about Riak 1.3 and the new advanced mode for multi-datacenter replication, sign up for our webcast on Thursday, March 7th.