December 18, 2014
One of the interesting things about attending industry events, like AWS re:Invent, is identifying common trends that arise in conversations. Recent conversations point to a renewed interest in “enterprise ready replication” for NoSQL databases.
As business data continues to grow, it presents an entirely new set of challenges related to availability, scalability, and fault tolerance. While most NoSQL databases work at small scale, availability is often compromised as applications reach full production or peak capacity. Having the right replication functionality is key to ensuring that availability requirements are not compromised as your system grows.
“Replication” may mean different things based on context. In this case, we are referring to the movement of data in a database cluster — or across datacenters — for the purpose of redundancy or data locality. If your database experience began in an RDBMS context, then replication implies a specific contextual understanding of multi-master transactional deployment and, perhaps, shipping transaction logs between incremental backups in a hot/warm database configuration. In contrast, for those who began in the NoSQL era, the term may evoke images of replica-sets on a sharded infrastructure and the operational overhead associated therewith.
In a distributed NoSQL database, like Riak, the term replication is used to encompass two distinct concepts. First, intra-cluster replication for high availability and fault tolerance within the datacenter; and second, multi-datacenter replication for data locality and global availability. There is none of the complexity of log shipping or dealing with a sharded infrastructure.
Data replication is a core feature of Riak’s basic architecture. Riak was designed to operate as a clustered system containing multiple nodes (commodity servers or cloud instances). The replication implementation allows data to live on multiple machines at once, with a single write request, in case a node in the cluster goes down or is unavailable due to issues like network partitioning.
Intra-cluster replication is fundamental and automatic in Riak, so that your data is always available. All data stored in Riak is replicated to a number of nodes in the cluster according to a configurable parameter (n_val) set in a bucket’s bucket type.
With the default n_val setting of 3, there are always three copies of all data. These copies will be on three different partitions/vnodes. A detailed explanation and analysis of this replication capability is available in the Riak documentation: Understanding replication by example.
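As a rough illustration of what an n_val of 3 means, the sketch below models a simplified hash ring in Python. The ring size of 64, the SHA-1 hash of a "bucket/key" string, and the function name preference_list are assumptions chosen for this example; it is a toy model of the idea, not Riak's actual placement code.

```python
import hashlib

RING_SIZE = 64          # number of partitions; assumed for illustration
HASH_SPACE = 2 ** 160   # size of the SHA-1 output space


def preference_list(bucket, key, n_val=3, ring_size=RING_SIZE):
    """Return the n_val consecutive partitions that would hold copies of a key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    # Map the hash onto one of the equally sized partitions of the ring.
    first = position * ring_size // HASH_SPACE
    # Copies go to that partition and the next n_val - 1, wrapping around.
    return [(first + i) % ring_size for i in range(n_val)]


partitions = preference_list("users", "alice")
print(partitions)
```

Because the partitions are consecutive and distinct, losing any one of them still leaves two copies reachable in this model.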
In the case of intra-cluster replication, or what we would refer to simply as “replication”, data distribution ensures redundant data such that high availability is maintained in a failure state.
In contrast to intra-cluster replication, multi-datacenter replication (a feature of Riak Enterprise) is a critical part of modern application infrastructures. Riak Enterprise offers multi-datacenter replication features so that data stored in Riak can be replicated to multiple sites (vs. multiple servers in the same site).
As we are all aware, understanding application latency (for an end user) begins with the realization that data can’t travel faster than the speed of light. So, inherently, latency is introduced as source information moves further from its point of consumption. As such, there is a baseline amount of latency for a customer connecting to your application hosted in New York when they are accessing the application from San Francisco. This latency profile increases, and becomes more complex, as the geographic distribution of your customer base increases.
With multi-datacenter replication in Riak Enterprise, data can be replicated across locations and geographic areas, providing for disaster recovery, data locality, compliance with regulatory requirements, and the ability to “burst” peak loads into public cloud infrastructure, among other benefits.
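To put a number on the New York to San Francisco example, here is a small Python sketch that estimates the physical floor on round-trip time. The city coordinates are approximate, the assumption that light in optical fiber travels at roughly two thirds of its vacuum speed is a rule of thumb, and the function names are made up for this illustration.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3                # assumption: rough speed ratio in optical fiber
EARTH_RADIUS_KM = 6371.0            # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km, speed_km_s=SPEED_OF_LIGHT_KM_S):
    """Lower bound on round-trip time over a straight path, in milliseconds."""
    return 2 * distance_km / speed_km_s * 1000


NEW_YORK = (40.7128, -74.0060)        # approximate coordinates
SAN_FRANCISCO = (37.7749, -122.4194)  # approximate coordinates

distance = great_circle_km(*NEW_YORK, *SAN_FRANCISCO)
vacuum_rtt = min_round_trip_ms(distance)
fiber_rtt = min_round_trip_ms(distance, SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
print(f"{distance:.0f} km, {vacuum_rtt:.1f} ms (vacuum), {fiber_rtt:.1f} ms (fiber)")
```

Under these assumptions the floor is on the order of tens of milliseconds per round trip, before any routing, queuing, or processing time is added.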
Riak’s multi-datacenter replication is masterless. One cluster acts as a primary, or source, cluster. The primary cluster handles replication requests from one or more secondary, or sink, clusters (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can automatically take over as the primary cluster.
More architectural strategies for multi-datacenter implementations are covered in the Basho whitepaper entitled Riak Enterprise: Multi-Datacenter Replication – A Technical Overview & Use Cases or in the Basho Documentation section Multi-Datacenter Replication: v3 Architecture.
Replication inside a cluster is a core design tenet of Riak. It provides the availability and fault-tolerance characteristics, with low operational overhead, that many unstructured data workloads demand.
Multi-datacenter replication, while related, is an entirely different approach and architecture to enable the geographic distribution of data to solve for redundancy, geo-data locality, etc.
Replication is an important scalability feature of any database deployment. Ensuring that your NoSQL database replicates data in a way that is scalable, operationally simple and achieves your business objectives is key to your success.
August 20, 2013
NoSQL is a misleading name. SQL was never the problem. However, this poorly named industry term does represent a response to changing business priorities and new challenges that require different kinds of database architectures.
Traditional database architectures were first developed in the late 60s and early 70s. They were the default option for many pre-Internet use cases and remain useful today for certain use cases requiring a relational data model. However, their limits are painfully apparent to many companies. Despite what traditional database vendors might have us believe, very little data generated today actually requires a SQL architecture. Businesses face many new challenges today that traditional databases simply are not designed to handle reliably or efficiently. These include:
- Global Users. It is no longer enough to provide a fast experience in one country. Users from all over the globe expect a low-latency experience, making geo-data locality more important than ever.
- Zero Downtime. Planned or unplanned, downtime is bad for business. There is now an expectation of always-on availability. Operations teams must emphasize resiliency over recovery.
- Scale Matters. Businesses need to scale up quickly to meet peak loads during the holidays or product launches, and then they need to scale back down. They need an architecture that makes scaling the least of their worries.
- Flexible Data. From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Businesses need flexibility to handle all the data generated and flowing.
- Omnichannel. Whether users are on a tablet, laptop, or smartphone, they require a device-agnostic experience and low latency.
- Amazon Economics. Every business wants Amazon Economics. With the nature of data growth today, businesses can’t afford expensive machines at every juncture. They need commodity machines to scale horizontally, not vertically.
Attempts to address these challenges with traditional databases result in an inflexible architecture with super high costs. “NoSQL” databases represent a fresh approach towards building flexible, resilient architectures. “NoSQL” goes where no database has ever gone before — into the wild space of the Internet and the massive scale requirements it represents.
Which brings us to NoSQL Now! Basho is sponsoring because the movement is more important than any single industry term. Andy Gross will also be on-hand to further discuss the larger trend of distributed systems:
Dealing with Systems in a New Distributed World
Chief Architect and Co-creator of Riak
Thursday, August 22, 2013
Please join us in San Jose for a look at the future of database technology.
June 19, 2013
Today, Tier 3 announced the availability of their global cloud object storage product, powered by Riak CS. You can find the entirety of the release in our News Section entitled “Tier 3 Launches Global Cloud Object Storage.”
In particular, we are keenly interested in the unique geographic footprint that Tier 3 maintains. In conversations with customers, press, and analysts, we frequently hear people discussing “geo-data locality.” This phrase is typically used to express a desire to address regulatory compliance or to improve the end-customer experience through low latency (in the case of mobile applications).
With the Tier 3 release, their geographic footprint — in addition to maximizing availability — leverages the inherent replication present in Riak CS to pre-determine the physical locations of specific data.
For geo-data locality, requests can be load balanced across geographies, with geo-based client requests directed to the appropriate datacenter. For example, US-based requests can be served out of a Tier 3 US-based datacenter, while EU-based requests can be served out of a Tier 3 European datacenter. For situations where not all data needs to be shared across all datacenters (or if certain data, such as user data, must only be stored in a specific geographic region to provide low-latency response and address privacy regulations), Riak CS Enterprise’s multi-datacenter replication can be configured on a per-bucket basis so only shared assets, popular assets, etc. are replicated.
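The routing and per-bucket replication idea can be sketched in a few lines of Python. Everything named here (the region codes, datacenter labels, bucket names, and both functions) is hypothetical and exists only to illustrate the pattern; it does not reflect Riak CS Enterprise configuration syntax or Tier 3's actual topology.

```python
# Hypothetical mapping from client region to the datacenter that serves it.
DATACENTERS = {"US": "us-datacenter", "EU": "eu-datacenter"}
DEFAULT_REGION = "US"

# Hypothetical per-bucket policy: True means the bucket is replicated everywhere.
REPLICATED_BUCKETS = {"shared-assets": True, "user-data": False}


def route_request(client_region):
    """Pick the datacenter for a request, falling back to the default region."""
    return DATACENTERS.get(client_region, DATACENTERS[DEFAULT_REGION])


def replication_targets(bucket, home_region):
    """Return the datacenters that should hold a bucket's data."""
    if REPLICATED_BUCKETS.get(bucket, False):
        return sorted(DATACENTERS.values())
    # Buckets that are not replicated stay in their home region only.
    return [DATACENTERS[home_region]]


print(route_request("EU"))
print(replication_targets("shared-assets", "EU"))
print(replication_targets("user-data", "EU"))
```

Treating unknown buckets as not replicated is a deliberately conservative default in this sketch, since region-restricted data should not leave its region by accident.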
Tier 3 Launches Global Cloud Object Storage
New Service, Powered by Riak CS Enterprise, Delivers High Availability and “Geo-Data Locality” via Automatic and Instant Replication
BELLEVUE, WA - June 19, 2013 - Tier 3, a provider of public cloud infrastructure and cloud management tools, today announced the general availability of a new, distributed object storage service with automated data center redundancy. Powered by Basho’s Riak CS Enterprise, this service offers enterprises flexible, scalable cloud storage for files of any type and any size. Files – or “objects” – stored in Tier 3’s cloud are automatically replicated to a secondary in-country data center. This unique feature provides native “high availability” that improves performance and reliability and ensures that critical data will never vanish into the ether.
By contrast, most existing cloud-based object storage systems require additional engineering work to enable this automated capability, a burden that many IT professionals and developers would rather not carry.
“Object storage in the cloud has proven to be a best practice for real-time data backups and archiving. With the launch of this service, Tier 3 has added a layer of enterprise-grade capabilities on object storage, with automatic redundancy to a second geographic location,” said Jared Wray, founder and CTO of Tier 3. “Partnering with Basho has allowed us to provide enterprises with a simple, scalable storage platform for their applications and data. Furthermore, this service will support data sovereignty scenarios for sensitive data.”
Tier 3 offers the complete enterprise cloud – a growing collection of integrated services that includes virtual servers running in the public cloud, cloud management functions like automation and orchestration, as well as Web Fabric, a platform as a service offering based on Cloud Foundry. The company operates nine data centers worldwide.
Object storage from Tier 3 is available immediately in its two Canadian data centers, with additional locations in the U.S. scheduled for next quarter. Rollout will extend to Tier 3 data centers in the U.K. and mainland Europe thereafter. Users simply pay monthly for the amount of storage allocated on Tier 3’s federated cloud data centers, and may scale storage up or down as desired.
The automatic replication of data to a separate data center also delivers on a growing requirement for enterprise cloud deployments – geo-data locality. This feature allows enterprises to pre-determine the physical locations of specific data. This can be used not only to address regulatory compliance, but also to improve the end-customer experience through ultra-low latency.
To help deliver this new service, Tier 3 chose Riak CS Enterprise from Basho Technologies. This advanced cloud storage software runs on top of Riak, a sophisticated open-source distributed database that provides extreme high availability, fault tolerance, and operational simplicity.
“Businesses are increasingly asking for options to meet their rapidly growing and diverse object storage requirements,” said Justin Sheehy, CTO of Basho. “Tier 3’s offering is powerful as it emphasizes enterprise requirements for security and resiliency while also providing the flexibility users expect from a public cloud. Basho is excited to see Riak CS’ unique multi-data center replication perform as a core component underlying Tier 3’s new global cloud object storage offering.”
Customer Demand Fuels Development of Object Storage
As cloud adoption continues apace, so too does the demand for cloud storage. In a recent research note, Gartner predicts worldwide Cloud System Infrastructure Services (IaaS) Storage for end-user spending to grow at a 31% Compound Annual Growth Rate (CAGR), from an estimated $1.7B in 2013 to a forecast $4.9B in 2017(1).
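As a quick check on the arithmetic behind the forecast, the Python sketch below computes the compound annual growth rate implied by the two dollar figures quoted above over the four years from 2013 to 2017. The function name cagr is just a label for this example.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: $1.7B in 2013 growing to $4.9B in 2017 (four years).
rate = cagr(1.7, 4.9, 2017 - 2013)
print(f"Implied CAGR: {rate:.1%}")
```

The result comes out at roughly 30%, which is in line with the cited 31% once the rounding of the dollar figures is taken into account.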
Object storage is fueling much of this growth, since it is ideal for many enterprise scenarios, including hosting of multimedia files, data back-ups, archives, transfer of large files, and much more.
Highlights of the new object storage service from Tier 3 include:
- Storage for Large Objects. Users may store any type of file in the Tier 3 cloud – images, videos, documents, database backups, archives, and more. The service supports direct uploads for files up to 5 GB, while larger files may be stored using multipart file uploads.
- Storage for Public Files, Plus Permissions to Keep Objects Private. Users may host public files with object storage – for example, images used in web applications or downloadable multimedia content. For additional security, files stored in Tier 3 are flagged as private by default.
- Geographic Redundancy & High Availability. Object storage from Tier 3 offers multi-datacenter replication natively, with high availability already built-in.
- S3 Compatibility. Users familiar with S3 may re-use existing code assets with the Tier 3 object storage service. The company’s API supports service, bucket, and object-level operations. In addition, object storage from Tier 3 is compatible with many S3 file management utilities.
- Enterprise-Grade Security, with Key-Pair Permissions. System administrators may store collections of files in buckets that are protected with unique key-pairs. Admins can simply point-and-click to assign permissions to other users as needed.
- Scalability. Object storage from Tier 3 can easily scale to any size desired.
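The large-object rule in the list above (direct uploads up to 5 GB, multipart uploads beyond that) can be sketched as a small Python planning function. The 100 MiB part size, the reading of "5 GB" as 5 GiB, and the function name plan_upload are assumptions made for this illustration; they are not taken from the Tier 3 or Riak CS documentation.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

DIRECT_UPLOAD_LIMIT = 5 * GIB   # the 5 GB limit from the list, read as GiB (assumption)
PART_SIZE = 100 * MIB           # hypothetical part size for multipart uploads


def plan_upload(size_bytes, part_size=PART_SIZE):
    """Return the upload method and the number of requests needed for an object."""
    if size_bytes <= DIRECT_UPLOAD_LIMIT:
        return ("direct", 1)
    # Ceiling division: the last part may be smaller than part_size.
    parts = (size_bytes + part_size - 1) // part_size
    return ("multipart", parts)


print(plan_upload(1 * GIB))
print(plan_upload(12 * GIB))
```

A client built along these lines would send small objects in a single request and split larger ones into parts, which also makes it possible to retry an individual part rather than the whole object.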
Additional information on Tier 3’s object storage may be found at: http://www.tier3.com/products/object-storage.
(1) Gartner [Forecast: Public Cloud Services, Worldwide, 2011-2017, 1Q13 Update], [Anderson, E., Bell, W. and others] and [2013, March 26]
About Tier 3
Tier 3 is a complete cloud management platform for mid-tier to large enterprises, as well as SaaS providers. To bring even more value to customers, Tier 3 has combined elements of the traditional enterprise cloud market with those of cloud management platforms. Tier 3’s suite of cloud products and services includes advanced management and orchestration, enabling our customers to run workloads ranging from simple development and test environments to the most complex and demanding enterprise applications. The Company is based in Bellevue, WA, with regional presence in multiple locations in North America and Europe. www.tier3.com.
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy to operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast-growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.