February 17, 2015
According to TechTarget, a common definition of “High Availability” is:
“In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to ‘100% operational’ or ‘never failing.’”
The reality is that this phrase has become semantically overloaded by its inclusion in marketing copy across a disparate set of technologies. Much like “Big Data”, perspectives on availability vary based on industry and customer expectation.
For many of today’s applications and platforms, high availability has a direct impact on revenue. A few examples include: cloud services, online retail, shopping carts, gaming and betting, and advertising. Further, lack of availability can damage user trust and result in a poor user experience for many social media and chat applications, websites, and mobile applications. Riak provides the high availability needed for your critical applications.
Availability – By the Numbers
As we highlighted in an infographic entitled Down with Downtime, more than 95% of businesses with 1,000+ employees estimate that they lose more than $100,000 for every hour of downtime. For more than 1 in 2 large businesses, the cost of downtime amounts to more than $300,000 per hour, which works out to roughly $83 per second. At the upper end of the spectrum (in financial services) it can amount to $1,800 per second of downtime.
This fiscal impact has resulted in availability being measured as a percentage calculation of uptime in a given year. This percentage is often referred to as the “number of 9s” of availability. For example, “one nine” of availability equates to 90% uptime in a year. Similarly, “five nines” (the standard that was set by consulting firms on enterprise projects) equates to 99.999% availability in a year. While that percentage is often referenced, the practical reality is that it means there can be no more than 6.05 seconds of unplanned downtime per week.
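The arithmetic behind the “number of 9s” can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative and not part of any Riak tooling.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def availability_percent(nines: int) -> float:
    """Uptime percentage for a given number of nines, e.g. 5 -> 99.999."""
    return 100 * (1 - 10 ** -nines)


def weekly_downtime_seconds(nines: int) -> float:
    """Maximum downtime per week, in seconds, allowed by that many nines."""
    return SECONDS_PER_WEEK * 10 ** -nines


if __name__ == "__main__":
    for nines in range(1, 6):
        print(f"{nines} nine(s): {availability_percent(nines):.3f}% uptime, "
              f"{weekly_downtime_seconds(nines):.3f} s of downtime per week")
```

Each additional nine divides the weekly downtime budget by ten, which is why the step from three nines to five nines is far more demanding than the percentages suggest.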
Availability – A Feature or A Benefit?
Often, when describing Riak, I begin by explaining the benefits of Riak (availability, scalability, fault tolerance, operational simplicity) and then discuss, in detail, the properties that these benefits are derived from. Availability is not something that can be added to a system (be it a distributed database or otherwise); rather, it is an outcome of the core architectural decisions that were made in the development of the product.
Consider, for example, the AXD 301 ATM switch. It reportedly delivers “nine nines” (99.9999999%) of availability or better to customers. This is a staggering number that allows no more than about 0.6 milliseconds (0.6048 ms) of downtime per week. Interestingly, it shares a common architectural trait with Riak: both are developed in Erlang.
So, how does Riak achieve high availability? Or, perhaps better stated: “What are the architectural decisions made in Riak that enable high availability?”
Availability – An Architectural Decision
Riak is a masterless system designed for high availability, even in the event of hardware failures or network partitions. Any server (termed a “node” in Riak) can serve any incoming request and all data is replicated across multiple nodes. If a node experiences an outage, other nodes will continue to service read and write requests. Further, if a node becomes unavailable to the rest of the cluster, a neighboring node will take over the responsibilities of the missing node. The neighboring node will pass new or updated data (termed “objects”) back to the original node once it rejoins the cluster. This process is called “hinted handoff” and it ensures that read and write availability is maintained automatically, minimizing your operational burden when nodes fail or come back online.
More information about the architectural decisions involved in Riak’s design is available in our documentation. In particular, the Concepts – Clusters section is deeply illustrative.
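To make the idea of hinted handoff concrete, here is a toy, in-memory sketch in Python. It is not Riak's implementation or API: the `Node` and `Cluster` classes, the placement function, and the single-copy-per-key simplification are all assumptions made for illustration (Riak itself replicates each object to multiple nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)   # objects this node owns
    hints: dict = field(default_factory=dict)  # owner name -> objects held on its behalf


class Cluster:
    """Hypothetical toy cluster: one copy per key, nodes arranged in a ring."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def owner_index(self, key):
        # Toy deterministic placement (an assumption, not Riak's hashing scheme).
        return sum(key.encode()) % len(self.nodes)

    def put(self, key, value):
        i = self.owner_index(key)
        owner = self.nodes[i]
        if owner.up:
            owner.data[key] = value
            return owner.name
        # Owner is down: the next live neighbor accepts the write with a hint.
        for step in range(1, len(self.nodes)):
            neighbor = self.nodes[(i + step) % len(self.nodes)]
            if neighbor.up:
                neighbor.hints.setdefault(owner.name, {})[key] = value
                return neighbor.name
        raise RuntimeError("no live nodes")

    def get(self, key):
        owner = self.nodes[self.owner_index(key)]
        if owner.up:
            return owner.data.get(key)
        # Owner is down: serve the value from whichever live node holds a hint.
        for node in self.nodes:
            if node.up and key in node.hints.get(owner.name, {}):
                return node.hints[owner.name][key]
        return None

    def rejoin(self, name):
        node = next(n for n in self.nodes if n.name == name)
        node.up = True
        # Hinted handoff: neighbors pass held objects back to the owner.
        for other in self.nodes:
            node.data.update(other.hints.pop(name, {}))


if __name__ == "__main__":
    cluster = Cluster(["a", "b", "c"])
    owner = cluster.nodes[cluster.owner_index("meter-42")]
    owner.up = False                                   # simulate an outage
    holder = cluster.put("meter-42", "17.3 kWh")       # write still succeeds
    print("written to", holder, "while", owner.name, "is down")
    cluster.rejoin(owner.name)                         # data flows back to the owner
    print(owner.name, "now holds", owner.data)
```

The point of the sketch is the sequence: the write is accepted by a neighbor while the owner is down, reads keep working, and the data returns to the owner when it rejoins, with no operator intervention.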
Availability – A Use Case
Consider, for example, the implementation of Riak at Temetra. Temetra has thousands of users and millions of meters that create billions of data points. The massive influx of data that was being generated quickly became difficult to manage with the company’s legacy SQL database. When considering how this structured database could be overhauled, Temetra conducted evaluations with Cassandra and Hadoop but ultimately chose Riak because it is highly available, relatively self-maintaining, and easy to deploy. It is essential that the data collected from the meters is always available, as it is relied on to determine correct billing for Temetra’s customers.
Availability – A Summary
The reality is that a database, even a distributed, masterless, multi-model platform like Riak, is only one component of the application stack. Understanding your availability requirements requires deep knowledge of the entirety of the deployment environment. “High Availability” cannot be retrofitted into a system. Rather, it requires conscious effort in the early stages to ensure that customer requirements are met and that downtime does not result in lost customers and lost revenue.
Thousands have watched and enjoyed Peter Alvaro’s engaging and informative RICON 2014 Keynote presentation. Alvaro is a PhD candidate at the University of California, Berkeley. His research interests lie at the intersection of databases, distributed systems, and programming languages. Alvaro’s style of delivery blends humor with deep technical detail and is especially informative for those interested in distributed systems.
In his presentation, Alvaro discusses 4 key ideas:
- Mourning the death of transactions
- What is so hard about distributed systems?
- Distributed consistency: managing asynchrony
- Fault-tolerance: progress despite failures
Alvaro starts his presentation by introducing us to Jim Gray and transactional systems. Many of you may know Gray’s work, and, sadly, that he was lost at sea in January 2007. His spirit and legacy are missed.
Alvaro provides insights into transactional systems and the top-down approach these systems traditionally used. He also points out that Eric Brewer, in his RICON 2012 keynote address, suggested that a bottom-up approach might be needed for today’s distributed systems.
Alvaro dives into why anyone would implement distributed systems and why developing distributed systems is hard, really hard. In a distributed system, it is necessary to manage two fundamental uncertainties or failure modes — asynchrony and partial failure. Alvaro uses a humorous metaphor of two clowns to demonstrate how, in the real world, asynchrony and partial failure can’t be dealt with separately, but must be looked at together.
From his humorous metaphor come some definitions:
Distributed consistency = managing asynchrony
Fault-tolerance = progress despite failures
Alvaro then provides details on distributed consistency and how consistency is handled when data is distributed. He starts with object-level consistency, introducing and defining CRDTs and explaining how these replicated data types help solve the distributed consistency challenge at the object level.
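For readers new to CRDTs, a grow-only counter (G-Counter) is one of the simplest examples of object-level consistency. The sketch below is a textbook version written for this post; it is not taken from the keynote and is not Riak's data type implementation, and the class and replica names are illustrative.

```python
class GCounter:
    """Grow-only counter CRDT: each replica only increments its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of message order or duplication.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


if __name__ == "__main__":
    a, b = GCounter("replica-a"), GCounter("replica-b")
    a.increment(2)   # updates applied independently on each replica
    b.increment(3)
    a.merge(b)       # state exchanged in either order...
    b.merge(a)
    b.merge(a)       # ...and even repeatedly
    print(a.value(), b.value())
```

Because merging never loses an increment and never double counts, both replicas arrive at the same total without any coordination, which is the sense in which consistency becomes tolerance to asynchrony.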
But what happens as objects are in flight? There must also be flow-level consistency for data in motion. Language-level consistency can help with this problem. Alvaro makes the following key points:
Consistency is tolerance to asynchrony
Tip: Focus on data in motion, not at rest
Alvaro then moves from distributed consistency to fault tolerance. He discusses his most recent research, “lineage-driven fault injection.” He reminds us that we build systems out of components and verify that each component is fault tolerant.
However, putting these components together does not guarantee end-to-end fault tolerance.
Alvaro talks about the challenges of the top-down approach to testing all components in a system and outlines the goal of lineage-driven fault injection (LDFI).
Alvaro then introduces us to Molly, a top-down fault injector.
He likens Molly's approach to starting from the middle of a maze and working outward to arrive at a solution.
Alvaro provides detailed examples of modeling programs using lineage so that fault tolerance can be analyzed. He then shows how the role of the adversary can be automated. He describes Molly in more detail as an LDFI prototype. Molly finds fault-tolerance violations quickly or guarantees that none exist. Alvaro shows some output from Molly and explains how lineage allows you to reason backwards from good outcomes.
Alvaro closes with a recap and explanation describing composition as the hardest problem of distributed systems.
Don’t miss this interesting and informative presentation.
Also, KDnuggets published a two-part follow-up interview with Alvaro in which he expanded on some points made in his RICON 2014 Keynote speech.
February 1st, 2015
If you missed last week’s webinar Preparing for the Deluge of Unstructured Data, you can still watch it on-demand. Dorothy Pults and I discuss the news emanating from the 2015 Consumer Electronics Show and highlight that the Internet of Things, connected devices, and the resulting explosion of unstructured data are front and center among the growth trends of 2015. In particular, we covered the topics of:
- What is driving the growth in unstructured data
- The challenges associated with managing unstructured data
- How companies are capitalizing on the opportunities that unstructured data presents, to save money, time, and create new market opportunities
The webinar covers each of these topics in great detail and provides some insights on distributed systems.
Why Distributed Systems?
Companies like Facebook, Amazon, and Google have built huge distributed systems with strict requirements around scalability, fault tolerance, and global footprints. These same concepts must now be considered by companies of all sizes, from the enterprise to the startup.
The reality is that everything works at small scale. Challenges arise as it becomes necessary to scale out, up and down, predictably and linearly. When assuming that failure and latency are part of the equation, it is necessary to choose a distributed database that enables horizontal scale, and that does so on commodity hardware or on the compute instances that your business has adopted in its architecture. This is particularly important when data governance is a key component of your design considerations.
Ultimately, the customer experience matters. When designing your distributed architecture, and choosing persistence solutions like Riak, ensure that there is a solution for the geographic distribution of data (like Riak Enterprise’s multi-datacenter replication capability) to provide low latency experiences for your customers, regardless of their physical location.
For more information on this topic space, we have compiled a few resources to enable your education and decision-making.
Basho Technologies today announced the immediate availability of the second edition of Riak Handbook.
CAMBRIDGE, MA – June 1, 2012 – Basho Technologies today announced the immediate availability of the second edition of Riak Handbook. The significantly updated Riak Handbook includes more than 43 pages of new content covering many of the latest feature enhancements to Riak, Basho’s industry-leading, open-source, distributed database. Riak Handbook is authored by former Basho developer and advocate, Mathias Meyer.
Riak Handbook is a comprehensive, hands-on guide to Riak. The initial release of Riak Handbook focused on the driving forces behind Riak, including Amazon Dynamo, eventual consistency and CAP Theorem. Through a collection of examples and code, Mathias’ Riak Handbook explores the mechanics of Riak, such as storing and retrieving data, indexing, searching and querying data, and sheds light on Riak in production. The updated handbook expands on previously covered key concepts and introduces new capabilities, including the following:
- An overview of Riak Control, a new Web-based operations management tool
- An entirely new section on deploying Erlang code in a Riak cluster
- Additional details on secondary indexes
- Insight into load balancing Riak nodes
- An introduction to network node planning
- An introduction to Riak CS, including Amazon S3 API compatibility
The updated Riak Handbook includes an entirely new section dedicated to popular use cases and is full of examples and code from real-time usage scenarios.
Mathias Meyer is an experienced software developer, consultant and coach from Berlin, Germany. He has worked with database technology leaders such as Sybase and Oracle. He entered into the world of NoSQL in 2008 and joined Basho Technologies in 2010.
About Basho Technologies
Basho Technologies is the leader in highly-available, distributed database technologies used to power scalable, data-intensive Web, mobile, and e-commerce applications and large cloud computing platforms. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement content delivery platforms and global session stores, to aggregate large amounts of data for logging, search, and analytics, to manage, store and stream unstructured data, and to build scalable cloud computing platforms.
Riak is available open source for download at http://wiki.basho.com/Riak.html. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. Riak CS enables multi-tenant object storage with advanced reporting and an Amazon S3 compatible API. For more information visit www.basho.com or follow us on Twitter at www.twitter.com/basho.
Former Basho Developer Advocate Mathias Meyer authors a comprehensive, hands-on guide to Riak.
CAMBRIDGE, MA – January 17, 2012 – Basho Technologies, the leader in highly-available, distributed data store technologies, today announced that former Basho developer advocate Mathias Meyer has completed Riak Handbook, a comprehensive, hands-on guide to Riak, Basho’s industry-leading, open source, distributed database.
Riak Handbook begins by exploring the driving forces behind Riak, including Amazon Dynamo, eventual consistency and CAP Theorem. Through a collection of examples and code, Mathias’ Riak Handbook walks through Riak’s many features in detail, including how to:
- Store and retrieve data in Riak
- Build and search full-text indexes with Riak Search
- Index and query data using secondary indexes
- Model data for eventual consistency
- Scale to multi-node clusters in less than five minutes
- Operate Riak in production
- Handle failures in your application
Mathias Meyer is an experienced software developer, consultant and coach from Berlin, Germany. He has worked with database technology leaders such as Sybase and Oracle. He entered into the world of NoSQL in 2008 and worked at Basho Technologies from 2010 to 2011.
“We are excited that Mathias took on the endeavor to build a comprehensive book all about Riak,” said John Hornbeck, Vice President of Client Services, Basho Technologies. “Our customers and community will benefit from having a single source that covers everything from setting up Riak, to scaling out quickly, to operating and maintaining Riak. We have already seen strong customer interest in Riak Handbook, including many seeking site licenses to outfit their entire teams.”
Riak Handbook is available for purchase at riakhandbook.com. Single editions are available at $29/download. Site licenses are available for organizations implementing Riak for only $249.
About Basho Technologies
Basho Technologies is the leader in highly-available, distributed data store technologies used to power scalable, data-intensive Web, mobile and e-commerce applications. Our flagship product, Riak, frees customer applications from the performance, scalability, and availability constraints of traditional databases while reducing overall storage and support costs by up to 80%. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement global session stores, to aggregate large amounts of data for logging, search, and analytics, and to manage, store and stream unstructured data.
Riak is available open source for download at basho.com/resources/downloads. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. For more information visit basho.com or follow us on Twitter at www.twitter.com/basho.
Basho Technologies is based in Cambridge, MA, and maintains regional offices in San Francisco, CA and Reston, VA.