Thousands have watched and enjoyed Peter Alvaro’s engaging and informative RICON 2014 Keynote presentation. Alvaro is a PhD candidate at the University of California, Berkeley. His research interests lie at the intersection of databases, distributed systems, and programming languages. Alvaro’s style of delivery blends humor with deep technical detail and is especially informative for those interested in distributed systems.
In his presentation, Alvaro discusses four key ideas:
- Mourning the death of transactions
- What is so hard about distributed systems?
- Distributed consistency: managing asynchrony
- Fault-tolerance: progress despite failures
Alvaro starts his presentation by introducing us to Jim Gray and transactional systems. Many of you may know Gray’s work, and, sadly, that he was lost at sea in January 2007. His spirit and legacy are missed.
Alvaro provides insights into transactional systems and the top-down approach these systems traditionally used. He also points out that Eric Brewer, in his RICON 2012 keynote address, suggested that a bottom-up approach might be needed for today’s distributed systems.
Alvaro dives into why anyone would implement distributed systems and why developing distributed systems is hard, really hard. In a distributed system, it is necessary to manage two fundamental uncertainties or failure modes — asynchrony and partial failure. Alvaro uses a humorous metaphor of two clowns to demonstrate how, in the real world, asynchrony and partial failure can’t be dealt with separately, but must be looked at together.
From his humorous metaphor come some definitions:
Distributed consistency = managing asynchrony
Fault-tolerance = progress despite failures
Alvaro then turns to distributed consistency: how consistency is handled once data is distributed. He starts with object-level consistency, introducing and defining CRDTs (conflict-free replicated data types) and showing how these replicated data types help solve the distributed consistency challenge at the object level.
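To make the object-level idea concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is not taken from Alvaro’s talk or from Riak; the class name `GCounter` and its methods are hypothetical names chosen for illustration. Each replica only increments its own slot, and replicas converge by taking the per-slot maximum when they merge.

```python
class GCounter:
    """Illustrative grow-only counter CRDT (hypothetical, not Riak's API)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        # The counter's value is the sum of every replica's slot.
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent, so merges
        # can arrive in any order, or more than once, without changing the result.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)

a.merge(b)
b.merge(a)
a.merge(b)  # a duplicated merge message is harmless
```

After the merges, both replicas report the same total even though neither coordinated with the other before incrementing, which is the property that makes this kind of data type tolerant of asynchrony.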
But what happens as objects are in flight? There must also be flow-level consistency for data in motion. Language-level consistency can help with this problem. Alvaro makes the following key points:
Consistency is tolerance to asynchrony
Tip: Focus on data in motion, not at rest
Alvaro then moves from distributed consistency to fault tolerance. He discusses his most recent research, “lineage-driven fault injection.” He reminds us that we build systems out of components and verify that each component is fault tolerant.
However, when we put these components together it doesn’t guarantee end-to-end fault tolerance.
Alvaro talks about the challenges of the top-down approach to testing all components in a system and outlines the goal of lineage-driven fault injection (LDFI).
Alvaro then introduces us to Molly, a top-down fault injector.
He likens Molly’s approach to solving a maze by starting in the middle and working outward.
Alvaro provides detailed examples to show modeling programs using lineage so that fault tolerance can be analyzed. He then shows how the role of the adversary can be automated. He describes Molly in more detail as a prototype of LDFI. Molly finds fault-tolerance violations quickly or guarantees that none exist. Alvaro provides some output using Molly and shows how lineage allows you to reason backwards from good outcomes.
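The idea of reasoning backwards from a good outcome can be sketched in a few lines. This is a toy illustration, not Molly’s actual implementation: it assumes the lineage of a successful outcome is already available as a list of "supports" (each a set of messages that together are enough to produce the outcome), and it brute-forces the smallest combinations of lost messages that would break every support. The function name `minimal_fault_sets` and the message labels are hypothetical.

```python
from itertools import combinations


def minimal_fault_sets(supports, max_faults):
    """Return the smallest fault sets (up to max_faults) that break every support.

    supports: list of sets; each set is one independent way the good outcome
    was reached. An empty result means no combination of at most max_faults
    failures can prevent the outcome.
    """
    candidates = sorted(set().union(*supports))
    for size in range(1, max_faults + 1):
        hits = [
            set(combo)
            for combo in combinations(candidates, size)
            if all(set(combo) & support for support in supports)
        ]
        if hits:
            return hits
    return []


# Hypothetical lineage: the outcome at C is reachable via A->B->C or via A->C.
supports = [
    {"A->B", "B->C"},
    {"A->C"},
]

one_fault = minimal_fault_sets(supports, max_faults=1)
two_faults = minimal_fault_sets(supports, max_faults=2)
```

In this example no single lost message can prevent the outcome, because the two paths are redundant, while two lost messages can. Only the fault combinations returned here would be worth injecting in a test run; everything else is known in advance to leave at least one path intact.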
Alvaro closes with a recap and explanation describing composition as the hardest problem of distributed systems.
Don’t miss this interesting and informative presentation.
Also, KDnuggets did a follow-up interview with Alvaro in which he expanded on some points made in his RICON 2014 Keynote speech. Here are links to the 2-part article:
April 2, 2014
Basho partners with a wide range of companies to ensure there are many options to choose from when building out a database architecture with Riak. One part of our partner program is Infrastructure. Our Infrastructure Partners create optimized Riak solutions that ensure high performance and efficiency, regardless of scale.
One of these partners is Brocade. Brocade helps organizations achieve critical business initiatives as they transition to a world where applications and information reside anywhere. Basho’s Riak and Brocade’s VCS Fabric Technology can be leveraged together to deliver a highly available, fault-tolerant, predictable, and easy-to-use distributed system.
In addition to partnering, we also jointly validated the performance of Brocade VDX switches in a Riak deployment. Testing teams ran 30-minute benchmarks of reads and read/writes against the cluster as the number of nodes grew from five to ten. The tests showed consistently high network throughput and consistently low latency. These results are indicative of highly efficient and predictable performance from the underlying network.
For more information about Basho and Brocade, check out the complete solutions brief.
For more information about Basho’s partners, visit the Partnerships Page.
If you’re a hosting provider, hardware or infrastructure provider, consulting service, or systems integrator, we’d love to partner with you! To discuss a partnership, please Contact Us.
October 24, 2013
Next week, Basho will be sponsoring the O’Reilly Strata conference. This conference focuses on big data technologies and brings together decision makers, architects, developers, and analysts to discuss how to collect and use data successfully. The Strata Conference is combined with Hadoop World and takes place in New York City on October 28-30th.
On Wednesday, October 30th at 1:45pm, Jim Englert will speak on “Testing Riak for Multiple Data-Center Support: A Case Study.” Jim is the Lead Software Engineer at Gilt and will discuss his experience evaluating Riak before using it as the backend for the company’s main user store. In this talk, Jim will go through the testing process, the results of this stress test, and how Gilt – one of the top eCommerce companies in the U.S. – managed to test the features in production without outages or other interruptions. He will also discuss the strategy, tools, and processes necessary to achieve this feat in just five days, and cover Gilt’s motivation for investigating Riak (such as the ability to span Gilt’s physically separated data centers and expand into the cloud).
Before his talk, Jim will also be hosting office hours to further discuss Gilt’s use of Riak to help solve a variety of issues including disaster recovery, data availability across data centers, and scalability.
Throughout the conference, be sure to stop by the Basho booth for more information about Riak and how it can handle critical data at scale.