Thousands have watched and enjoyed Peter Alvaro’s engaging and informative RICON 2014 Keynote presentation. Alvaro is a PhD candidate at the University of California Berkeley. His research interests lie at the intersection of databases, distributed systems, and programming languages. Alvaro’s style of delivery blends humor with deep technical detail and is especially informative for those interested in distributed systems.
In his presentation, Alvaro discusses 4 key ideas:
- Mourning the death of transactions
- What is so hard about distributed systems?
- Distributed consistency: managing asynchrony
- Fault-tolerance: progress despite failures
Alvaro starts his presentation by introducing us to Jim Gray and transactional systems. Many of you may know Gray’s work, and, sadly, that he was lost at sea in January 2007. His spirit and legacy are missed.
Alvaro provides insights into transactional systems and the top-down approach these systems traditionally used. He also points out that Eric Brewer, in his RICON 2012 keynote address, suggested that a bottoms-up approach might be needed for today’s distributed systems.
Alvaro dives into why anyone would implement distributed systems and why developing distributed systems is hard, really hard. In a distributed system, it is necessary to manage two fundamental uncertainties or failure modes — asynchrony and partial failure. Alvaro uses a humorous metaphor of two clowns to demonstrate how, in the real world, asynchrony and partial failure can’t be dealt with separately, but must be looked at together.
From his humorous metaphor come some definitions:
Distributed consistency = managing asynchrony
Fault-tolerance = progress despite failures
Alvaro then provides details on distributed consistency and when data is distributed, how consistency is handled. First, start with object-level consistency. Alvaro introduces and defines CRDTs and how these replicated data types help solve the distributed consistency challenge at the object level.
But what happens as objects are in flight? There must also be flow-level consistency for data in motion. Language-level consistency can help with this problem. Alvaro makes the following key points:
Consistency is tolerance to asynchrony
Tip: Focus on data in motion, not at rest
Alvaro then moves from distributed consistency to fault tolerance. He discusses his most recent research “lineage-driven fault injection.” He reminds us that we build systems of components and we verify these components to be fault tolerant.
However, when we put these components together it doesn’t guarantee end-to-end fault tolerance.
Alvaro talks about the challenges of the top-down approach to testing all components in a system and outlines the goal of lineage-driven fault injection (LDFI).
Alvaro then introduces us to Molly, a top-down fault injector.
He describes Molly like starting from the middle of a maze and moving to the outside as a method to arrive at a solution.
Alvaro provides detailed examples to show modeling programs using lineage so that fault tolerance can be analyzed. He then shows how the role of the adversary can be automated. He describes Molly in more detail as a prototype LDFI. Molly finds fault-tolerance violations quickly or guarantees that none exist. Alvaro provides some output using Molly and shows how lineage allows you to reason backwards from good outcomes.
Alvaro closes with a recap and explanation describing composition as the hardest problem of distributed systems.
Don’t miss this interesting and informative presentation.
Also, KDnuggets did a follow-up interview with Alvaro in which he expanded on some points made in his RICON 2014 Keynote speech. Here are links to the 2-part article:
A Muddle That Slows Adoption
Basho released Riak as an open source project seven months ago and began commercial service shortly thereafter. As new entrants into the loose collection of database projects we observed two things:
- Widespread Confusion — the NoSQL rubric, and the decisions of most projects to self-identify under it, has created a false perception of overlap and similarity between projects differing not just technically but in approaches to licensing and distribution, leading to…
- Needless Competition — driving the confusion, many projects (us included, for sure) competed passionately (even acrimoniously) for attention as putative leaders of NoSQL, a fool’s errand as it turns out. One might as well claim leadership of all tools called wrenches.
The optimal use cases, architectures, and methods of software development differ so starkly even among superficially similar projects that to compete is to demonstrate a (likely pathological) lack of both user needs and self-knowledge.
This confusion and wasted energy — in other words, the market inefficiency — has been the fault of anyone who has laid claim to, or professed designs on, the NoSQL crown.
- Adoption suffers — Users either make decisions based on muddled information or, worse, do not make any decision whatsoever.
- Energy is wasted — At Basho we spent too much time from September to December answering the question posed without fail by investors and prospective users and clients: “Why will you ‘win’ NoSQL?”
With the vigor of fools, we answered this question, even though we rarely if ever encountered another project’s software in a “head-to-head” competition. (In fact, in the few cases where we have been pitted head-to-head against another project, we have won or lost so quickly that we cannot help but conclude the evaluation could have been avoided altogether.)
The investors and users merely behaved as rational (though often bemused) market actors. Having accepted the general claim that NoSQL was a monolithic category, both sought to make a bet.
Clearly what is needed is objective information presented in an environment of cooperation driven by mutual self-interest.
This information, shaped not by any one person’s necessarily imperfect understanding of the growing collection of data storage projects but rather by all the participants themselves, would go a long way to remedying the inefficiencies discussed above.
Demystification through data, not marketing claims
We have spoken to representatives of many of the emerging database projects. They have enthusiastically agreed to participate in a project to disclose data about each project. Disclosure will start with the following: a common benchmark framework and benchmarks/load tests modeling common use cases.
- A Common Benchmark Framework — For this collaboration to succeed, no single aspect will impact success or failure more than arriving at a common benchmark framework.
At Basho we have observed the proliferation of “microbenchmarks,” or benchmarks that do not reflect the conditions of a production environment. Benchmarks that use a small data set, that do not store to disk, that run for short (less than 12 hours) durations, do more to confuse the issue for end users than any single other factor. Participants will agree on benchmark methods, tools, applicability to use cases, and to make all benchmarks reproducible.
Compounding the confusion is when benchmarks are used for comparison of different use cases or was run on different hardware and yet compared head-to-head as if the tests or systems were identical. We will seek to help participants run equivalent on the various databases and we will not publish benchmark results that do not profile the hardware and configuration of the systems.
- Benchmarks That Support Use Cases — participants agree to benchmark their software under the conditions and with load tests reflective of use cases they commonly see in their user base or for which they think their software is best suited.
- Dissemination to third-parties — providing easy-to-find data to any party interested in posting results.
- Honestly exposing disagreement — Where agreement cannot be reached on any of the elements of the common benchmarking efforts, participants will respectfully expose the rationales for their divergent positions, thus still providing users with information upon which to base decisions.
There is more work to be done but all participants should begin to see the benefits: faster, better decisions by users.
We invite others to join, once we are underway. We, and our counterparts at other projects, believe this approach will go a long way to removing the inefficiencies hindering adoption of our software.