Thousands have watched and enjoyed Peter Alvaro’s engaging and informative RICON 2014 Keynote presentation. Alvaro is a PhD candidate at the University of California Berkeley. His research interests lie at the intersection of databases, distributed systems, and programming languages. Alvaro’s style of delivery blends humor with deep technical detail and is especially informative for those interested in distributed systems.
In his presentation, Alvaro discusses 4 key ideas:
- Mourning the death of transactions
- What is so hard about distributed systems?
- Distributed consistency: managing asynchrony
- Fault-tolerance: progress despite failures
Alvaro starts his presentation by introducing us to Jim Gray and transactional systems. Many of you may know Gray’s work, and, sadly, that he was lost at sea in January 2007. His spirit and legacy are missed.
Alvaro provides insights into transactional systems and the top-down approach these systems traditionally used. He also points out that Eric Brewer, in his RICON 2012 keynote address, suggested that a bottom-up approach might be needed for today’s distributed systems.
Alvaro dives into why anyone would implement distributed systems and why developing distributed systems is hard, really hard. In a distributed system, it is necessary to manage two fundamental uncertainties or failure modes — asynchrony and partial failure. Alvaro uses a humorous metaphor of two clowns to demonstrate how, in the real world, asynchrony and partial failure can’t be dealt with separately, but must be looked at together.
From his humorous metaphor come some definitions:
- Distributed consistency = managing asynchrony
- Fault-tolerance = progress despite failures
Alvaro then details how consistency is handled when data is distributed, starting with object-level consistency. He introduces and defines CRDTs and explains how these replicated data types help solve the distributed consistency challenge at the object level.
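As a concrete illustration (a minimal sketch in Python, not tied to any particular CRDT library), here is a grow-only counter, one of the simplest CRDTs: each replica increments only its own slot, and merging takes the per-replica maximum, so merges commute and every replica converges to the same value.

```python
class GCounter:
    """Grow-only counter CRDT: per-replica counts, merged by taking the max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count observed from that replica

    def increment(self, n=1):
        # Each replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Merge is commutative, associative, and idempotent: take the max per slot.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5  # replicas agree regardless of merge order
```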
But what happens while data is in flight? There must also be flow-level consistency for data in motion, and language-level consistency can help with this problem. Alvaro makes the following key points:
- Consistency is tolerance to asynchrony
- Tip: Focus on data in motion, not at rest
Alvaro then moves from distributed consistency to fault tolerance and discusses his most recent research, “lineage-driven fault injection.” He reminds us that we build systems out of components and verify those components to be fault tolerant.
However, composing verified components does not guarantee end-to-end fault tolerance.
Alvaro talks about the challenges of the top-down approach to testing all components in a system and outlines the goal of lineage-driven fault injection (LDFI).
Alvaro then introduces us to Molly, a top-down fault injector.
He likens Molly’s approach to solving a maze by starting from the middle and working outward.
Alvaro provides detailed examples of modeling programs with lineage so that their fault tolerance can be analyzed, and he shows how the role of the adversary can be automated. He then describes Molly in more detail as a prototype LDFI implementation: Molly finds fault-tolerance violations quickly or guarantees that none exist. Alvaro walks through some of Molly’s output and shows how lineage allows you to reason backwards from good outcomes.
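To give a flavor of the idea (a toy sketch, not Alvaro’s Molly), the code below plays the adversary by enumerating combinations of dropped messages against a tiny broadcast protocol and checking whether the good outcome still holds; Molly performs this kind of search far more cleverly, using lineage to decide which faults are even worth injecting.

```python
from itertools import chain, combinations

def deliver(dropped):
    """Toy protocol: A sends to B and C directly, and B relays to C as a backup.
    The 'good outcome' is that C ends up with the payload."""
    b_has_it = "a_to_b" not in dropped
    c_has_it = "a_to_c" not in dropped
    if b_has_it and "b_to_c" not in dropped:
        c_has_it = True
    return c_has_it

def fault_sets(messages, max_drops):
    """All combinations of dropped messages up to a bound (the naive adversary)."""
    return chain.from_iterable(combinations(messages, k) for k in range(max_drops + 1))

messages = ["a_to_b", "a_to_c", "b_to_c"]
for faults in fault_sets(messages, max_drops=2):
    if not deliver(set(faults)):
        print("counterexample: dropping", faults, "breaks delivery")
        break
else:
    print("no fault-tolerance violation found within the fault bound")
```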
Alvaro closes with a recap and explanation describing composition as the hardest problem of distributed systems.
Don’t miss this interesting and informative presentation.
Also, KDnuggets did a follow-up interview with Alvaro, published as a two-part article, in which he expanded on some points made in his RICON 2014 Keynote speech.
December 11, 2013
In the world of distributed systems, there are still a lot of unsolved problems and improvements to be made. This means that there is a lot of interesting research being done at top institutions around the world – with some of the brightest minds looking to improve distributed systems. At RICON West, Basho’s developer conference, we brought three PhD students and candidates to speak, whose work on distributed systems has been vital to both Basho and the future of the industry.
Peter Bailis is a PhD student at UC Berkeley. His talk, “Bad As I Wanna Be: Coordination and Consistency in Distributed Databases,” goes into how to reason about the trade-offs between coordination, consistency, latency, and availability, with a focus on practical takeaways from recent research both at UC Berkeley and beyond. He also talks about reconciling “consistency” in NoSQL and ACID databases and explains why, even though you probably didn’t “beat the CAP Theorem,” you (and tomorrow’s database designs) may be on to something. His full talk is below.
Lindsey Kuper is a PhD candidate at Indiana University, who studies the foundations of deterministic parallel programming. At RICON, she spoke on “LVars: Lattice-Based Data Structures for Deterministic Parallelism,” which introduces LVars (data structures that enable deterministic parallel programming). LVars generalize the single-assignment variables often found in deterministic parallel languages to allow multiple assignments that are monotonically increasing with respect to a user-specified lattice of states. LVars maintain determinism by allowing only monotonic writes and “threshold” reads to and from shared data. Her talk looks at examples of programming in an LVar-based parallel language that is provably deterministic, and explores the connection between LVars and CRDTs. The complete talk is below.
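For a flavor of the model (a toy sketch, not the LVish library her work builds on), here is an LVar-like cell over the max-of-integers lattice: writes can only move the value up the lattice, and a threshold read blocks until the value passes the threshold and then returns the threshold itself, so every schedule observes the same answer.

```python
import threading

class MaxLVar:
    """Toy LVar over the max-of-integers lattice: monotonic writes, threshold reads."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def put(self, n):
        with self._cond:
            self._value = max(self._value, n)  # join: the value only grows
            self._cond.notify_all()

    def get(self, threshold):
        with self._cond:
            self._cond.wait_for(lambda: self._value >= threshold)
            # Return the threshold rather than the exact value, so the result
            # is identical in every interleaving (this keeps reads deterministic).
            return threshold

lv = MaxLVar()
threading.Thread(target=lv.put, args=(4,)).start()
print(lv.get(3))  # always prints 3, no matter how the threads interleave
```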
Finally, we had Diego Ongaro, a PhD student at Stanford University, talk about “The Raft Consensus Algorithm.” His talk discusses Raft, a consensus algorithm designed for understandability and developed by Diego and Professor John Ousterhout. Raft is equivalent to Paxos in fault-tolerance and performance, but it’s designed to be as easy to understand as possible, while cleanly addressing all major pieces needed for practical systems. The hope is that Raft will make consensus available to a wider audience, and that this wider audience will be able to develop a wider variety of higher quality consensus-based systems than are available today. You can learn more about Raft below.
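As a heavily simplified sketch of one piece of Raft (leader election only, omitting the log up-to-date check, election timeouts, and all RPC plumbing), a follower grants at most one vote per term, and a candidate wins only with votes from a majority of the cluster:

```python
def request_vote(follower, candidate_id, candidate_term):
    """Simplified RequestVote handler: one vote per term, never vote for stale terms."""
    if candidate_term < follower["current_term"]:
        return False
    if candidate_term > follower["current_term"]:
        follower["current_term"] = candidate_term
        follower["voted_for"] = None
    if follower["voted_for"] in (None, candidate_id):
        follower["voted_for"] = candidate_id
        return True
    return False

cluster = [{"current_term": 1, "voted_for": None} for _ in range(5)]
# Candidate "node-0" starts an election for term 2: it votes for itself,
# then asks the other four nodes.
votes = 1 + sum(request_vote(node, "node-0", 2) for node in cluster[1:])
print("leader elected" if votes > len(cluster) // 2 else "election failed")
```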
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.
October 29, 2013
Throughout RICON West, we will be discussing many of the Riak 2.0 features (both in track sessions and during lightning talks), so keep your eyes on the live stream over the next two days. Videos of all sessions will also be made available after the conference.
Here is a look at some of the major enhancements available in Riak 2.0:
- Riak Data Types. Building on the eventually consistent counters introduced in Riak 1.4, Riak 2.0 adds sets and maps as new distributed data types. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
- Strong Consistency. Developers have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
- Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
- Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and authorization are provided via client APIs.
- Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
- Reduced Replicas for Multiple Data Centers. Riak Enterprise 2.0 can optionally store fewer copies of replicated data across multiple data centers to better maintain a balance between storage overhead and availability.
Ready to get started? Download the Technical Preview.
Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure it’s production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment.
Riak 2.0 Technical Preview: Deep Dive
Riak Data Types
In distributed systems, we are forced to trade consistency for availability (see: CAP Theorem), and this can complicate some aspects of application design. In Riak 2.0, we have integrated cutting-edge research on data types known as CRDTs (Conflict-Free Replicated Data Types), pioneered by INRIA, to create Riak Data Types. By adding counters, sets, maps, registers, and flags, these Riak Data Types enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focus on using familiar, distributed data types to support their applications’ data access patterns.
A more detailed overview of Riak Data Types is available that examines implementation considerations and the basics of usage.
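For a rough feel of how an application might exercise a Riak Data Type over the HTTP interface, here is a hedged sketch using the Python requests library; the port, bucket type name, endpoint path, and payload shape are assumptions based on Riak 2.0’s documented data type API, so verify them against the official documentation.

```python
import requests

BASE = "http://127.0.0.1:8098"  # default Riak HTTP listener (assumption)

# Increment a counter under a counter bucket type (hypothetical type name "counters").
requests.post(
    f"{BASE}/types/counters/buckets/page_views/datatypes/homepage",
    json={"increment": 1},
)

# Read it back: Riak returns the converged value, with no sibling resolution needed.
resp = requests.get(f"{BASE}/types/counters/buckets/page_views/datatypes/homepage")
print(resp.json())
```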
Strong Consistency
In all prior versions, Riak was classified as an eventually consistent system. With the 2.0 release, Riak now lets developers choose whether operations should be strongly or eventually consistent, giving them a choice between these semantics for different types of data. At the same time, operators can continue to enjoy the operational simplicity of Riak. Consistency preferences are defined on a per-bucket-type basis, within the same cluster.
A RICON West 2012 talk entitled “Bringing Consistency to Riak” shares much of the initial thinking behind this effort. In addition, the pull request that adds consistency to riak_kv provides detailed information about related repositories and the implementation approach.
Redesigned Full-Text Search
Riak is a key/value store, and values are simply stored on disk as binary data. With previous versions of Riak Search, developers have long been able to index the content of these stored values. In Riak 2.0, Riak Search (code-named Yokozuna) has been completely redesigned and now uses the Apache Solr full-text document indexing engine directly. Together, Riak and Solr provide a reliable full-text indexing solution that is highly available and built for scale. In addition, Riak Search 2.0 fully supports the Solr client query APIs, which enables integration with existing software solutions, whether homegrown or commercial.
The Basho engineers responsible for Yokozuna have created a resources page that includes recorded talks, Solr documentation links, and books on the topic.
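As a hedged example of what a query looks like from a client (the index name and document field below are hypothetical, and the /search/query path and parameters are assumptions based on Riak 2.0’s documented HTTP interface):

```python
import requests

BASE = "http://127.0.0.1:8098"  # default Riak HTTP listener (assumption)

# Query a hypothetical search index "famous" with a Solr-style query string.
resp = requests.get(
    f"{BASE}/search/query/famous",
    params={"wt": "json", "q": "name_s:Lion*"},
)
print(resp.json())  # Solr-style response containing the matching documents
```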
Security
Basho designed Riak with critical data in mind. Whether it’s data that affects revenue, user experience, or even a patient’s health (as is the case with the NHS), Riak ensures that this critical data is always available. However, this critical data is often also sensitive data. Riak 2.0 secures it by adding the ability to administer access rights and to plug in various authentication models commonly used today.
The initial RFC that describes the security effort, including related Pull Requests, is available at github.com/basho/riak/issues/355.
Simplified Configuration Management
At Basho, we pride ourselves on providing operationally friendly software that functions smoothly when dealing with the challenges of a distributed system. In the past, configuration of Riak occurred in two files: app.config and vm.args. Riak 2.0 changes how and where configuration information is stored. It no longer uses Erlang-specific syntax but, rather, provides a layout more suited to all operators and automated deployment tools. This layout is easy to parse and transparent for Riak administrators.
More information on the vision and specific implementation considerations are contained in the repository at github.com/basho/cuttlefish.
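As a rough illustration of the new layout (the file name and the specific keys shown here are assumptions drawn from Riak 2.0 documentation and may differ between releases), the Erlang-term configuration gives way to flat, sysctl-like key = value settings:

```
## riak.conf (Riak 2.0): flat key = value settings instead of Erlang terms
ring_size = 64
anti_entropy = active
listener.http.internal = 127.0.0.1:8098
```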
Bucket Types
In versions of Riak prior to 2.0, keys were made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow similar keys to be grouped. In addition, they provide a means of configuring how Riak treats that data.
In Riak 2.0, several new features (security and strong consistency in particular) need to interact with groups of buckets. To this end, Riak 2.0 introduces the concept of a Bucket Type. In addition to enabling new features without special prefixes in bucket names, Bucket Types let Riak developers and operators define a group of buckets that share the same properties, storing those properties once per Bucket Type rather than per individual bucket.
More information about Bucket Types can be found in the Github Issue at github.com/basho/riak/issues/362. This issue describes the planned functionality, discussions about implementation, and includes related pull requests.
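To show how Bucket Types surface in the key/value API, here is a hedged sketch using the HTTP interface via the Python requests library; the "strongly_consistent" type is hypothetical and would need to be created and activated by an operator first, and the port and /types/... path are assumptions based on Riak 2.0 defaults.

```python
import requests

BASE = "http://127.0.0.1:8098"  # default Riak HTTP listener (assumption)
# Keys under a bucket type are addressed as /types/<type>/buckets/<bucket>/keys/<key>.
url = f"{BASE}/types/strongly_consistent/buckets/accounts/keys/alice"

requests.put(url, data=b'{"balance": 100}', headers={"Content-Type": "application/json"})
print(requests.get(url).json())
```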
Change in Defaults for Sibling Resolution
Riak has always supported both application-side sibling resolution and server-side Last Write Wins resolution based on timestamps and vector clocks. Prior to Riak 2.0, vector clock-based Last Write Wins was the default. Moving forward, new clusters will hand off siblings to applications by default. This is the safest way to work with Riak, but it requires developers to be aware of sibling resolution.
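As a conceptual sketch of application-side resolution (not a specific Riak client API), the resolver below merges sibling values modeled as sets of items by taking their union, one deterministic merge rule an application might choose to fit its data:

```python
def resolve_siblings(siblings):
    """Merge conflicting replica values by taking the union of their items."""
    merged = set()
    for value in siblings:
        merged |= set(value)
    return merged

# Two concurrent writes produced siblings; the application decides how to merge them.
siblings = [{"apple", "pear"}, {"apple", "plum"}]
print(resolve_siblings(siblings))  # {'apple', 'pear', 'plum'}
```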
More Efficient Use of Physical Memory
Riak nodes are designed to manage the changing demands of a cluster as it experiences network, hardware, and other failures. To do this, Riak balances each node’s resources accordingly. Riak 2.0 has vastly improved LevelDB’s use of available physical memory (RAM) by allowing local databases to dynamically change their cache sizes as the cluster fluctuates under load.
In the past, it was necessary to specify RAM allocation for different LevelDB caches independently. This is no longer the case. In Riak 2.0, LevelDB databases that manage key/value or active anti-entropy data share a single pool of memory, and administrators are free to allocate as much of the available RAM to LevelDB as they feel is appropriate in their deployment. Detailed implementation documentation can be found in the basho/leveldb wiki.
Riak Ruby Vagrant Project
If you are interested in testing Riak 2.0 in a contained environment with the Riak Ruby Client, Basho engineer Bryce Kerley has put together the Riak-Ruby-Vagrant repository. This environment can also be easily adapted for use with other clients when testing the new features of Riak 2.0.