Thousands have watched and enjoyed Peter Alvaro’s engaging and informative RICON 2014 keynote presentation. Alvaro is a PhD candidate at the University of California, Berkeley. His research interests lie at the intersection of databases, distributed systems, and programming languages. Alvaro’s delivery blends humor with deep technical detail, which makes the talk especially valuable for anyone interested in distributed systems.
In his presentation, Alvaro discusses four key ideas:
- Mourning the death of transactions
- What is so hard about distributed systems?
- Distributed consistency: managing asynchrony
- Fault-tolerance: progress despite failures
Alvaro starts his presentation by introducing us to Jim Gray and transactional systems. Many of you may know Gray’s work, and, sadly, that he was lost at sea in January 2007. His spirit and legacy are missed.
Alvaro provides insights into transactional systems and the top-down approach these systems traditionally used. He also points out that Eric Brewer, in his RICON 2012 keynote address, suggested that a bottom-up approach might be needed for today’s distributed systems.
Alvaro dives into why anyone would implement distributed systems and why developing them is hard, really hard. In a distributed system, it is necessary to manage two fundamental uncertainties, or failure modes: asynchrony and partial failure. Alvaro uses a humorous metaphor of two clowns to demonstrate how, in the real world, asynchrony and partial failure can’t be dealt with separately but must be addressed together.
From his humorous metaphor come some definitions:
- Distributed consistency = managing asynchrony
- Fault-tolerance = progress despite failures
Alvaro then explains how consistency is handled when data is distributed, starting with object-level consistency. He introduces and defines CRDTs (conflict-free replicated data types) and shows how these replicated data types help solve the distributed consistency challenge at the object level.
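To make the object-level idea concrete, here is a minimal sketch of one textbook state-based CRDT, a grow-only counter (G-Counter). It is illustrative only: the class and method names are our own, and this is not presented as how Riak or the talk implements CRDTs. Each replica increments only its own entry, and merging takes the element-wise maximum, so replicas converge no matter in which order, or how many times, they exchange state.

```python
from typing import Dict


class GCounter:
    """Grow-only counter (G-Counter), a classic state-based CRDT.

    Each replica only ever increments its own entry; the counter's value
    is the sum of all entries.
    """

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent, so
        # replicas converge regardless of delivery order or duplicate delivery.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)      # concurrent updates on two replicas
    b.increment(2)
    a.merge(b)          # exchange state in either order...
    b.merge(a)
    a.merge(b)          # ...and even deliver the same state twice
    print(a.value(), b.value())  # 5 5
```

Because `merge` is commutative, associative and idempotent, a replica can receive updates in any order, or receive the same state twice, and still reach the same value. That is one way to read Alvaro’s point that consistency is tolerance to asynchrony.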
But what happens as objects are in flight? There must also be flow-level consistency for data in motion. Language-level consistency can help with this problem. Alvaro makes the following key points:
- Consistency is tolerance to asynchrony
- Tip: Focus on data in motion, not at rest
Alvaro then moves from distributed consistency to fault tolerance and discusses his most recent research, “lineage-driven fault injection.” He reminds us that we build systems out of components and verify that each component is fault tolerant.
However, putting fault-tolerant components together doesn’t guarantee end-to-end fault tolerance.
Alvaro talks about the challenges of the top-down approach to testing all components in a system and outlines the goal of lineage-driven fault injection (LDFI).
Alvaro then introduces us to Molly, a top-down fault injector.
He likens Molly’s approach to starting in the middle of a maze and working outward to arrive at a solution.
Alvaro provides detailed examples of modeling programs using lineage so that their fault tolerance can be analyzed. He then shows how the role of the adversary can be automated and describes Molly in more detail as a prototype LDFI implementation. Molly finds fault-tolerance violations quickly or guarantees that none exist. Alvaro walks through some of Molly’s output and shows how lineage lets you reason backwards from good outcomes.
Alvaro closes with a recap, describing composition as the hardest problem in distributed systems.
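The core idea of reasoning backwards from a good outcome can be sketched as a small search problem. Assume, for illustration, that the lineage of a good outcome is given as a list of alternative supports, where each support is a set of events (here, hypothetical message-delivery labels) that together are enough to produce the outcome. An adversary can only prevent the outcome by removing at least one event from every support, i.e., by finding a hitting set. The hypothetical `find_fault_set` function below brute-forces the smallest such set within a fault budget. It is a toy model of the idea, not Molly: the names and event encoding are assumptions, and brute force is only practical for tiny examples.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Set

# Hypothetical event label, e.g. "A->B" = "message from A to B was delivered".
Event = str


def find_fault_set(supports: List[Set[Event]],
                   max_faults: int) -> Optional[FrozenSet[Event]]:
    """Return a smallest set of dropped events that breaks every support.

    Each support is one alternative derivation (lineage) of the good outcome.
    The outcome is lost only if every support loses at least one event, so
    we look for a hitting set of size <= max_faults. None means no such
    fault set exists within the budget, i.e. the outcome survives.
    """
    events = sorted(set().union(*supports))
    for k in range(1, max_faults + 1):          # try smallest fault sets first
        for candidate in combinations(events, k):
            dropped = frozenset(candidate)
            # A support is broken if it shares at least one event with `dropped`.
            if all(support & dropped for support in supports):
                return dropped
    return None


if __name__ == "__main__":
    # Toy broadcast: C receives the log directly from A, or via relay B.
    supports = [{"A->C"}, {"A->B", "B->C"}]
    print(find_fault_set(supports, 1))          # None: one fault is not enough
    print(sorted(find_fault_set(supports, 2)))  # ['A->B', 'A->C']
```

In this toy broadcast, no single dropped message stops C from receiving the log, because there are two independent paths; with a budget of two faults, dropping both A->B and A->C does. As we understand the LDFI approach, a discovered fault set would then be injected into a real run, and if the outcome still appears, the new run’s lineage contributes additional supports and the search repeats.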
Don’t miss this interesting and informative presentation.
Also, KDnuggets did a follow-up interview with Alvaro in which he expanded on some points made in his RICON 2014 Keynote speech. Here are links to the 2-part article:
February 1st, 2015
If you missed last week’s webinar, Preparing for the Deluge of Unstructured Data, you can still watch it on demand. Dorothy Pults and I discuss the news emanating from the 2015 Consumer Electronics Show and highlight that the Internet of Things, connected devices, and the resulting explosion of unstructured data are at the center of growth trends in 2015. In particular, we covered:
- What is driving the growth in unstructured data
- The challenges associated with managing unstructured data
- How companies are capitalizing on the opportunities that unstructured data presents to save money and time and to create new market opportunities
The webinar covers each of these topics in detail and provides some insights on distributed systems.
Why Distributed Systems?
Companies like Facebook, Amazon, and Google have built huge distributed systems with strict requirements around scalability, fault tolerance, and global footprints. These same concepts must now be considered by companies of all sizes…from the Enterprise to the startup.
The reality is that everything works at small scale. Challenges arise when it becomes necessary to scale out, up, and down, predictably and linearly. Once you assume that failure and latency are part of the equation, you need a distributed database that enables horizontal scale, and that enables it on commodity hardware or on whatever compute instances your business has adopted in its architecture. This is particularly important when data governance is a key component of your design considerations.
Ultimately, the customer experience matters. When designing your distributed architecture, and choosing persistence solutions like Riak, ensure that there is a solution for the geographic distribution of data (like Riak Enterprise’s multi-datacenter replication capability) to provide low latency experiences for your customers, regardless of their physical location.
For more information on this topic space, we have compiled a few resources to enable your education and decision-making.
For years, the press and industry analysts have been telling us that cloud is mainstream, but the reality is that enterprises must shift their workloads to the cloud in an orderly, low-risk manner. While many applications are already built and running in the cloud, new (or underutilized and, perhaps, misunderstood) technologies like Docker, Chef and object storage are changing the way cloud applications are implemented.
At RICON 2014, Basho worked with Citrix to host “Build a Cloud Day.” Build a Cloud Day sessions explore new technologies and show how to bring some order to the chaos of moving workloads to the cloud. Attendees learn the concepts and best practices for deploying a cloud computing environment using Apache CloudStack and other cloud infrastructure tools, including XenServer, Docker, Riak CS, Chef, Zenoss, Puppet and many others that automate server and network configuration for building highly available cloud computing environments.
Cloud Architecture: Virtualization, Orchestration and Storage
“Build a Cloud Day” started with an excellent presentation by Mark Hinkle. Many of us know him as @mrhinkle. Mark is the Senior Director of Open Source Solutions at Citrix Systems where he helps support the Apache CloudStack and Xen.org projects.
Mark has an excellent grasp of cloud computing and provides an overview of cloud computing architecture and the open source software that can be used to deploy and manage a cloud-computing environment. He looks at virtualization and containers and provides a brief description of Docker and how it is being used in today’s applications.
He also provides an overview of OpenStack. Mark closes the presentation with insights into how to deliver Platform-as-a-Service (PaaS) and which technologies can complement this evolving cloud computing paradigm.
Software is Eating Infrastructure
Other presenters at “Build a Cloud Day” included Basho’s own John Burwell (@john_burwell). John is a Senior Software Engineer at Basho Technologies. He also serves as an Apache CloudStack PMC member and committer focused on storage architecture and security integration. John’s talk explores cloud design strategies to achieve high availability and reliability using commodity components and how to apply these strategies using Apache CloudStack and Riak CS.
By migrating reliability and scalability responsibilities up the stack from specialized hardware to software, cloud orchestration platforms such as Apache CloudStack (ACS) and object stores such as Riak CS increase the utilization and density of compute and storage resources by dynamically shifting workloads based on demand.
John describes the two workloads predominantly managed in cloud environments (traditional virtualization and cloud) and how to use Apache CloudStack to manage both efficiently at the same time. He then explores storage design to support this dual-workload model, including the use of Riak CS with Apache CloudStack to reduce infrastructure costs without sacrificing reliability.
Riak CS provides software-defined, fault-tolerant object storage uniquely built to handle a variety of unstructured and big data needs using commodity hardware.
Apache CloudStack, Apache Brooklyn and more…
There were many great presentations at “Build a Cloud Day” including:
- Primary Storage in CloudStack by Mike Tutkowski (Slides | Video)
- Introduction to Apache CloudStack by David Nalley (Slides | Video)
- Hypervisor Selection in the Cloud by Tim Mackey (Slides | Video)
- Cloud Application Blueprints with Apache Brooklyn by Alex Heneveld (Slides | Video). Alex also gave a Riak-specific presentation at RICON 2014, Running Riak in a Docker Cloud using Apache Brooklyn.
You can find out more about RICON 2014 in our blog post: http://basho.com/wrapping-up-ricon-2014/
The videos of the RICON 2014 presentations can be found on our RICON Archive site. The keynote by Peter Alvaro, “Outwards from the Middle of the Maze,” is especially popular.
Distributed cloud storage software adds additional Amazon S3 compatibility, performance improvements, simplified admin and increased scalability
CAMBRIDGE, Mass. – August 5, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, today introduced Riak CS 1.5 and Riak CS 1.5 Enterprise, Basho’s distributed object storage software. Riak CS (Cloud Storage) is open source software built on top of Riak, used to build public or private clouds or as reliable storage to power applications and services. Riak CS 1.5 delivers new features that improve operation, performance and scalability. Basho continues to offer enterprise-class features in Riak CS Enterprise, which includes multi-datacenter replication, world-class 24x7 support and a flexible pricing model.
Companies dealing with large amounts of unstructured data like videos, images and documents are adopting cloud object storage so that data is highly available through a seamlessly scalable architecture. Businesses in industries such as broadcasting and telecommunications rely on the stability, integration functionality and performance of Riak CS to efficiently store, organize and access data while keeping it simple to manage.
“We offer our customers affordable and scalable cloud storage solutions built on Basho’s Riak CS,” said Makoto Oya, vice director of IDC Frontier. “The enhanced Amazon S3 compatibility and ability to scale well into the multi-petabyte level in Riak CS 1.5 will help us better support the rapid growth we are seeing in our storage business.”
I-NET Corp, a data processing service headquartered in Japan, uses Riak CS for its cloud service called Dream Cloud® and is looking to achieve further cost efficiency thanks to the increased scalability capabilities in Riak CS 1.5.
“Cloud-based object storage is ideal for storing our customer’s growing business-critical data, and we have relied on the excellent performance, cost efficiency and high reliability of Riak CS for the I-NET Dream Cloud®,” said Tsutomu Taguchi, senior managing director, business group of I-NET Corp. “Riak CS already provides us with high availability and now that Riak CS is further optimized to scale, we believe that Riak CS 1.5 delivered by Basho will drive even higher adoption of Dream Cloud®.”
New features enhance performance for object storage to store increasing amounts of data worldwide
Basho delivers new functions in Riak CS that include:
- Additional Amazon S3 compatibility: Expanded storage API compatibility with S3 includes features such as multi-object delete, put object copy, and cache control headers for more flexible integration with content delivery networks (CDNs).
- Performance improvement in garbage collection process: Delivered especially for customers with a high rate of object updates and deletes, Riak CS now more quickly reaps objects flagged for garbage collection.
- New, simplified administrative features: New and consolidated admin features make organizational tasks easier for activities such as cluster management, monitoring and troubleshooting.
- Multi-cluster support: Technology preview for increased scalability of Riak CS Enterprise, allowing multiple Riak clusters to reside under a single CS namespace and thereby expanding capacity beyond the maximum of a single cluster.
“Providing the strongest key value solution and object store means responding to customer needs and demands attentively,” said Dave McCrory, CTO of Basho. “With Riak CS 1.5 Enterprise, new features are delivered as requested by our customers. We are committed to make it easier to consume cutting edge versions of Riak and will continue to do this by executing a more iterative approach in how we release Riak.”
Availability and Pricing
Riak CS 1.5 is available immediately for Debian, Ubuntu, FreeBSD, OS X, Red Hat Enterprise Linux, Fedora, SmartOS and Solaris. To view the latest technical documentation or to download Riak CS, visit docs.basho.com/riakcs/latest/.
Basho delivers customized packages for its commercial software, Riak Enterprise and Riak Enterprise Plus, with health checks, as well as options for project-based Professional Services engagements. Full pricing details of Basho commercial software are at http://basho.com/riak-enterprise/#pricing. To request a trial license of Riak CS Enterprise, prospective customers can request a Riak CS Tech Talk at http://info.basho.com/SignUpRiakTechTalk.html.
- Basho Website (http://basho.com)
- Basho Blog (http://basho.com/blog/)
- Riak (http://basho.com/riak/)
- Riak CS (http://basho.com/riak-cloud-storage/)
- Riak CS doc (docs.basho.com/riakcs/latest/)
- Additional Resources (http://basho.com/resources/)
- Twitter: @Basho (https://twitter.com/basho)
- LinkedIn (https://www.linkedin.com/company/basho-technologies-inc)
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts.
March 24, 2014
When selecting a NoSQL solution, there are many options to choose from, each with its own benefits depending on your use case. To help you decide which choice is right for your needs, there are two great events this week where many NoSQL providers (including Basho) will be speaking.
The first is in conjunction with Ad:Tech in San Francisco. For advertisers to stay competitive in the modern landscape, the need to crunch massive amounts of consumer profile data and enable real-time bidding has made NoSQL the gold standard in database technology. That’s why Basho partner, GoGrid, will be hosting the panel, “NoSQL: Digital Advertising’s ‘Bad Boy’ Database Comes of Age” at 111 Minna Gallery. Speakers from Basho, Couchbase, DataStax, and MongoDB will be there to discuss how NoSQL is helping advertisers push the envelope now, and what is to come in 2014. This panel will take place on Wednesday, March 26th at 5:30pm. Registration is free and tickets are still available.
The other is hosted by the New York Software Engineers. This meetup, “The Battle of Distributed Databases – Data Modeling in the Enterprise Ecosystem,” will address some of the challenges the NoSQL community faces in enterprise adoption. Casey Rosenthal, Director of Professional Services at Basho, will be speaking about Riak and its adoption by 30% of the Fortune 50. This meetup will take place on Wednesday, March 26th at 7pm at Foursquare’s office. Be sure to register for this free event.
To see how enterprises are using Riak, check out the Users Page.
In addition to these meetups, Basho will be at multiple other events and conferences. A complete list can be found on our Events Page.
January 9, 2014
Puppet Labs hosts regular podcasts that feature the leaders in automation, operations, and technology. Last week, they invited Basho engineer, Eric Redmond, to speak about design patterns for distributed systems.
Eric’s talk aims to show that an average programmer can create a highly available system in any language. He also discusses many of the tradeoffs involved in implementing some of these different distributed design patterns, including speed, capacity, uptime and data integrity. Finally, he wraps up by talking about Riak as a distributed system that persists data rather than a database that has been given distributed functionality and looks at some of the upcoming features being added with Riak 2.0.
You can listen to Eric’s full podcast here.
For more information on design patterns for distributed systems, you can check out slides from his full presentation at OSCON 2013.
December 18, 2013
Downtime, planned or unplanned, is no longer an option. It can have a dramatic impact on revenue and lead to negative customer experiences and attrition. Luckily, distributed NoSQL databases (such as Basho Riak) are designed to provide high availability, even during network partitions or server failures. This means there will never be an excuse for downtime again.
To help demonstrate the cost of downtime and how Riak can help, we have put together an infographic, “Down With Downtime.”
November 27, 2013
Join Basho and 451 Research on Tuesday, December 10th at 10am PT for a live webinar, “Beyond NoSQL – Distributed Databases in Production.”
This webinar will feature Matt Aslett, Research Director at 451 Research, and Bobby Patrick, EVP and CMO at Basho Technologies. This webinar will set the stage with NoSQL trends and adoption across various industries. It will then discuss some of the key benefits of distributed NoSQL systems and explore how systems like Riak are evolving.
Wes Jossey, Systems Engineer at Tapjoy, will also be joining the webinar to discuss how Tapjoy uses distributed databases to provide reliable data locality to their customers through multi-datacenter replication.
Register here for the free “Beyond NoSQL – Distributed Databases in Production” webinar.
Portland, ME – October 3, 2013 – Basho is sponsoring The Monktoberfest, a developers conference about how social trends can change the way we build and use technology and how technology in turn can change the way we socialize.
In addition to sponsoring, Basho’s Director of Marketing, Tyler Hannan, will be presenting “Medieval Art, Collective Intelligence, and Language Abuse – The Ethos of Distributed Systems.” His talk will discuss how the distributed systems movement, and open source technology more broadly, is fueled by a series of social tools. He will explain how IRC, internal chat tools and GitHub issues have become de rigueur communication vehicles and how this collaboration has resulted in a new language. Tyler will also examine what can be learned from the story of Brunelleschi and his approach to drawing in perspective, from the notion of collective intelligence, and from Melvil Dewey, when considered in light of the modern era of distributed systems and computing.
This is the third annual Monktoberfest, taking place October 3-4 in Portland, Maine.
September 19, 2013
Strange Loop 2013 is currently taking over St. Louis, MO through September 20th. Strange Loop is a multi-disciplinary conference that brings together developers and thinkers to discuss technologies around emerging languages, concurrent and distributed systems, mobile development, and the web. Basho is a proud sponsor and many members of our team will be there to discuss Riak CS, our open source cloud storage software.
Garrett Eardley, Software Engineer at Riot Games, will also be presenting on how Riot Games is leveraging Riak for their next generation stats system. His talk, “Tracking Millions of Ganks in Near Real Time,” will discuss why they chose to use Riak (and move from their existing MySQL architecture), how they structure their data model and indexes, and their strategies for working with eventually consistent data. His talk will take place today, September 19th, at 9:50am.
Stop by the Basho table to grab some swag and to learn more about distributed systems.