February 17, 2015
According to TechTarget, a common definition of “High Availability” is:
“In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to “100% operational” or “never failing.”
The reality is that this phrase has become semantically overloaded by its inclusion in marketing copy across a disparate set of technologies. Much like “Big Data”, perspectives on availability vary based on industry and customer expectation.
For many of today’s applications and platforms, high availability has a direct impact on revenue. A few examples include: cloud services, online retail, shopping carts, gaming and betting, and advertising. Further, lack of availability can damage user trust and result in a poor user experience for many social media and chat applications, websites, and mobile applications. Riak provides the high availability needed for your critical applications.
Availability – By the Numbers
As we highlighted in an infographic entitled Down with Downtime, more than 95% of businesses with 1,000+ employees estimate that they lose more than $100,000 for every hour of downtime. For more than 1 in 2 large businesses, the cost of downtime exceeds $300,000 per hour. Even at the lower end of this scale, $100,000 per hour works out to roughly $1,667 per minute. At the upper end of the spectrum (in financial services), downtime can cost $1,800 per second.
This fiscal impact has led to availability being measured as a percentage of uptime in a given year. This percentage is often referred to as the “number of 9s” of availability. For example, “one nine” of availability equates to 90% uptime in a year. Similarly, “five nines” (the standard set by consulting firms on enterprise projects) equates to 99.999% availability in a year. While that percentage is often cited, the practical reality is that it permits no more than 6.05 seconds of unplanned downtime per week.
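The arithmetic behind these figures is easy to check. The short Python sketch below computes the weekly downtime budget implied by a given number of nines (the function name is ours, invented for illustration):

```python
# Downtime budgets implied by "N nines" of availability.
# Pure arithmetic, not vendor-specific claims.

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

def downtime_per_week(nines: int) -> float:
    """Maximum seconds of unplanned downtime per week for N nines."""
    availability = 1 - 10 ** -nines   # e.g. 5 nines -> 0.99999
    return SECONDS_PER_WEEK * (1 - availability)

for n in (1, 3, 5, 9):
    print(f"{n} nine(s): {downtime_per_week(n):.6g} s/week")
```

Running this confirms the numbers above: five nines allows roughly 6.05 seconds of downtime per week, while one nine allows almost 17 hours.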
Availability – A Feature or A Benefit?
Often, when describing Riak, I begin by explaining the benefits of Riak (availability, scalability, fault tolerance, operational simplicity) and then discuss, in detail, the properties that these benefits are derived from. Availability is not something that can be added to a system (be it a distributed database or otherwise), rather it is an outcome of the core architectural decisions that were made in the development of the product.
Consider, for example, the AXD 301 ATM switch. It reportedly delivers “nine nines” (99.9999999%) of availability or better to customers. This is a staggering number: it permits no more than roughly 0.6 milliseconds of downtime per week (about 31.5 milliseconds per year). Interestingly, it shares a foundation with Riak: both are developed in Erlang.
This raises the question: “How does Riak achieve high availability?” Or, perhaps better stated, “What architectural decisions in Riak enable high availability?”
Availability – An Architectural Decision
Riak is a masterless system designed for high availability, even in the event of hardware failures or network partitions. Any server (termed a “node” in Riak) can serve any incoming request, and all data is replicated across multiple nodes. If a node experiences an outage, other nodes will continue to service read and write requests. Further, if a node becomes unavailable to the rest of the cluster, a neighboring node will take over the responsibilities of the missing node. The neighboring node will pass new or updated data (termed “objects”) back to the original node once it rejoins the cluster. This process is called “hinted handoff,” and it ensures that read and write availability is maintained automatically, minimizing your operational burden when nodes fail or come back online.
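To make the mechanism concrete, here is a deliberately simplified Python sketch of hinted handoff. It is not Riak’s implementation (the `Node` and `Cluster` classes and the placement function are invented for illustration, and the real system uses a consistent-hashing ring with replicas), but it shows the core idea: a neighbor accepts writes for a down node and hands them back on rejoin.

```python
class Node:
    """A toy Riak-style node holding its own objects plus hinted ones."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # objects this node is responsible for
        self.hints = {}   # objects held on behalf of unavailable neighbors

class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def primary(self, key):
        # Deterministic key-to-node placement (a stand-in for the ring).
        return self.nodes[sum(key.encode()) % len(self.nodes)]

    def put(self, key, value):
        target = self.primary(key)
        if target.up:
            target.data[key] = value
        else:
            # Primary unreachable: a neighbor accepts the write with a hint.
            fallback = next(n for n in self.nodes if n.up)
            fallback.hints.setdefault(target.name, {})[key] = value

    def rejoin(self, node):
        # Hinted handoff: neighbors pass hinted objects back on rejoin.
        node.up = True
        for other in self.nodes:
            node.data.update(other.hints.pop(node.name, {}))
```

Even in this toy model, writes succeed while the primary is down, and the primary catches up automatically when it returns, which is the availability property described above.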
More information about the architectural decisions behind Riak’s design is available in our documentation. In particular, the Concepts – Clusters section is deeply illustrative.
Availability – A Use Case
Consider, for example, the implementation of Riak at Temetra. Temetra has thousands of users and millions of meters that create billions of data points. The massive influx of data quickly became difficult to manage with the company’s legacy SQL database. When considering how this database could be overhauled, Temetra evaluated Cassandra and Hadoop but ultimately chose Riak for its high availability and its relatively self-maintaining, easy-to-deploy infrastructure. It is essential that the data collected from the meters is always available, as it is relied on to determine correct billing for Temetra’s customers.
Availability – A Summary
The reality is that a database, even a distributed, masterless, multi-model platform like Riak, is only one component of the application stack. Understanding your availability requirements requires deep knowledge of the entirety of the deployment environment. “High Availability” cannot be retrofitted into a system. Rather, it requires conscious effort in the early stages to ensure that customer requirements are met and that downtime does not result in lost customers and lost revenue.
December 30, 2014
At Basho, we are proud of our documentation. All design, updates, and edits are done with our community top of mind, and we encourage community participation. Given the pace at which our documentarian, Luc Perkins, is updating the content, it can be easy to fall behind on new and updated materials. So we have a holiday gift to help you out.
Below is our Top 10 suggested New Year’s reading list.
#10 – The Migrating from an SQL Database to Riak tutorial can help prepare you as you embrace a new style of development and persistence.
#7 – Strong consistency has gone from having light documentation to being one of our best-documented open-source features. Strong Consistency docs are spread across the following:
#6 – We now have client-side security docs! There’s an introductory doc that walks you a bit through how client security works in Riak as well as client-specific docs for Java, Ruby, Python, and Erlang.
#5 – A new Erlang VM Tuning doc. This is still a work in progress. As we said at the beginning, we really encourage community involvement. What tuning have you done to optimize your Erlang environment?
In addition to the above, there is new documentation on the topics below.
Drum roll please….
#1 – Riak 2.0 – if you missed this you missed a lot.
We want to thank everyone in the community who participates in making the Basho documentation the most useful set of materials possible. Remember: to submit issues is human, to submit PRs is divine.
Happy New Year!
October 14, 2013
Kivra is a Swedish company that provides secure digital mailboxes, allowing users to securely receive, upload, and store all postal mail. Their mailboxes help users organize bills and notifications while eliminating the environmental footprint created from paper mail.
At a Riak Meetup in Dublin earlier this year, Kivra CTO, Bip Thelin, spoke about how they currently use Riak. Originally, Kivra tried to build their platform on a SQL database. However, they quickly outgrew this system and decided to move their infrastructure to Erlang and Riak because of its scalability and resiliency. You can watch Bip’s full presentation below to learn more about why they chose to switch to Riak, how they built six Riak clusters, and some of the lessons they’ve learned.
Bip’s slides can also be viewed here:
For more companies that have switched from SQL to Riak, check out our Users Page.
Chicago, IL – July 8, 2013 – Throughout the Lambda Jam Conference this week, Basho will be presenting twice about various aspects of Riak, as well as hosting a workshop on Webmachine. Lambda Jam is a conference for functional programmers and features a mix of sessions and workshops. It takes place in Chicago from July 8-10.
John Daily (Technical Evangelist at Basho), will be presenting first on “Distributed Programming with Riak Core and Pipe.” During his talk, he will dive into how Riak Core and Riak Pipe can be used, both within and beyond Basho. His talk begins at 9am on Tuesday, July 9th.
On July 10th at 9:50am, Basho Architect, Steve Vinoski, will be speaking on “Addressing Network Congestion in Riak Clusters.” In this talk, he will discuss an experimental approach to alleviating network congestion effects, such as timeouts and throughput collapse for Riak clusters under extreme load. In addition to exploring network scalability issues, this talk shows how Erlang can seamlessly integrate with non-FP languages.
Finally, Sean Cribbs and Chris Meiklejohn (Software Engineers at Basho) will be hosting a workshop entitled, “Functional Web Applications with Webmachine.” This workshop will provide guidance for understanding and getting started with Webmachine. It will then gradually expose richer HTTP features, while building out an application that is used by browsers and API clients alike. Their workshop begins at 1pm on July 10th.
To see where else Basho will be speaking, please visit our Basho Events Page.
July 2, 2013
We use the Erlang/OTP programming language in building our products here at Basho. We made that choice consciously, believing that it would be a tradeoff – significant benefits balanced by a handful of costs. I am often asked if we would make the same choice all over again. To answer that question I need to address the tradeoff we thought we were making.
The single most compelling reason to choose Erlang was the attribute for which it is best known: extremely high availability. The original design goal for Erlang was to enable rapid development of highly robust concurrent systems that “run forever.” The poster child of its success (outside Riak, of course) is the AXD 301 ATM switch, which reportedly delivers “nine nines” (99.9999999%) of uptime or better to customers. So when we set out to build a database for applications requiring extremely high availability, Erlang was a natural fit.
We knew that Erlang’s supervisor concept, enabling a “let it crash” programming style designed for resilience, would be a big help in making systems that handle unforeseen errors gracefully. We knew that lightweight processes and a “many-small-heaps” approach to garbage collection would make it easier to build systems not suffering from unpredictable pauses in production. Those features paid off exactly as expected, and helped us a great deal. Many other features that we didn’t understand the full importance of at the time (such as the ability to inspect and modify a live system at run-time with almost no planning or cost) have also helped us greatly in making systems that our users and customers trust with their most critical data.
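For readers unfamiliar with the supervisor idea, the sketch below is a toy Python analogue, not Erlang/OTP itself (the function names are invented for illustration, and real OTP supervisors are far richer, with restart strategies, intensity limits, and supervision trees). The point is the same: rather than defending against every error inside the worker, a supervisor restarts a crashed worker in a clean state and escalates if restarts keep failing.

```python
def supervise(make_worker, max_restarts=3):
    """Run a worker; on any crash, restart it fresh, up to a limit."""
    restarts = 0
    while True:
        worker = make_worker()        # fresh state on every (re)start
        try:
            return worker()
        except Exception as err:
            restarts += 1
            if restarts > max_restarts:
                raise                 # escalate: let the failure propagate
            print(f"worker crashed ({err!r}); restarting")

# A worker that fails twice before succeeding, to exercise the restart path.
attempts = {"n": 0}
def flaky_worker():
    def run():
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("transient fault")
        return "done"
    return run

print(supervise(flaky_worker))  # restarts twice, then prints "done"
```

The worker itself stays simple because recovery lives in the supervisor, which is the essence of “let it crash.”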
It turns out that our assessment of the key trade-off — a more limited pool of talented engineers — is, in practice, not a problem for a company like Basho. We need to hire great software developers, and we tend to look for ones with particular skills in areas like databases and/or distributed systems. If someone is a skilled programmer in relatively arcane disciplines like those, then the ability to learn a new programming language will not be daunting. While it’s theoretically a nice bonus for someone to bring knowledge of all the tools we use, we’ve hired a significant number of engineers that had no prior Erlang experience and they’ve worked out well.
This same purported drawback is a benefit in some ways. By not just looking for “X Engineers” (where X is Java, Erlang, or anything else), we make a statement both about our own technology decision-making process and the expected levels of interesting work at Basho. To help me work on my house, I’d rather have someone who self-identifies as an “expert carpenter” or “expert plumber,” not “expert hammer wielder,” even in the cases where most of the job might involve that tool. We expect developers at Basho to exercise deep, broad interests and expertise, and for them to do highly creative work. When we mention Erlang and the other thoughtful decisions we made in building our products, they value the roadmap and leadership.
I had an entertaining and ironic conversation about this recently with a manager at a large database company. He explained to me that we had clearly made the wrong choice, and that we should have chosen Java (like his team) in order to expand the recruiting pool. Then, without breaking stride, he asked if I could send any candidates his way to fill his gaps in finding talented people.
We continue to grow and to bring on great new engineers.
That’s not to say that there are no downsides. Any language, runtime, and community will bring with it different constraints and freedoms, making some tasks easier and others less so. We’ve done some work over the years to participate in the highly supportive Erlang community. But the big organizational weakness that so many people thought would come with the choice? It’s simply not a problem.
That lesson, combined with the ongoing technical advantages we enjoy because of Erlang, makes it easy to answer the question:
Yes, we would absolutely choose Erlang today.
December 3, 2012
In 2009, mobile marketing and advertising technology provider Velti had a good (but challenging) problem on their hands. Their technology, which allows people to interact with their TV by voting, giving feedback, participating in contests, etc., had taken off. It had been adopted by nearly all of the TV broadcasters in the UK and three of the UK’s five mobile operators. As more customers began using their technology, Velti saw quick growth in (inherently spiky) traffic. Their 2003-era .NET/SQL Server platform was becoming a concern.
Because the team at Velti had been working with Erlang (what Riak is written in), in 2010 they brought in Erlang Solutions to help them architect their next generation platform. Riak was chosen for the database, and an early version of Multi-Data Center replication in Riak Enterprise was used to build two geographically separated sites to minimize potential catastrophic outages.
Velti’s new mGageTM platform is now running on 18 servers across two data centers (nine nodes in each data center), with each server running both Erlang applications and Riak. We’re pleased to pass along reports that the platform is redundant, queue behavior has significantly improved (especially for large queue populations), and that after Velti moved to Riak 1.2, they saw noticeably lower disk space utilization thanks to improvements in merge management.
Markus Kern, VP Technology at Velti, summarizes: “We operate a 24/7 service for over 140 customers. We cannot afford a single minute of downtime. Riak gives us the ability to meet and exceed our requirements for scale, data durability, and availability.” Woot!
For more details on Velti’s experience, see our case study Highly Available Mobile Platform With Riak.
August 29, 2012
tl;dr – There will be no shortage of language-specific content at RICON when it comes to building Riak-backed applications. If you and your team are working on a Riak application and have specific questions or needs around your language or framework of choice, you should be at RICON. Register here. The early bird price ends this Friday.
We are billing RICON as a “distributed systems conference dedicated to developers.” We mean this in two ways:
- We are raising awareness and strengthening a community around what it takes to build “distributed systems,” in which a set of physical resources spread over unpredictable networks cooperates to run a service in production with little or no downtime. Riak is one of a wide set of technologies that make this possible.
- We are delivering on a promise to simplify how developers interact with distributed systems at the language level. This is largely focused on Riak, but not entirely.
A brief look at the RICON schedule will make it quickly apparent that there is plenty of bona fide distributed systems knowledge and experience to go around. What may not be completely obvious (as was pointed out to me a few days ago by a prospective attendee and trusted advisor) is the depth of language-specific knowledge and experience that is represented in RICON’s schedule. I wanted to make sure we cleared this up.
For those of you interested in what it takes to build applications with Riak (at the language level), here are the details of what will be represented in the talks. (Keep in mind that the listed speakers constitute but a tiny subset of the knowledge that will be present.)
Java and the JVM
- Comcast contributed the first ever Riak Java client some time around the beginning of 2010. Though that code has changed immensely over the past three years, Riak has spread to various teams who are now using it in production, mostly with Java on the front-end. Michael Bevilacqua-Linn’s Big Data in the Small talk will give valuable insight on how to build JVM-based services that talk to Riak.
- George Reese’s Migrating from MySQL to Riak session will highlight their work using the Java-based Dasein persistence framework alongside Riak.
- Brian Roach and Russell Brown, primary maintainers of the Java client, will be wandering the crowd. There will also be several community members using Riak in production with Clojure and Scala that have experience to share.
Erlang
- Riak is written in Erlang, and it follows OTP principles in that it’s composed of various Erlang applications and extensions like riak_kv and riak_core. To that end, Bryan Fink’s talk on Riak Pipe, Ryan Zezeski’s Riak and Solr session, and a few other talks from the Basho Team will highlight how to build Erlang applications with Riak.
- OpenX is using riak_core to do all sorts of crazy, amazing things. Anthony Molinaro’s talk about how he and his team are serving trillions of ads per year will go deep on building Erlang services with Riak.
Node.js
- Gary Flake is giving Day Two’s opening keynote. He and his team at Clipboard have put Riak through its paces and built a social network fronted by Node.js. He will have much advice and wisdom to pass along.
- Matt Ranney and his team at Voxer operate one of the biggest Node.js applications known to man. They recently open-sourced their Riak Node.js client and, drawing on real-world experience running Riak clusters that are creeping towards petabytes of data, his talk will be invaluable to anyone building an application with Riak and Node.
Ruby and Rails
- The (not-yet-announced) talk from Ines Sombra and Michael Brodhead of EngineYard will include a non-trivial amount of Riak and Ruby production knowledge.
- Sean Cribbs, original author of Riak’s Ruby client, will be on-hand, along with a handful of community members who have Ruby/Rails applications in production.
Python
- The team at Bump is full of talent, and they are steeped in Python experience. The first application they wrote when they switched from MongoDB to Riak was Python-based, and their talk about building a transaction log on Riak will touch on their Python usage, too.
- Various community members who have contributed to and use the Riak Python Client will be in attendance, ready to answer questions and debate implementation details.
Haskell
- Bump’s talk will be valuable to Haskell fans, too, as they will be detailing using Riak with a custom, open-source Haskell proxy that handles client-side resolution.
- There are a few other known applications running Riak with Haskell in the wild. They, too, will be represented among the crowd.
C
- In addition to being Riak Core experts, OpenX wrote a custom C backend for Riak that will be highlighted in their talk.
- Andy Gross, primary author of the still-beta Riak C Client, will be at RICON and is expecting to share his plan for the future with would-be contributors.
Which Other Language Enthusiasts Should Attend?
Just because there isn’t a “Building a Blog with Riak and OCaml” talk on the schedule doesn’t mean that fans of OCaml should shy away from RICON. (In fact, Dave Parfitt has been hacking on an OCaml client and I’m sure he would love your input.) Fans of languages like Perl, Clojure, Go, and Smalltalk are encouraged to join. I have no doubt that you’ll leave feeling more confident about building applications that scale in your specific domain (and, as I’ve said before, we’ll happily refund your admission price if you leave RICON feeling less-than-enriched).
It’s also worth noting that, along with the massive power of the hundreds of non-Basho attendees, nearly every member of the Basho team that writes code – Engineering, Developer Advocates, Architects, Evangelists – will be at RICON as both eager onlookers and Riak authorities.
Join us for RICON. We’re looking forward to seeing you in October.
July 9, 2012
Bryan’s paper, “Experience Report: Distributed Computation on Dynamo-style Distributed Storage: Riak Pipe”, details the design and internals of Riak Pipe, the distributed processing framework that forms the foundation for Riak’s MapReduce engine. Bryan is the primary author of Riak Pipe.
Joseph’s submission, “Concurrent Property-based Testing: From Prototype to Final Implementation”, is based on the work that he and the team did (and continue to do) to test and bullet-proof the resiliency of Riak. (He gave a related talk at Erlang Factory this past March.)
They will both be part of the Workshop happening September 14th in Copenhagen, Denmark.
Congratulations to Bryan and Joseph!
March 26, 2012
This is a big week for Basho.
The first three days of Erlang Factory are primarily workshops, and Daniel Reverri will be teaching a three-day class on Building Distributed Clusters with Riak. All attendees will walk away with a clear understanding of exactly why Riak is the best distributed database you will ever run in production.
The actual conference spans Thursday – Friday, and the talk lineup for this year’s event is exceptional. The Basho team will be well-represented. Put these talks on your calendar if you’re attending:
- Test-First Construction of Distributed Systems – Joseph Blomstedt
- Building Healthy Distributed Systems – Mark Phillips
- Building Cloud Storage Services with Riak – Andy Gross
Several members of the Riak Community are also on the schedule:
- Erlang for .NET Developers – OJ Reeves
- Rewriting GitHub Pages with Riak Core, Riak KV, and Webmachine – Jesse Newland
Basho Bash West
We’re really excited about all the success surrounding Riak in 2011, and we’re continuously building on that momentum as we move deeper into 2012. The number of Riak users and community members is growing exponentially, so we decided to throw a party to celebrate. We’re calling it Basho Bash West 2012, and it’s co-sponsored by our friends at Joyent, Yammer, and Voxer.
Come join us on Thursday, March 29th, at 6:30PM. We are renting out Roe, and you won’t be allowed to pay for anything. You’ll also be leaving with some limited edition Riak swag that will make you the envy of all your friends. Various members of the Basho team will be in attendance, along with hundreds of developers, executives, and technology enthusiasts from the Bay Area. Miss this at your peril.
You must RSVP to attend.
December 21, 2011
The inaugural BashoChats was held just under a week ago at BashoWest in San Francisco. About 30 local developers came out to have a few beers on Basho’s tab and discuss distributed systems and databases. If you’re local to the Bay Area and/or want to keep an eye on what we have planned, join the group. There are some great talks in the pipeline…
Most importantly I’m happy to report that both talks from the evening are now online for your viewing pleasure.
Enjoy. Hope to see you next month.
DTrace and the Erlang VM
Andy Gross opened up the evening with just under 30 minutes on the current work happening at Basho and a few other companies to bring DTrace to the Erlang VM. He starts off with some general information on both components and then goes in-depth on how they can be used to profile a running Riak installation.
The repo here on GitHub contains the code he used for the examples in his presentation.
Computing Reach Using Storm Distributed RPC
After Andy concluded, Nathan Marz gave an overview of Storm, a framework he and his team at BackType built for distributed, fault-tolerant, real-time computation. He takes us through some Storm basics and then demonstrates how it is used to compute reach using distributed RPC.