April 23, 2014
Traditional database architectures were the default option for many pre-Internet use cases, and solutions from that era, such as MySQL, remain common today. However, these traditional solutions have limits that quickly become apparent as companies (and their data) grow. Modern companies have changing priorities: downtime (planned or unplanned) is never acceptable; customers require a fast and unified experience; and data of all types is growing at unimaginable rates. Solutions such as Riak are designed to handle these shifting priorities.
Top Reasons to Move to Riak
- Zero Downtime: Distributed NoSQL solutions like Riak are designed for always-on availability. This means data is always read/write accessible and the system never goes down. Downtime, planned or unplanned, can make or break a customer experience.
- Ease-of-Scale: Traffic can be unpredictable. Businesses need to scale up quickly to handle peak loads during holidays or major releases, but then need to scale back down to save money. Riak makes it easy to add and remove any number of nodes as needed and automatically redistributes data across the cluster. Scaling up or down never needs to be a burden again.
- Flexible Data Model: From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Riak can store any type of data easily with its simple key/value architecture.
- Global Data Locality: Every company is a global company and needs to provide consistent, low-latency experiences to everyone, regardless of physical location. Riak’s multi-datacenter replication makes it easy to set up datacenters wherever users are, for both geo-data locality and maintaining active backups.
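Under the hood, Riak places data with consistent hashing, which is why nodes can join or leave the cluster without reshuffling the entire key space. The toy Python sketch below (node names, vnode counts, and hashing details are illustrative, not Riak's actual implementation) shows that adding a fourth node moves only a fraction of the keys:

```python
import hashlib
from bisect import bisect_right

def ring_position(value: str) -> int:
    """Hash a string onto a 160-bit ring (Riak also uses SHA-1 for this)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring: each node claims several points ("vnodes")."""
    def __init__(self, nodes, vnodes_per_node=8):
        points = sorted(
            (ring_position(f"{node}-{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.positions = [p for p, _ in points]
        self.nodes = [n for _, n in points]

    def owner(self, key: str) -> str:
        """A key is owned by the first node point clockwise from its hash."""
        idx = bisect_right(self.positions, ring_position(key)) % len(self.positions)
        return self.nodes[idx]

keys = [f"user{i}" for i in range(1000)]
ring3 = HashRing(["node1", "node2", "node3"])
ring4 = HashRing(["node1", "node2", "node3", "node4"])
moved = sum(1 for k in keys if ring3.owner(k) != ring4.owner(k))
print(f"{moved} of {len(keys)} keys moved")  # only a fraction, not all
```

Because only the keys whose ring positions fall into the new node's arcs change owners, scaling up (or down) touches a bounded slice of the data rather than triggering a full redistribution.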
Users That Switched to Riak
Many top companies have already moved from relational architectures to Riak. Here’s a look at a few that have made the switch.
Bump (acquired by Google)
Bump, acquired by Google in 2013, allows users to share contact information and photos by bumping two phones together. Bump uses Riak to store almost all of its user data: contacts, communications sent and received, handset information, social network OAuth tokens, etc. Bump moved from MySQL to Riak due to its operational qualities: “No longer will we have to do any master/slave song and dance, nor will we fret about performance, capacity, or scalability; if we need more, we’ll just add nodes to the cluster.” Learn more about their move in their case study.
Alert Logic
Alert Logic helps companies defend against security threats and address compliance mandates, such as PCI and HIPAA. Alert Logic switched from MySQL to Riak to collect and process machine data, perform real-time analytics, detect anomalies, ensure compliance, and proactively respond to threats. Alert Logic processes nearly 5TB/day in Riak and has achieved performance results of up to 35k operations/second. Learn more about how Alert Logic improved performance through Riak in our blog post.
The Weather Company
The Weather Company provides millions of people every day with the world’s best weather forecasts, content and data, connecting with them through television, online, mobile and tablet screens. Riak is central to The Weather Company’s weather data services platform that delivers real-time weather services to aerospace, insurance, energy, retail, media, government, and hospitality industries. Check out our blog to see why The Weather Company selected Riak over MySQL to support their massive big data needs.
Dell
Dell uses Riak as the core distributed database technology underlying its customer cloud management solutions. Riak is used to collect and manage data associated with customer application provisioning and scaling, application configuration management, usage governance, and cloud utilization monitoring. In 2012, Enstratius (acquired by Dell) switched to Riak from MySQL in order to provide cross-datacenter redundancy, high write availability, and fault tolerance. Check out the full Enstratius case study.
Data Modeling in Riak
Riak has a “schemaless” design. Objects consist of key/value pairs, which are stored in flat namespaces called buckets. Below is a chart with some simple approaches to building common application types with a key/value model.
| Application Type | Key | Value |
| --- | --- | --- |
| Session | User/Session ID | Session Data |
| Advertising | Campaign ID | Ad Content |
| Sensor | Date, Date/Time | Sensor Updates |
| User Data | Login, Email, UUID | User Attributes |
| Content | Title, Integer | Text, JSON/XML/HTML Document, Images, etc. |
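As a rough illustration of the chart above, here is a Python sketch that uses plain dicts in place of Riak buckets (the bucket names, keys, and values are invented for the example; real code would use a Riak client's get/put operations). The point is how keys are composed, e.g. a sensor ID plus a timestamp so that a time range maps to predictable keys:

```python
import json
from datetime import datetime, timezone

# Plain dicts stand in for Riak buckets in this sketch.
session_bucket = {}
sensor_bucket = {}

def put(bucket, key, value):
    bucket[key] = json.dumps(value)   # Riak stores opaque values; JSON is common

def get(bucket, key):
    return json.loads(bucket[key])

# Session data keyed by session ID
put(session_bucket, "session-8f2e", {"user": "alice", "cart": ["sku-1", "sku-9"]})

# Sensor updates keyed by sensor ID plus timestamp, so a window of readings
# can be fetched by computing the keys rather than querying
ts = datetime(2014, 4, 23, 12, 0, tzinfo=timezone.utc).strftime("%Y%m%d%H%M")
put(sensor_bucket, f"thermo-17:{ts}", {"temp_c": 21.5})

print(get(session_bucket, "session-8f2e")["user"])      # alice
print(get(sensor_bucket, f"thermo-17:{ts}")["temp_c"])  # 21.5
```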
January 13, 2014
RICON West 2013, Basho’s developers conference, featured two tracks over two days. However, we did bring everyone together for the keynote speakers: Pat Helland (Salesforce.com), Justin Sheehy (Basho), and Jeff Dean (Google) keynoted this past RICON.
To kick off Day Two, Basho’s CTO, Justin Sheehy, spoke on “Maximum Viable Product.” His talk examines how software is created and his passion for creating the types of things that people will care about long after the fact. Drawing analogies from art, architecture, the military, and more, he explains why building it faster is not always better and why it is vital to develop the basics so that the product can be better in the long run. You can watch his keynote below.
At the end of RICON West, we closed with Google Fellow, Jeff Dean. His talk, “The Tail at Scale: Achieving Rapid Response Times in Large Online Services,” describes a collection of techniques and practices to lower response times in large distributed systems whose components run on shared clusters of machines, where pieces of these systems are subject to interference by other tasks, and where unpredictable latency hiccups are the norm, not the exception. Some of the techniques adapt to trends observed over periods of a few minutes, making them effective at dealing with longer-lived interference or resource contention. Others react to latency anomalies within a few milliseconds, making them suitable for mitigating variability within the context of a single interactive request. He also shares examples of how these techniques are used in various pieces of Google’s systems infrastructure and in various higher-level online services. You can watch his closing keynote below.
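One of the techniques Dean describes is the “hedged request”: send a request to one replica, and if no reply arrives within a short delay, send a backup copy to a second replica and take whichever answer comes first. A minimal Python sketch, with made-up latency numbers chosen only to illustrate the effect on tail latency:

```python
# Simulated per-replica response times (ms) for the same request; each tuple is
# (primary replica, backup replica). One replica occasionally hiccups.
samples = [(12, 11), (9, 14), (480, 10), (8, 13), (11, 520)]

HEDGE_DELAY = 20  # ms: send a backup request if no reply within 20 ms

def hedged_latency(primary, backup, delay=HEDGE_DELAY):
    # If the primary answers before the delay expires, no backup is ever sent.
    if primary <= delay:
        return primary
    # Otherwise the backup is issued at `delay`; take whichever finishes first.
    return min(primary, delay + backup)

plain = [p for p, _ in samples]
hedged = [hedged_latency(p, b) for p, b in samples]
print(max(plain), max(hedged))  # 480 30
```

The worst case drops from 480 ms to 30 ms while the fast cases are untouched, and because the backup is only sent after the delay, the extra load is limited to the slow tail of requests.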
To watch all of the sessions from RICON West 2013, visit the Basho Technologies YouTube channel.
November 25, 2013
RICON is Basho’s distributed systems conference series for developers. Last month, RICON West took place at the St. Regis in San Francisco and brought together hundreds of engineers, developers, scientists, and architects to discuss the theory, practice, and importance of running distributed systems in production. It was a packed two days with 26 talks from Basho, Google, Netflix, Seagate, The Weather Company, Twitter, and many others.
- Lindsey Kuper
- Miles O’Connell
- Eric Redmond
- James Hughes
- Justin Shoffstall and Charlie Voiselle
- Jeff Hodges
- Peter Bailis
- Ryland Degnan
- Joseph Blomstedt
- Derek Murray
All talks from Day Two will be added to the playlist over the next few weeks. Be sure to subscribe to Youtube.com/BashoTechnologies for notifications about new RICON West videos and other Basho content.
October 31, 2013
If you attended RICON West, we’d love to hear your feedback! Please fill out the survey here.
Thanks to everyone who helped make the sold-out RICON West a huge success! RICON has come a long way in just one year and we are excited to see how it grows and evolves in the future.
RICON West featured over 25 speakers from academia and industry, including speakers from Basho, Google, Microsoft Research, Netflix, Salesforce, Seagate, The Weather Company, and Twitter. Over two days, they discussed the theory, practice, and importance of running distributed systems in production as well as some predictions on what’s in store for the future. Over the next few weeks, we will be posting slides and videos from all of the talks on both ricon.io and the blog.
In case you missed it, we also received some great press. Here’s a quick recap:
- “Salesforce’s data-center design: ‘Go for web scale, and build it out of s**t!’”
- “What do we want? Strong consistency! When do we… oh, it’s in Riak v2”
What’s next for RICON? With three conferences already under our belt, we are excited to get to work on RICON Europe (our first international conference!) and continue RICON East. Keep an eye on Ricon.io and our blog for more details.
RICON West was also paired with a one-day Riak training. We plan on making these a more regular occurrence all over the country.
October 7, 2013
RICON West, Basho’s distributed systems conference, is quickly approaching at the end of October. This event will feature speakers from both academia and industry, presenting on a wide variety of distributed systems topics. This installment of RICON will be the largest to date and it would not be possible without our amazing sponsors.
Similar to the RICON speaker lineup, the sponsors stem from a variety of different industries. Current sponsors include Seagate, Engine Yard, Yammer, Google, SoftLayer, and Tower3. Additionally, this year RICON has its first media sponsorship from The Register. The Register’s Jack Clark has put together a list of the sessions that he’s most excited about attending in his article, “Distributed Systems Boffins Flock to RICON West.”
RICON West will be at the St. Regis in San Francisco from October 29-30th. In addition to the conference, Basho will be hosting a one-day Riak training the day before (October 28th). Be sure to grab tickets to both before they sell out!
October 1, 2013
On October 29-30th, RICON West will take over the St. Regis in San Francisco. RICON is Basho’s distributed systems conference that brings together engineers, developers, scientists, and architects. You can purchase tickets for this almost sold-out event here: ricon-west-2013.eventbrite.com/
This year’s keynote speaker is Jeff Dean, Google Fellow at Google Inc. His talk entitled, “The Tail at Scale: Achieving Rapid Response Times in Large Online Services,” will describe a collection of techniques and practices that lower response times in large distributed systems whose components run on shared clusters of machines, where pieces of these systems are subject to interference by other tasks, and where unpredictable latency hiccups are the norm, not the exception. He will also share examples of how these techniques are used in various pieces of Google’s systems infrastructure and in various higher-level online services.
RICON West also features speakers from academia and industry, including: Peter Bailis (UC Berkeley), Justin Sheehy (Basho), Pat Helland (Salesforce.com), Jeff Hodges (Twitter), Diego Ongaro (Stanford University), Susan Potter (Finsignia), Ryland Degnan and Jason Brown (Netflix), Miles O’Connell (StackMob), Derek Murray (Microsoft), Raja Selvaraj and Arvinda Gillella (The Weather Company), and many others.
If you’ll be in San Francisco on Oct. 28th, we will also be hosting a full-day Riak training. This training will teach you everything you need to know to start building highly available, scalable systems on Riak. Tickets to both the training and RICON are still available.
Be sure to grab tickets to RICON West before they sell out and see you in San Francisco!
August 27, 2013
If you still haven’t gotten your ticket to RICON West, make sure to grab one before the early bird sale ends on August 29th. RICON West is Basho’s distributed systems conference and will take place in San Francisco on October 29-30th.
RICON West will feature speakers that are using and researching distributed systems to solve a wide range of problems. Some highlights include:
- Jeff Dean, Google Fellow at Google
- Pat Helland, Architect at Salesforce.com
- Jeff Hodges, Distributed Systems Engineer at Twitter
- Michael Bernstein, Software Developer at Paperless Post
- Susan Potter, Lead Software Engineer at Finsignia
- Ryland Degnan and Jason Brown, both Senior Software Engineers at Netflix
- Derek Murray, Researcher at Microsoft Research
- Raja Selvaraj, Data Systems Engineering Manager, and Arvinda Gillella, SUN Architect, at The Weather Company
We will also be hosting a Riak training on October 28th, right before the conference. During this training, you’ll learn about the core principles behind Riak and how it manages to scale both performance and capacity while evenly distributing data throughout the cluster. At the end of the day, you’ll be able to create and deploy your own cluster, as well as be familiar with query patterns, data modeling, and running Riak in production.
May 23, 2013
Thank you to all who attended or tuned into the live-stream for RICON East this past week. We hope you had fun and learned a thing or two; we sure did.
We’ll be publishing the videos from RICON East over the coming weeks. Keep an eye on ricon.io/archive/2013/east.html for updates.
While RICON East may have just ended, we’re already busy working on RICON West.
RICON West will take place October 29-30 in San Francisco at the St. Regis Hotel. This will be our largest conference to date and we hope you’ll join us once again.
Tickets are on sale now, with early bird discounts through August. Each attendee will also get a personalized conference track jacket and a ticket to the after party at Twenty Five Lusk.
We even have a few speakers lined up already! Jeff Dean, Google Fellow at Google; Kate Matsudaira, Founder and CTO of Pop Forms; and Peter Bailis, PhD student at UC Berkeley will be speaking about their work with distributed systems, alongside Basho engineers.
If you’d like to present at RICON West, email email@example.com to submit a talk. We are accepting proposals through July 1st about anything distributed systems-related.
For more details, head on over to ricon.io/west.html
See you all in San Francisco in October.
October 30, 2012
Google’s LevelDB has proven very versatile within Riak; it is implemented in Riak as eleveldb, an Erlang wrapper around LevelDB. But Google’s target environment was much different from the large-data environment of Riak’s users. Riak 1.2 contains the first wave of performance tuning for large data. These changes improve overall throughput and eliminate most instances where LevelDB would hang for a few seconds trying to catch up. The new release also contains a fix for an infinite-loop compaction condition, a bloom filter that greatly reduces time spent searching for non-existent keys, and several bug fixes. This post details these improvements and also gives some internal benchmark results obtained using basho_bench, Basho’s open source benchmarking tool.
- Stalls: In Riak 1.1, individual vnodes in Riak (each one a LevelDB database) could pause for long periods before responding to individual get/put calls. Several stall sources were identified and corrected. On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes; in Riak 1.2, LevelDB sometimes sees one stall of 10 to 30 seconds every 2 hours.
- Throughput: While impacted by stalls, throughput is an independent code and tuning issue. The fundamental change made was to increase all on-disk file sizes, minimizing the number of file opens and reducing the number of compactions. LevelDB in Riak 1.1 had a throughput of ~400 operations per second on a given server; these changes raised throughput to ~2,000 operations per second.
- Infinite loop during compaction: In 1.1, the background compaction would get caught in an infinite loop if it encountered a file with a corrupt data block. The previous solution was to stop the node, manually execute “recovery”, then restart the node. During recovery, the entire file (and all its data) was removed from the data store and copied to the “lost” directory. Riak 1.2 instead creates one file, BLOCKS.bad, in the “lost” directory. The LevelDB code automatically removes the corrupted block from compaction processing and copies it to this file, then continues processing the remaining data in the file without going into an infinite loop.
- Merge of LevelDB bloom filter code: Google has created a bloom filter to help LevelDB more quickly identify keys that do not exist in the data store. The bloom filter code is merged into this release. While incrementally beneficial in its own right, the bloom filter enables changes to the file/level structure that dramatically improve overall throughput.
- app.config eleveldb options: In Riak 1.1, most parameters set in app.config for the LevelDB layer were never passed through. This has been corrected. Users should assume previous parameter tests/experiments to be invalid.
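For readers unfamiliar with bloom filters: a bloom filter answers “definitely not present” with certainty, which lets LevelDB skip reading a table file entirely for missing keys, at the cost of rare false positives. A minimal Python sketch (bit-array size, hash count, and hash scheme are illustrative, not LevelDB's actual filter):

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: each key sets k positions in an m-bit array.
    A negative answer is definitive, so the disk read can be skipped."""
    def __init__(self, m_bits=1024, k=3):
        self.m, self.k = m_bits, k
        self.bits = 0  # an int doubles as an arbitrary-length bit array

    def _positions(self, key):
        # Derive k independent positions by salting the hash input.
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # True means "maybe present" (rare false positives are possible);
        # False means "certainly absent".
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
for k in ("key1", "key2", "key3"):
    bf.add(k)

print(bf.might_contain("key2"))         # True: present keys always pass
print(bf.might_contain("no-such-key"))  # almost certainly False: skip the read
```

In LevelDB's case a filter is kept per table file, so a get for a non-existent key can reject most files from the filter alone instead of seeking into each one.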
The graphs below illustrate LevelDB’s improvements in throughput and maximum latency. Test data was obtained using basho_bench, Basho’s open source benchmarking tool. Raw data and configuration files can be downloaded here. In the benchmark presented, LevelDB preloads a database with 10 million sequentially ordered keys.
As can be seen, LevelDB 1.1 stalls regularly, whereas 1.2 seldom stalls, thanks to the stall-management improvements. We can also see that LevelDB in 1.2 has a higher ingest rate: we were able to load 10 million records in 44 minutes, compared to 106 minutes in 1.1.
Throughput in levelDB 1.1
Throughput in levelDB 1.2
Maximum latency in levelDB 1.1
Maximum latency in levelDB 1.2
We have already identified further performance tuning for future work, including bloom filter modification and removing redundancy (bloat) during memory to level-0 file creation. Expect another wave of performance tuning in subsequent point releases and major releases.
- Data backup: Theoretically there is no need to perform data backups on LevelDB, since Riak duplicates all data across several nodes. But many users have suggested they would still sleep better if there were a means to perform a direct backup by node/vnode anyway. Backups during live operation are a planned, next feature.
- Infinite loops: Riak 1.2 contains fixes for a couple of the most common cases where compactions could enter infinite loops when the state of files on the disk does not match LevelDB’s internal state. However, there are other, less frequent cases that can still cause infinite loops. These cases are high on the future work list.
- Error correction: LevelDB has methods to repair and restore damaged vnodes. The time cost of executing a repair can be huge. The repair time is already better with release 1.2 (in one case the time was reduced from 6 weeks … really … to eleven hours). We already have a design waiting for programming resources that will further speed repair time.