September 5, 2013
Flywheel is a mobile taxi hailing platform that is in more than half of the cabs in San Francisco and has recently expanded to LA. Through the mobile application, users can request cabs, view their location, and pay via their smartphone. Flywheel is currently using Riak for their underlying passenger and driver engine. This engine stores information such as user accounts, passenger information, taxi information, and ride details. Riak also stores real-time production data, such as passenger ride requests and ride accepts from drivers.
Part of the future growth strategy for Flywheel was shifting to a purely Linux- and open-source-based infrastructure. This meant moving away from a more traditional closed source relational database system. They needed something that was easy to get up and running and that didn’t require a lot of developer resources to manage. Flywheel evaluated a number of different open source choices, including Redis, MongoDB, Cassandra, and CouchDB. Ultimately, they decided to move to Riak and supplement it with Postgres, as Riak offered the most operational simplicity.
Flywheel went into production with Riak in September of 2012. They are currently running eight nodes in their cluster and handle 25,000-30,000 writes and 50,000-60,000 reads each day. Riak’s key/value data model has been a natural fit for the application’s “events” that happen each time a taxi ride is processed. These events include taxi hails, driver responses, taxi rides, ride payments, etc., and buckets are used to group them together within Riak.
SAN FRANCISCO – SEPTEMBER 5, 2013 — Flywheel, a mobile taxi hailing platform, works with Basho’s distributed NoSQL database, Riak, to power their passenger and driver engine. During Bay to Breakers 2013, San Francisco’s popular footrace, Flywheel saw their highest traffic spikes due to the hundreds of thousands of people trying to get around the city. While the application was experiencing peak loads, Riak seamlessly handled this concurrent traffic at low latency with zero downtime, so the spike had no impact on the end-user experience.
Flywheel is in more than half of the cabs in San Francisco and has recently expanded to the LA area. Due to their quick growth and need for operational simplicity, Flywheel decided to move their platform to Riak in September of 2012. Riak provided the scale and ease of use necessary for Flywheel’s small team and beat out many alternative NoSQL databases.
“Bay to Breakers was an important time for us to solidify our place in the mobile taxi market,” said Cuyler Jones, Chief Architect at Flywheel. “Part of why we moved to Riak was to leverage its high availability and scalability, which it achieved perfectly. It was great to have one less thing to worry about during this key event.”
Flywheel’s passenger and driver engine stores information such as user accounts, passenger information, taxi information, and ride details. In addition, Riak is also used to store real-time production data, such as passenger ride requests and ride accepts from drivers.
Basho’s Riak is an open source distributed database designed for always-on availability, fault tolerance, scalability, and ease of use. It is used by companies worldwide that need to store and access critical data at all times. Mobile is one of the most common use cases for Riak, due to mobile’s high-availability and low-latency requirements, as well as the need to scale quickly to meet peak loads. For a look at how others use Riak to solve the challenges of mobile applications and services, visit the Mobile Spotlight page.
Flywheel is currently running eight nodes in their Riak cluster and handles 25,000-30,000 writes and 50,000-60,000 reads each day. For more information about Flywheel, check out their site and download their app. To learn more about Riak, visit basho.com/riak/.
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant, and easy to operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast-growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.
With offices in San Francisco and Redwood City, Flywheel Software, Inc. was founded in 2010 to provide an all-new experience to both passengers and drivers of for-hire vehicles. The Flywheel service includes a mobile app whereby its customers order taxi rides in real time, track arrival via GPS, and automatically pay their fare, all from their smartphone.
Flywheel’s investors include Craton Equity Partners of Los Angeles, Shasta Ventures, RockPort Capital, Sand Hill Angels, and members of the Band of Angels. Flywheel can be found at www.flywheel.com.
September 4, 2013
For more background on the indexing techniques described, check out our blog post, “Index for Fun and for Profit”.
The War Against Zombies is Still Raging!
In the United States, the CDC has recovered 1 million Acute Zombilepsy victims and has asked for our help loading the data into a Riak cluster for analysis and ground team support.
Know the Zombies, Know Thyself
The future of the world rests in a CSV file with the following fields:
- Full Name
- Zip Code
- National ID
- Feet Inches
For each record, we’ll serialize this CSV document into JSON and use the National ID as the key. Our ground teams need the ability to find concentrations of recovered zombie victims on a map, so we’ll use the Zip Code as an index value for quick lookup. Additionally, we want to enable geospatial lookup for zombies, so we’ll also GeoHash the latitude and longitude, truncate the hash to four characters for approximate area lookup, and use that as an index term. We’ll use the G-Set Term-Based Inverted Indexes that we created, since the dataset will be exclusively for read operations once it has been loaded. We’ve hosted this project on GitHub so that, in the event we’re overtaken by zombies, our work can continue.
In our load script, we read the text file and create new zombies, add Indexes, then store the record:
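The full script lives in the GitHub project; as a stand-in, here is a minimal sketch of that loop with assumed CSV headers. The calls that actually talk to Riak are stubbed out in comments, since they need a live cluster:

```ruby
require 'csv'
require 'json'

# One CSV record becomes a JSON document keyed by National ID.
def build_zombie(row)
  {
    'Name'    => row['Full Name'],
    'ZipCode' => row['Zip Code'],
    'Height'  => row['Feet Inches']
  }
end

csv_data = <<~DATA
  Full Name,Zip Code,National ID,Feet Inches
  Jane Doe,39201,427-69-8179,5-6
DATA

CSV.parse(csv_data, headers: true).each do |row|
  key  = row['National ID']          # National ID is the Riak key
  body = build_zombie(row).to_json
  # With a live cluster (stubbed here):
  #   zombie = bucket.new(key)
  #   zombie.raw_data = body
  #   zombie.content_type = 'application/json'
  #   zombie.indexes['zip_inv'] << row['Zip Code']
  #   zombie.store
end
```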
Our Zombie model contains the code for serialization and adding the indexes to the object:
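As a sketch of what such a model looks like (the class, field, and method names here are illustrative assumptions, not the project’s actual code):

```ruby
require 'json'

# Illustrative Zombie model: serializes to JSON and exposes the
# inverted-index terms that get attached to the Riak object on store.
class Zombie
  attr_reader :national_id, :name, :zip_code, :geohash

  def initialize(national_id, name, zip_code, geohash)
    @national_id = national_id
    @name        = name
    @zip_code    = zip_code
    @geohash     = geohash[0, 4]   # truncate for approximate-area lookup
  end

  def key
    @national_id                    # National ID is the Riak key
  end

  def to_json(*)
    { 'Name' => @name, 'ZipCode' => @zip_code, 'GeoHash' => @geohash }.to_json
  end

  # Index terms as they would be added to the object’s indexes:
  def index_terms
    { 'zip_inv' => @zip_code, 'geohash_inv' => @geohash }
  end
end
```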
Let’s run some quick tests against the Riak HTTP interface to verify that zombie data exists.
First let’s query for a known zombilepsy victim:
curl -v http://127.0.0.1:8098/buckets/zombies/keys/427-69-8179
Next, let’s query the inverted index that we created. If the index has not been merged, then a list of siblings will be displayed:
Zip Code for Jackson, MS:
curl -v -H "Accept: multipart/mixed" http://127.0.0.1:8098/buckets/zip_inv/keys/39201
GeoHash for Washington DC:
curl -v -H "Accept: multipart/mixed" http://127.0.0.1:8098/buckets/geohash_inv/keys/dqcj
Excellent. Now we just have to get this information into the hands of our field team. We’ve created a basic application that allows users to search by Zip Code or by clicking on the map. When the user clicks on the map, the server converts the latitude/longitude pair into a GeoHash and uses that to query the inverted index.
Colocation and Riak MDC will Zombie-Proof your application
First we’ll create a small Sinatra application with the two endpoints required to search by zip code and latitude/longitude:
Our zombie model does the work to retrieve the indexes and build the result set:
Saving the world, one UI at a time
Searching for zombies in the Zip Code 39201 yields the following:
Clicking on Downtown New York confirms your fears and suspicions:
The geographic bounding inherent to GeoHashes is obvious in a point-dense area so, in this case, it would be best to query the adjacent GeoHashes.
Keep Fighting the Good Fight!
There is plenty left to do in our battle against zombies!
- Create a Zombie Sighting Report System so the concentration of live zombies in an area can quickly be determined based on the count and last report date.
- Add a crowdsourced Inanimate Zombie Reporting System so that members of the non-zombie population can report inanimate zombies. Incorporate Bayesian filtering to prevent false reporting by zombies. They kind of just mash on the keyboard so this shouldn’t be too difficult.
- Add a correlation feature, utilizing Graph CRDTs, so we can find our way back to Patient Zero.
August 28, 2013
Customer.io is passionate about helping their customers grow happy customers. Their focus is on creating genuine, relevant interactions for their customers. Of course, happy customers expect great performance. As Customer.io continues to rapidly grow, they are putting in place the foundation to deliver on those commitments.
Yesterday, Customer.io announced that they upgraded their architecture – moving from MongoDB to Riak. As described in their recent blog post, the move to Riak has provided an immediate and dramatic performance boost. Some performance highlights include:
- User segmentation can run anywhere from 6x faster (raw performance) to 100x faster, taking into account that customer requests are now parallelizable. (To send more relevant, timely emails, Customer.io enables subsets of people to be grouped around similar characteristics.)
- Processing time was reduced from 3 hrs to 30 minutes on a large segment.
- Customer.io launched a new feature that shows percentage complete.
In addition to gaining the inherent benefits of Riak as a scalable, distributed system, Customer.io also adopted Go, an increasingly popular programming language. Go gives them powerful message queuing, systems-level programming, and excellent concurrency support.
You can view the entire blog post from Customer.io here: customer.io/blog/Segment-customer-data-faster.html
August 28, 2013
What is an Index?
In Riak, the fastest way to access your data is by its key.
However, it’s often useful to be able to locate objects by some other value, such as a named collection of users. Let’s say that we have a user object stored under its username as the key (e.g., thevegan3000) and that this particular user is in the Administrators group. If you wanted to be able to find all users, such as thevegan3000, who are in the Administrators group, then you would add an index (let’s say, user_group) and set it to administrator for those users. Riak has a super-easy-to-use option called Secondary Indexes that allows you to do exactly this, and it’s available when you use either the LevelDB or Memory backends.
Using Secondary Indexes
Secondary Indexes are available in the Riak APIs and all of the official Riak clients. Note that the index is named user_group_bin when accessing the API because we’re storing a binary value (in most cases, a string).
Add and retrieve an index in the Ruby Client:
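To make the store-then-query shape concrete, here is a toy in-memory bucket; it only mimics the real Ruby client, where you would write obj.indexes['user_group_bin'] << 'administrator' and query with bucket.get_index against a live cluster:

```ruby
require 'set'

# Toy in-memory model of a bucket with secondary indexes, to show the
# query shape; real code would use the riak gem against a live cluster.
class ToyBucket
  def initialize
    @objects = {}                                 # key => value
    @index   = Hash.new { |h, k| h[k] = Set.new } # [index, term] => keys
  end

  def put(key, value, indexes = {})
    @objects[key] = value
    indexes.each { |name, term| @index[[name, term]] << key }
  end

  def get_index(name, term)                       # mirrors Bucket#get_index
    @index[[name, term]].to_a
  end
end

users = ToyBucket.new
users.put('thevegan3000', { 'name' => 'Vegan' },
          'user_group_bin' => 'administrator')
users.get_index('user_group_bin', 'administrator')  # => ["thevegan3000"]
```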
In the Python Client:
In the Java Client:
More Example Use Cases
Not only are indexes easy to use, they’re extremely useful:
- Reference all orders belonging to a customer
- Save the users who liked something or the things that a user liked
- Tag content in a Content Management System (CMS)
- Store a GeoHash of a specific length for fast geographic lookup/filtering without expensive Geospatial operations
- Time-series data where all observations collected within a time-frame are referenced in a particular index
What If I Can’t Use Secondary Indexes?
Indexing is great, but if you want to use the Bitcask backend or if Secondary Indexes aren’t performant enough, there are alternatives.
A G-Set Term-Based Inverted Index has the following benefits over a Secondary Index:
- Better read performance at the sacrifice of some write performance
- Less resource intensive for the Riak cluster
- Excellent resistance to cluster partition since CRDTs have defined sibling merge behavior
- Can be implemented on any Riak backend including Bitcask, Memory, and of course LevelDB
- Tunable via read and write parameters to improve performance
- Ideal when the exact index term is known
Implementation of a G-Set Term-Based Inverted Index
A G-Set CRDT (Grow Only Set Convergent/Commutative Replicated Data Type) is a thin abstraction on the Set data type (available in most language standard libraries). It has a defined method for merging conflicting values (i.e., Riak siblings), namely a union of the two underlying Sets. In Riak, the G-Set becomes the value that we store in our Riak cluster in a bucket, and it holds a collection of keys to the objects we’re indexing (such as thevegan3000). The key that references this G-Set is the term that we’re indexing, administrator. The bucket containing the serialized G-Sets accepts Riak siblings (potentially conflicting values), which are resolved when the index is read. Resolving the indexes involves merging the sibling G-Sets, which means that keys cannot be removed from this index, hence the name: “Grow Only”.
administrator G-Set Values prior to merging, represented by sibling values in Riak
administrator G-Set Value post merge, represented by a resolved value in Riak
Great! Show me the code!
As a demonstration, we integrated this logic into a branch of the Riak Ruby Client. As mentioned before, since a G-Set is actually a very simple construct and Riak siblings are perfect to support the convergent properties of CRDTs, the implementation of a G-Set Term-Based Inverted Index is nearly trivial.
There’s a basic interface that belongs to a Grow Only Set in addition to some basic JSON serialization facilities (not shown):
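A minimal sketch of that interface (the shape here is assumed for illustration; the real implementation lives in the branch of the Ruby client mentioned above):

```ruby
require 'json'
require 'set'

# Sketch of a Grow Only Set: values can be added and sets merged,
# but never removed -- merge is a plain set union.
class GSet
  attr_reader :items

  def initialize(items = [])
    @items = Set.new(items)
  end

  def add(value)
    @items << value
  end

  def merge(other)
    GSet.new(@items | other.items)  # union: the defined sibling-merge behavior
  end

  def to_json(*)
    { 'type' => 'GSet', 'e' => @items.to_a.sort }.to_json
  end

  def self.from_json(json)
    GSet.new(JSON.parse(json)['e'])
  end
end
```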
Next there’s the actual implementation of the Inverted Index. The index put operation simply creates a serialized G-Set with the single index value into Riak, likely creating a sibling in the process.
The index get operation retrieves the index value. If there are siblings, it resolves them by merging the underlying G-Sets, as described above, and writes the resolved record back into Riak.
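Read-time resolution boils down to a union over the sibling sets; as a self-contained sketch:

```ruby
require 'set'

# Each sibling carries the set of keys written under the index term;
# resolving the index is just the union of all sibling sets.
def resolve_siblings(siblings)
  siblings.reduce(Set.new) { |acc, s| acc | s }
end

siblings = [Set['427-69-8179'], Set['427-69-8179', '555-00-1234']]
resolve_siblings(siblings)  # a single Set containing both keys
```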
With the modified Ruby client, adding a Term-Based Inverted Index is just as easy as a Secondary Index. Instead of using _bin to indicate a string index, we’ll use _inv for our Term-Based Inverted Index.
Binary Secondary Index:
zombie.indexes['zip_bin'] << data['ZipCode']
Term-Based Inverted Index:
zombie.indexes['zip_inv'] << data['ZipCode']
The downsides of G-Set Term-Based Inverted Indexes versus Secondary Indexes
- There is no way to remove keys from an index
- Storing a key/value pair with a Riak Secondary Index takes about half as long as putting an object with a G-Set Term-Based Inverted Index, because the G-Set index involves an additional Riak put operation for each index being added
- The Riak object which the index refers to has no knowledge of which indexes have been applied to it
- It is possible, however, to update the metadata for the Riak object when adding its key to the G-Set
- There is no option for searching on a range of values
See the Secondary Index documentation for more details.
The downsides of G-Set Term-Based Inverted Indexes versus Riak Search:
Riak Search is an alternative mechanism for searching for content when you don’t know which keys you want.
- No advanced searching: wildcards, boolean queries, range queries, grouping, etc.
See the Riak Search documentation for more details.
Let’s see some graphs.
The graph below shows the average time to put an object with a single index and to retrieve a random index from the body of indexes that have already been written. The times include the client-side merging of index object siblings. It’s clear that although the put times for an object with a G-Set Term-Based Inverted Index are roughly double those of an object with a Secondary Index, the index retrieval times are less than half. This suggests that Secondary Indexes are better suited to write-heavy loads, while G-Set Term-Based Inverted Indexes perform much better where reads outnumber writes.
Over the length of the test, it is even clearer that G-Set Term-Based Inverted Indexes offer higher performance than Secondary Indexes when the workload of Riak skews toward reads. The use of G-Set Term-Based Inverted Indexes is very compelling even when you consider that the index merging is happening on the client-side and could be moved to the server for greater performance.
- Implement other CRDT Sets that support deletion
- Implement G-Set Term-Based Indexes as a Riak Core application so merges can run alongside the Riak cluster
- Implement strategies for handling large indexes such as term partitioning
August 21, 2013
Riak is being used by companies of all shapes and sizes. Since it is open source, we don’t know exactly how many deployments of Riak there are, but our best guess puts it in the thousands. We love hearing about all the unique ways companies are using Riak. Sometimes, our awesome users and partners even write up blogs that showcase how they’re using Riak and why they decided to give it a try. Below are a few good ones that we came across:
Our partner, SoftLayer, has written about Riak a number of times. Recently, Marc Jones wrote about some popular use cases for Riak and Harold Hannon wrote a performance analysis about running Riak on bare metal.
You can learn more about how Basho partners with SoftLayer here.
Flyclops is currently in the process of launching with Riak, and they have chosen to blog about their experiences along the way. First, they wrote about their database evaluation process and why they chose Riak. Then, earlier this week, they wrote about building Riak on OmniOS.
Superfeedr created a Google Reader API replacement and chose to power it with Riak.
For more examples of how companies are using Riak, check out our Users Page.
August 5, 2013
This month, Basho will be at Meetups and other events all over the world – presenting on all things Riak and distributed systems. Here are some of the highlights of where we’ll be in August:
Boston Meetup: Tonight, David Bishop (Lead Systems Administrator at Synacor) will discuss his experience using Riak Enterprise to power a cross-datacenter video metadata product behind some of the biggest ISPs’ portals in the business.
DC Meetup: This Meetup will be focused on using different technologies to power mobile apps. It will feature case studies from Rovio and Zope Corporation. This event will take place on August 8th and you can register here.
Los Angeles Meetup: Learn about the new features and updates available with Riak 1.4, including eventually consistent counters. You can register for this August 13th event here.
Berlin Meetup: This eBay Meetup will feature two speakers, including Basho Chief Architect Andy Gross. Andy will speak about the resurgence in interest in both theoretical and applied distributed systems, explore new areas of promising research, and provide practical advice for dealing with systems in our new distributed world. You can register for this August 15th event here.
PuppetConf: Basho will be attending PuppetConf in San Francisco from August 22-23. Come visit our booth and meet our Director of Community, Mark Phillips and our Solutions Engineer, Pavan Venkatesh. Bring any Riak questions and get some great swag.
These are just a few of the events we’ll be at throughout August. For a complete list, check out the Events Page.
July 25, 2013
Hosted Graphite is a hosted version of Graphite, the open-source application metrics system that lets you measure, analyze, and visualize large amounts of data about your applications and backend systems, without worrying about setting up your own server or dealing with scaling, backups, or maintenance. They use Riak to store all of their metrics – a time series collection of name-value data.
Hosted Graphite was originally using Whisper, a fixed-size database, which stored their time series data as binaries on disk. However, its focus on simplicity meant that it didn’t offer replication or other helpful features. As their data set grew, they knew they’d need to switch to a system that could more easily distribute their data and scale effectively. Additionally, since there weren’t any plans to hire past the existing two-person ops team, they needed a system that provided always-on availability (as any failures are highly visible to their customers) and operational simplicity.
Based on their criteria, they were able to quickly rule out many other database options. When they came across Riak, it fit all of their requirements and looked operationally friendly, so they decided to try it. They were able to easily get Riak into production and have been live with Riak since June of 2012.
Hosted Graphite runs two Riak clusters and a total of nine nodes. They are currently storing 1.5 billion keys and see 60 GB of growth per day across their nodes. They use both the Bitcask and LevelDB backends.
As Charlie von Metzradt, co-founder of Hosted Graphite, said, “Launching with Riak has helped us sleep at night. We don’t need to worry when a node or two goes down, as we can just deal with it later. For a two person team, this has been invaluable.”
For more information on Hosted Graphite’s experiences with Riak, check out Charlie’s talk from a recent meetup.
You can also visit basho.com to see if Riak is the right fit for your data.
New York, NY– July 24, 2013 – Basho is a proud sponsor and exhibitor of DevCon5 2013, the HTML5 and mobile application developer conference. DevCon5 takes place July 24-25 in New York, NY.
DevCon5 is a conference where both front end and back end developers are familiarized with disruptive technologies that enable UX/UI and back end mobile development. While this is Basho’s first year at DevCon, their distributed database, Riak, is already a key tool for backend mobile developers looking to provide “always on” user experiences. Due to its redundant, fault-tolerant design, Riak also provides a consistently fast mobile user experience that can easily scale and support highly concurrent access. It is currently used to power mobile applications like Voxer, Bump, and Rovio. For more information on how mobile applications and platforms can use Riak, download “Mobile on Riak: A Technical Introduction.”
In addition to sponsoring, Basho UX/UI Lead Designer Sarah Drasner will be speaking at DevCon5. Drasner’s talk, “CSS Animations to Tell A Story,” will be a deep dive into creating scalable web graphics for maximum impact, while telling a broader story of emerging tools that reduce operational complexity and amplify impact. Echoing the ideas behind Basho’s flagship product, Riak, Drasner will discuss how technologies are shifting to meet emerging business requirements while minimizing immediate operational burdens and enabling ease of scale.
Drasner’s talk begins at 3:30pm ET on Wednesday, July 24th.
July 24, 2013
It is tempting, when considering documentation, to decide that it is “someone else’s problem.” In truth, writing and maintaining documentation is a cross-disciplinary function. Content is paramount, clarity and comprehension of design determine whether the content is accessible, and information architecture will expedite learning…or frustration.
At Basho, we are proud of our documentation. All design, updates, and edits are done in the open and we encourage community participation. Recently, we launched a major refresh to our docs and, in the interest of sharing our learnings with our community, we wanted to describe some of the ideas and principles behind it.
With this recent update, we were particularly interested in a clean and legible design. We wanted a redesign that was easy to read, easy to reference, and easy to reuse.
To that end, we updated the font set to include a serif and a sans serif (Gandhi and Open Sans, respectively). Our design team selected two open source types that worked well together, but also had the best cross-browser and cross-display consistency.
We made the text larger, changed its color to be black on white for higher contrast, and limited the width of the page for ease of reading (à la Matthew Butterick’s suggestions). This focus on legibility allows us to scale content within the same design theme as needed.
As we continue updating Riak, prior documentation remains relevant and accessible. Previously, the Riak version selection was displayed horizontally, with all major releases visible. We added a selection menu that flows vertically and now only indicates the currently-viewed product version.
The navigation has also been updated so the collapse behavior maintains state across pages and links to the Help Page and GitHub repository remain static.
To appeal to our audience of both developers and operators, we now have two distinct tracks of content that are highlighted and organized in the left-hand navigation menu. These tracks are bookended by new introductory content (a slimmed-down version of “The Riak Fast Track”) and conceptual information relevant to both developers and operators.
Furthermore, since developers tend to actually write code, our examples are being refreshed to use code samples, rather than HTTP calls. This will be an ongoing process.
Where Art Meets Science
The decisions about the documentation refresh combined instinct, preference, and empirical data about usage. As the community provides feedback, we will continue to make changes to improve usability.
As with any project of this scope, many members of the Basho team were involved: our engineers who write documentation, the Docs Cabal that managed the process, and the Basho design team that provided dozens of possible designs. This distributed team was able to leverage the best of each other’s work to produce something beautiful and, most important of all, useful.