September 17, 2013
The Praekelt Foundation is a non-profit that builds open source, scalable mobile technologies and solutions to improve the health and well-being of people living in poverty. Their Vumi solution was created as a response to the rapid spread of mobile phones across Africa. Vumi allows for large scale mobile messaging using SMS and USSD, so no internet connectivity is required. Vumi uses Riak as a super reliable backend to store all the messages that are being processed and all responses. This data is all archived to allow for further analysis to see trends of areas and which campaigns are the most successful.
The Vumi Network reaches hundreds of thousands of end users across many countries. It works with non-governmental organizations to set up campaigns and services for emerging markets. These include education (Wikipedia uses Vumi to allow end-users to search and retrieve information from Wikipedia over SMS/USSD), health (partnering with Johnson & Johnson, the MAMA campaign (Mobile Alliance for Maternal Action) allows pregnant women to receive health information over SMS based pregnancy stage and HIV diagnosis), peaceful messaging (Sisi Ni Amani uses Vumi to prevent election violence in Kenya through grassroots engagement and tracking of early conflict warning signs), as well as many other utilities.
They had been using Postgres for years but, when it came to storing messages, they knew Postgres was only an interim solution. Since they needed a non-relational system, they started evaluating the key players in the NoSQL space. With MongoDB, they found the durability defaults needed for a performance boost were not adequate for their zero downtime needs; CouchDB did not give them the performance they needed; and Cassandra was too operationally intensive for their small team and Riak offered better features. When they began testing Riak, Praekelt Foundation Chief Engineer, Simon de Haan, was able to get a three-node cluster up and running on his laptop in 20 minutes. This operational simplicity, the reliability of the system, the ability to seamlessly scale to entire populations, and the range of query options made Riak a clear choice to power Vumi.
“It blew my mind how easy it was to set up Riak and was a huge selling point for our small operations team,” said de Haan. “We also needed a reliable system with solid up-time guarantees. Riak has never gone down on us and continues to survive individual restarts. The whole thing just works.”
Since launching Riak two years ago, they are running five nodes and push 1,000 messages each second. All messages are stored as JSON in Riak, which makes it easy for them utilize Secondary Indexing and MapReduce when querying this data. With the introduction of pagination with Secondary Indexes and Eventually Consistent Counters in Riak 1.4, they have also been able to move a lot of data from Redis over to Riak to take advantage of these new features. Additionally, The Praekelt Foundation will expand their querying capabilities later this year when Riak Search gets a makeover in the Riak 2.0 release.
The Praekelt Foundation is currently evaluating Riak and Riak CS for some of their other technologies and Basho is proud to be a part of such a great cause. For more information on the Praekelt Foundation and Vumi, visit their site at www.praekeltfoundation.org/
September 12, 2013
Superfeedr provides a real-time API to any application that wants to produce (publishers) or consume (subscribers) data feeds without wasting resources or maintaining an expensive and changing infrastructure. It fetches and parses RSS or Atom feeds on behalf of its users and new entries are then pushed to subscribing applications using a webhook mechanism (PubSubHubbub) or XMPP. The Google Reader replacement is an example of a popular API built by Superfeedr that has backed up much of Google Reader.
Riak is used by Superfeedr to store the content from all feeds so users can retrieve past content (including the Google Reader API replacement), even if the feeds themselves may not include these entries anymore. This Riak datastore is referred to as “the cave.”
When Superfeedr first built “the cave” datastore, they opted for a cluster of large Redis instances (five servers with 8GB of memory each) due to its inherent speed. However, they realized that a more durable system was needed and the need to manually shard feeds across the cluster made it difficult to scale beyond storing a couple entries per feed. The scaling problem turned into an even larger issue because the average size of a stored entry was 2KB. Now, they had nearly 1,000 items per feed and 50 million feeds, translating to over 93TB of data and quickly growing.
They chose to move “the cave” to Riak due to its focus on availability (as delivering stale data was more important than delivering no data) and ease-of-scale. According to Superfeedr Founder, Julien Genestoux, “Riak solves the scalability problem elegantly. Through consistent hashing, our data is automatically distributed across the cluster and we can easily add nodes as needed.” While Riak does have a lower read performance than Redis, this proved to not be a problem as they found it easy to put caches in front of Riak if they needed to serve content faster.
Though Superfeedr found it easy to set up their Riak cluster, the default behavior for handling conflicts had to be adjusted for their use case. By working with Basho and the Riak community, they were able to find the right settings and optimize their conflict resolution algorithm. For more information on Riak’s configurable behaviors, check out our four-part blog series.
Superfeedr went into production in two phases: they started storing production data in the beginning of 2013 and began serving that data about two months later. During this period, Superfeedr was able to design their cluster infrastructure and thoroughly performance test it with actual production data.
Two types of objects are stored in Superfeedr’s Riak datastore: feeds and entries. Feeds are stored as a collection of internal feed ids, which correspond to the entries and include some meta-information, such as the title. Entries correspond to feed entries and are indexed by feedID-entryID, allowing them to store multiple entries for each feed. This indexing scheme allows entries to be retrieved, even if they lose track of the feed element, through a MapReduce job.
At write time, Superfeedr writes both the feed element and the entry element. When they query for a feed, they issue a MapReduce job to read both the feed element and the desired number of entry items. They also use a pagination mechanism to limit the resources consumed for each request, with an arbitrary limit of 50 entries.
Today, Superfeedr has served over 23 billion entries, with nearly one million more being published every hour. Their six-node Riak cluster (built on 16GB Linode slices) has allowed them to horizontally scale their cluster as their content and user base grows. “Riak is the right tool for us due to its scalability and always on availability,” said Genestoux. “We have refined it to fit our needs and can rest-assured that no data will ever be lost in our Riak ‘cave.’”
If you’re looking for a Google Reader replacement or interested in learning more about Superfeedr, check out their site: superfeedr.com/. For other examples of Riak in production, visit: basho.com/riak-users/