February 26, 2014
Amherst College is a private liberal arts college in Massachusetts that enrolls about 1,800 undergraduates. Its Archives & Special Collections houses rare books, literary manuscripts, and unique materials documenting the College and its history, including many of Emily Dickinson’s original poems and letters. The Amherst College Library has been working to digitize images, manuscripts, and rare books in the Archives, and to improve access to a large collection of digital images used in teaching art and architecture. The library currently has 140,000 objects in its digital collections and is adding up to 10,000 new objects each month.
Fedora (the digital asset management system used by many colleges) handles archiving, storing, and managing these documents. While it can support the number of objects being stored, Fedora favors object fixity checks (checksums) and XML schema validation over fast response times. It has served Amherst well for digital preservation and metadata support, but it has struggled under high concurrency (for example, when Bon Appétit Magazine directed readers to an Emily Dickinson manuscript featuring a recipe for doughnuts: acdc.amherst.edu/view/asc:17832). Amherst now uses Riak as an intermediary layer between Fedora and the web, and as a large cache for all of its data.
Previously, Amherst used a PHP app that accessed Fedora directly. While this worked, it was resource intensive, too slow for most purposes, and would not let them grow the repository at the rate they needed. They evaluated several other systems (including CouchDB and MongoDB), but found that Riak, which distributes data across nodes automatically rather than requiring manual sharding, was extremely easy to scale and offered better fault tolerance than the alternatives.
Amherst brought Riak into production earlier this year and stores around one million objects across four nodes. All of the XML- and RDF-based metadata about each digitized object (such as structural metadata in RDF and descriptive metadata in MODS) is unified into a single JSON structure stored in Riak. For queries, they typically use plain key/value lookups or run MapReduce jobs. Since moving to Riak, the entire system has become an order of magnitude faster.
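For readers curious what a caching layer like this might look like, here is a minimal Python sketch of a read-through cache that stores each object's metadata as one JSON document. It is an illustration, not Amherst’s code. A plain dict stands in for the Riak bucket, `fetch_from_fedora` is a hypothetical function standing in for a Fedora request, and the `descriptive`/`structural` field names are assumptions.

```python
import json
from typing import Callable, Dict, Tuple

# Stand-in for a Riak bucket: object ID -> JSON document (assumption).
cache: Dict[str, str] = {}

def unify_metadata(pid: str, mods: dict, rdf: dict) -> str:
    """Combine descriptive (MODS) and structural (RDF) metadata into one JSON string."""
    return json.dumps({"id": pid, "descriptive": mods, "structural": rdf}, sort_keys=True)

def get_object(pid: str, fetch_from_fedora: Callable[[str], Tuple[dict, dict]]) -> dict:
    """Read-through lookup: serve from the cache, going to Fedora only on a miss."""
    if pid not in cache:
        mods, rdf = fetch_from_fedora(pid)  # slow path: hits the repository
        cache[pid] = unify_metadata(pid, mods, rdf)
    return json.loads(cache[pid])
```

Under a traffic spike to a single manuscript, only the first request for a given ID reaches Fedora. Every later request is a single key lookup.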
“We have been extremely happy with Riak and what it provides,” says Aaron Coburn, Systems Administrator at Amherst College. “While most of the objects stored aren’t publicly available, Riak still allows us to make over 2,000 manuscripts available to the world.”
December 16, 2013
At Strange Loop 2013, Garrett Eardley (Software Engineer at Riot Games) presented “Tracking Millions of Ganks in Near Real Time.” His talk focused on the popular game League of Legends and the challenges of supporting millions of concurrent players at any time of day. At this scale, even simple gameplay statistics generate terabytes of data that must be aggregated and presented to players in near real time. Garrett and his team found it difficult to scale the system and add features using MySQL, Riot’s initial technology choice for statistics.
They realized they weren’t using many of MySQL’s features and began evaluating NoSQL alternatives: HBase, Cassandra, CouchDB, and Riak. For the new system they prioritized high availability over consistency, which narrowed the options to Riak and Cassandra. Based on negative past experiences and Cassandra’s lack of control over conflict resolution, the team ultimately chose Riak.
Garrett’s talk explores how Riak is leveraged for their next generation stats system, discusses why they selected Riak, looks at how they structure their data and indexes, and shares strategies for working with eventually consistent data. The complete recording of Garrett’s talk is now available. You can check out the talk and his slides here.
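Because Riak favors availability, concurrent writes can leave an object with conflicting versions (siblings) that the application has to reconcile. The Python sketch below shows one common approach, not necessarily the one Riot uses. It assumes a hypothetical schema in which each stats object maps a match ID to the kills recorded in that match, so siblings can be merged by union without double-counting.

```python
from typing import Dict, List

# Hypothetical schema: match ID -> kills recorded for that match.
Stats = Dict[str, int]

def merge_siblings(siblings: List[Stats]) -> Stats:
    """Reconcile conflicting versions by taking the union of their per-match entries."""
    merged: Stats = {}
    for sibling in siblings:
        for match_id, kills in sibling.items():
            # The same match seen in two siblings is the same fact; max() keeps one copy.
            merged[match_id] = max(kills, merged.get(match_id, kills))
    return merged

def total_kills(stats: Stats) -> int:
    """Derive the aggregate from the merged entries instead of storing a bare counter."""
    return sum(stats.values())
```

Merging `{"m1": 3, "m2": 5}` with `{"m1": 3, "m3": 2}` yields three matches totalling 10 kills. Naively adding the two siblings’ totals would give 13, because match `m1` would be counted twice. Since the merge is a union, the result is the same regardless of sibling order.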
For more information on how Riak is being used for gaming services and applications, download the Gaming on Riak whitepaper.
December 9, 2013
Tomorrow (December 10th) at 10am PT/1pm ET, we will be hosting a live webinar, “Beyond NoSQL – Distributed Databases in Production.” This webinar will feature Matt Aslett (Research Director at 451 Research), Bobby Patrick (EVP and CMO at Basho Technologies), and Wes Jossey (Systems Engineer at Tapjoy). There are still seats available, and you can register here for more details.
This webinar will cover the history of NoSQL and the problems with relational systems that it set out to solve. It will then look at the current NoSQL landscape and architecture trends. From there, it will focus on Basho’s Riak, a distributed NoSQL database, and some of its key features and use cases. Tapjoy, the mobile performance-based advertising platform (and Riak user), will discuss how it uses Riak to provide reliable data locality to its customers and why it selected Riak as the cornerstone of its data management strategy. Finally, the webinar will wrap up with a look at what’s coming in Riak 2.0, followed by a live question and answer session.
Be sure to register now for “Beyond NoSQL – Distributed Databases in Production.”
December 5, 2013
By combining local, social, and mobile into a single experience, Citymaps has quickly revolutionized online mapping. Citymaps’ innovations marry a detailed, vector-based view of a city’s business landscape with local search, interactive discovery, social sharing, and commerce. Citymaps launched its new mapping service this past summer, earning praise from many users as well as from TechCrunch and GigaOM.
Unlike traditional online mapping, Citymaps learns user patterns and adapts to each user’s personal interests. Users can share their favorite personal maps or browse the personal maps of their friends, mentors, and favorite celebrities. Today, over 15 million businesses are already plugged into the service. Citymaps users can easily retrieve directions, menus, Instagram photos, and Foursquare tips, and can contribute their own content and pictures to a generated map.
Citymaps selected Basho’s Riak to store strategic data at the heart of its service. Citymaps wanted a distributed database that could keep pace with its rapid growth while remaining operationally easy to manage. Riak stores user avatars, business images and icons, and other strategic map data, and Citymaps plans to use it for distributed API caching in the future.
Citymaps is available on iOS in the Apple App Store. Citymaps plans to serve Android users by year-end.
November 21, 2013
Tapjoy is a mobile advertising and monetization platform that allows end users to select personalized advertisements that they can engage with in exchange for rewards. Tapjoy is available on over one billion devices to users all over the world. Riak has been the cornerstone of their data management strategy for the past year.
Tapjoy’s global growth forced the company to think about scalability. Its original infrastructure was built on SimpleDB; however, with billions of requests on an average day, Tapjoy began to see performance issues due to latency, as well as limits on the size and location of stored data. With growth straining its data store, the team looked for a new solution that would guarantee performance and uptime, even at peak traffic.
“Tapjoy can’t have downtime ever, planned or unplanned,” stated Wes Jossey, Systems Engineer at Tapjoy. “If Tapjoy goes down, end users can’t interact with our platform and they leave, which is unacceptable to us and to our partners.” Due to Tapjoy’s high requirements for availability, scalability, and data redundancy, there were really only a few players in the space to choose from.
The Tapjoy team found that DynamoDB didn’t have all of the features they needed (especially Secondary Indexes, at that time); HBase wasn’t the right fit for their use case; and Cassandra was deemed too operationally intensive for their small team, based on information provided by third parties who had been using Cassandra in production for years. With Riak, the Tapjoy team estimates that they have been able to keep costs down, decrease engineering complexity, and reduce operational effort due to its ease of use and general stability.
Because of the real-time nature of its platform, Tapjoy has stringent low-latency requirements. With Riak, it meets both its high availability mandate and those latency targets, serving requests in as little as 750 microseconds. Tapjoy stores 48TB of data in Riak and performs hundreds of thousands of reads and writes per second against its clusters.
Tapjoy’s current clusters are replicated between multiple data centers to allow failover in the event of unexpected downtime in one of its main facilities. Tapjoy became a Riak Enterprise customer not only to meet this replication requirement, but also because of the excellent customer support the Basho team provides. “We rarely have issues with Riak, so I don’t get paged,” said Jossey. “Riak is a critical piece of our business, and it’s a huge relief that it just works.”
Tapjoy leverages many open source tools and cloud-based technologies to achieve the team’s “Get stuff done” philosophy. Their stack includes:
- Ruby on Rails, Java, Objective-C
- Amazon Web Services
- Chef, IronFan, Sensu, RabbitMQ
- Riak, MySQL, Couchbase, PostgreSQL, Zookeeper
- Hadoop, HBase, Vertica
October 21, 2013
Irish-based utility meter management company, Temetra, has developed a first-of-its-kind wireless meter reading system that lowers the overall price of utilities by providing customers with highly accurate readings. To support this system, Temetra needed a scalable and reliable solution to access and store the growing volumes of critical data created by readings, which – for the average household – can number up to 400 each year. After reviewing Cassandra and Hadoop, Temetra chose Basho’s open source distributed database, Riak, to optimize efficiency and deliver a nimble and affordable service to customers.
Simplifying the Data Collection Process
Temetra offers a comprehensive data collection infrastructure that provides homes and businesses in the UK, Ireland, and Australia with intelligent metering for utility usage. Instead of manually checking meters periodically throughout the year, Temetra uses a wireless network that automatically collects and analyses usage data. Readings are collected simply by driving past the meter, whereas traditionally meter readers had to read the index visually and copy it down by hand. This new method allows the company to better predict usage and deliver more accurate results and pricing, saving Temetra time and the customer money.
The wireless system can collect 300-400 reads per year for the average household, compared with the usual two. For larger consumers such as hotels and hospitals, fixed networks now collect as many as 35,000 reads per year. This approach has been fundamental to Temetra’s competitive differentiation; however, with such a high volume of data, Temetra needed a simple, scalable way to store and access it. “We needed a reliable solution that would allow us to support more and more meters on fixed networks,” said Paul Barry, Temetra’s Managing Director. “Our relational SQL database just could not cope with the quickly rising levels of revenue critical data.”
Billions of Data Points
Temetra has thousands of users and millions of meters that together create billions of data points. The resulting influx of data quickly became difficult to manage with the company’s legacy SQL database. When considering how to overhaul this structured database, Temetra evaluated Cassandra and Hadoop but ultimately chose Riak for its high availability and its relatively self-maintaining, easy-to-deploy infrastructure. The meter data must always be available, because Temetra’s customers rely on it for correct billing. Barry stressed this point: “As a small company managing a lot of revenue critical data, it is really important for us to have a reliable and easily accessible database. For example, during our development and testing phase, a Riak node went down for a day and it was only through monitoring that we spotted it. The ability to lose a node and not affect our service in any way is a huge advantage for us.”
The move from a relational database to Riak, a non-relational key/value store, was a big step for Temetra. The team had to learn to treat the database as a low-maintenance, high-performance, highly available key/value store. The biggest change was denormalizing the data, that is, storing several copies of it in different forms, and that flexibility is where Riak delivered the best performance. For example, each meter reading arrives as a single data point that must be loaded and turned into consumption data. In Riak, Temetra stores the meter data in multiple ways, so it can be read back already calculated, in the form of consumption data. Once the team was comfortable with Riak’s replication technology, Temetra loaded its data from the legacy file store and began using Riak as an effectively unlimited data store across its multiple sites.
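The denormalization Temetra describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Temetra’s code. It assumes meters report cumulative index readings, uses a dict in place of Riak, and invents the `reads/` and `consumption/` key prefixes. Consumption is computed once at write time, so reading it back is a single key lookup.

```python
from typing import Dict, List, Tuple

Reading = Tuple[str, float]  # (ISO date, cumulative meter index)

# Stand-in for Riak: key -> stored value (assumption).
store: Dict[str, list] = {}

def write_readings(meter_id: str, readings: List[Reading]) -> None:
    """Store the raw reads and a precomputed consumption series side by side."""
    ordered = sorted(readings)  # ISO dates sort chronologically
    store[f"reads/{meter_id}"] = ordered
    # Consumption per interval = difference between consecutive cumulative reads.
    store[f"consumption/{meter_id}"] = [
        (date, round(index - prev_index, 3))
        for (_, prev_index), (date, index) in zip(ordered, ordered[1:])
    ]

def get_consumption(meter_id: str) -> List[Reading]:
    """One key lookup; no calculation at read time."""
    return store.get(f"consumption/{meter_id}", [])
```

The trade-off is extra storage and a little more work per write in exchange for reads that need no computation. That trade-off suits data which is written once and read many times for billing.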
The Benefits of Riak
Temetra now has 3.5 million meters collecting revenue critical data, so Riak’s ability to easily add nodes is a key benefit as the company continues to scale. With Riak, Temetra can keep expanding without large additional hardware costs, and can bring a new server online within 20 minutes, which lets the business prepare for large new customers very quickly. This flexibility supports Temetra’s fast growth and its pace of innovation.
Another benefit is pricing. Temetra’s competitors operate in SQL mode and cannot scale as easily or as quickly. Working with Riak has allowed Temetra to break away from those limitations, and this is reflected in its pricing for customers, giving the company a significant competitive edge. Riak was also very easy for Temetra to introduce. As a small company, Temetra doesn’t want a large administration staff dedicated to looking after its IT infrastructure, so Riak’s relatively self-maintaining nature, alongside Basho’s expert support, was a definite advantage.
“Most of our competitors still operate in SQL mode,” noted Barry. “By working with a distributed database that, from a flexibility and resiliency perspective puts us more in line with the way Google or Twitter work, we can disrupt the way that data is stored traditionally to scale faster and easier. This is reflected in our prices and our ability to rapidly introduce new functionality. I think our customers definitely see that benefit.”
October 10, 2013
JBA, based in Melbourne, Australia, is a customer-centric digital consultancy that specializes in customer understanding, experience optimization, behavioral targeting, and multichannel conversion. Its main customers are multichannel retailers with eCommerce operations that want deeper insight into their customers (such as why shoppers abandon their carts) and better retargeting. JBA uses Riak as a core part of its behavioral analysis and remarketing tool.
JBA started developing its behavioral analysis products 18 months ago, and Riak has been in production since the beginning. For their flagship tool, they needed a key/value database to easily store all the user behavior data. They also needed a system that would scale easily, offer Python integration for data analysis, work well with the other systems already in their stack, and be operationally simple for their small team. They assessed Riak, Cassandra, DynamoDB, and MongoDB, and decided Riak was the best fit: its Python client library made it easy to work with, it’s built for scale, their operations team can easily manage the cluster with Riak’s command line tools, and they could run it in AWS, which they were already using heavily.
JBA currently has ten nodes in its cluster, all running on smaller Amazon instances. Being able to run many low-powered instances and simply add nodes as needed, rather than running fewer high-powered instances, has been vital. Since JBA primarily serves online retailers, it can scale up ahead of holiday sales cycles or new product releases and then scale back down, which helps manage costs.
They store over 10 million objects in Riak, with each object representing a customer state or a shopping cart. “We never have to worry about how much we’re storing because we can just scale out to cope with capacity issues,” said Matt Black, Senior Developer at JBA. “Riak gives us the ability to both store a lot of data but also look at objects in isolation. This is perfect for us because we rarely look at the whole data set in aggregate; we’re more interested in the state of individual users.”
JBA is also evaluating where else they can use Riak within the company, especially as they expand their behavioral analysis tools. They are firm believers in using the right tool for the job and currently also use MySQL for structured data, Hadoop for large scale MapReduce, and RabbitMQ for messaging. “Riak has done the job we set out to do. We’ve been very happy with it and we’re looking for more ways to integrate Riak into our business,” said JBA CTO, Andrew Fisher.
For more information about JBA, visit their site at www.jbadigital.com/
October 8, 2013
Riak is built to handle critical data. Its design tenets of high availability, fault tolerance, and scalability are meant to keep that data accessible no matter what happens behind the scenes, and to let you grow quickly enough to store all of it. This makes it a good fit for many industries, including healthcare. It’s also why the National Health Service (NHS) in the UK has selected Riak for its IT backbone.
To help drive efficiency and care improvements throughout the UK, the NHS created Spine1 as its centralized database of all patient health and prescription data. Through this innovation, critical patient information was always accessible and protected. However, the original Spine1 infrastructure required over 2,000 staff and 1,000 servers, which made it expensive for the publicly funded NHS.
Over the past two years, the NHS has worked to make this database more cost-effective. It opted to move the system to Basho’s Riak, creating Spine2. With Spine2, the NHS expects not only to cut costs but also to improve the performance and reliability of the system overall. Spine2 is planned to go live in early 2014.
Also in healthcare, the Danish Health and Medicines Authority uses Riak as the backend for its national health record system. Riak’s high availability keeps key patient information accessible, which in many cases can be life-saving. Additionally, its ease of scaling allows governments or private companies to add capacity quickly as needed, without paying for unused space.
For more information about Basho and the NHS, check out the full release.
October 3, 2013
Moz provides analytics software to track all of a website’s inbound marketing efforts on one platform. Dedicated to helping people do better marketing, Moz creates easy-to-use tools, tutorials, and educational resources for learning inbound marketing—and fosters the web’s most vibrant online marketing community. With offices in Seattle, WA and Portland, OR, Moz supports over 27,000 customers and 300,000 community members worldwide. For nearly three years, they have been using Riak to store customer campaign search engine rankings data.
Originally, Moz stored campaign search engine rankings data in MySQL servers. However, as its customer base grew, Moz struggled to grow its relational system at the same rate. Moz’s policy is to select the best tool for the job: for each use case, the team tests a variety of databases and picks the best option based on the results. For customer campaign data, the top priorities were scalability and a range of querying options, and the choice was eventually whittled down to Riak and Cassandra. Campaign search engine rankings needed to be written to the database immediately and read back quickly and easily, and MapReduce simplified retrieving this data and compiling summary information for users. In Moz’s testing, Cassandra showed delays between writes and reads and lacked MapReduce, so it simply couldn’t keep up, and Riak was selected.
According to Moz CTO, Anthony Skinner, “Riak is absolutely the best tool for the job. It was extremely straightforward to bring into production and every upgrade we’ve done has been seamless. Since we’re dealing with real-time campaign data, time is of the essence. We have been very impressed by how quickly Riak is able to redistribute data across nodes, especially when we need to add nodes to handle unexpected growth spikes.”
Moz currently runs an 11-node Riak cluster. With 27,000 customers, each running many campaigns, a lot of data moves in and out of the system. Because the data is delivered to each customer as it is collected, Moz archives only a small subset of it and keeps long-term storage below 5TB. The cluster currently has about 8TB of capacity (11 nodes × 700GB = 7.7TB), and because adding a node is relatively simple and painless, Moz hasn’t needed to pre-provision much excess capacity.
Moz has a polyglot setup and uses a little bit of everything. Based on Riak’s straightforward nature, operational ease, and scalability, they will definitely be looking to Riak in the future as other use cases arise.
July 25, 2013
Hosted Graphite is a hosted version of Graphite, the open-source application metrics system. It lets you measure, analyze, and visualize large amounts of data about your applications and backend systems without setting up your own server or dealing with scaling, backups, or maintenance. Hosted Graphite uses Riak to store all of its metrics, a time series collection of name-value data.
Hosted Graphite originally used Whisper, a fixed-size database that stored time series data as binary files on disk. However, Whisper’s focus on simplicity meant it didn’t offer replication or other helpful features. As the data set grew, the team knew they’d need a system that could distribute data more easily and scale effectively. And since there were no plans to hire beyond the existing two-person ops team, they needed a system that offered always-on availability (as any failures are highly visible to customers) and operational simplicity.
Based on their criteria, they were able to quickly rule out many other database options. When they came across Riak, it fit all of their requirements and looked operationally friendly, so they decided to try it. They were able to easily get Riak into production and have been live with Riak since June of 2012.
Hosted Graphite runs two Riak clusters and a total of nine nodes. They are currently storing 1.5 billion keys and see 60 GB of growth per day across their nodes. They use both the Bitcask and LevelDB backends.
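One common way to store name-value time series in a key/value store is to group points into fixed time windows, with one key per metric per window. A range query then maps to a predictable set of keys. The Python sketch below assumes one-hour windows and a hypothetical `metric:window_start` key format. It is a generic illustration, not Hosted Graphite’s actual schema, and a dict stands in for Riak.

```python
from typing import Dict, List, Tuple

WINDOW = 3600  # seconds covered by each key (hypothetical choice)

Point = Tuple[int, float]  # (Unix timestamp, value)

def chunk_key(metric: str, ts: int) -> str:
    """Key of the window that contains timestamp ts."""
    return f"{metric}:{ts - ts % WINDOW}"

def write_points(store: Dict[str, List[Point]], metric: str, points: List[Point]) -> None:
    """Append each point to the chunk for its window."""
    for ts, value in points:
        store.setdefault(chunk_key(metric, ts), []).append((ts, value))

def keys_for_range(metric: str, start: int, end: int) -> List[str]:
    """Keys to fetch for all points with start <= ts < end."""
    if end <= start:
        return []
    first = start - start % WINDOW
    return [f"{metric}:{w}" for w in range(first, end, WINDOW)]
```

The window size is a trade-off. Larger windows mean fewer keys, but each object becomes larger to read and rewrite on every update.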
As Charlie von Metzradt, co-founder of Hosted Graphite, said, “Launching with Riak has helped us sleep at night. We don’t need to worry when a node or two goes down, as we can just deal with it later. For a two person team, this has been invaluable.”
For more information on Hosted Graphite’s experiences with Riak, check out Charlie’s talk from a recent meetup.
You can also visit basho.com to see if Riak is the right fit for your data.