January 22, 2013
Best Buy is North America’s top specialty retailer of consumer electronics, personal computers, entertainment software, and appliances. Riak has been an integral part of Best Buy’s push to re-platform its eCommerce system, and its architecture has helped Best Buy build and operate the new platform.
At our developer conference, Ricon, we were lucky to have Joel Crabb, Director of Web Architecture at BestBuy.com, talk about how Riak fits into their pattern of innovation and their move away from a traditional relational architecture.
To learn more about how retailers can use Riak for their eCommerce needs, check out the whitepaper, “Retail on Riak: A Technical Introduction.” For more information on moving from a relational database to Riak, sign up for our webcast this Thursday, covering advantages, tradeoffs and development considerations.
January 21, 2013
Mad Mimi is an email marketing service that allows users to create, send, and track email campaigns without using templates. With over 100,000 clients, Mad Mimi is storing a large amount of data that needs to be accessed quickly and easily.
In 2011, Mad Mimi realized that their data was growing beyond the capacity of their MySQL database. Rather than resharding the data, which would require an extensive operational effort, Mad Mimi decided to try Riak based on its ability to scale quickly and easily without manual sharding.
Mad Mimi now uses Riak to track email statistics, leveraging the secondary indexing feature to make retrieving data easier. Secondary indexing allows users to attach additional key/value data to Riak objects and query them by exact match or range value. Mad Mimi is currently running an eight-node cluster storing between three and five billion keys, adding between 10 and 20 million keys each day.
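To make the idea of exact-match and range queries concrete, here is a minimal in-memory sketch in Python. It is not Riak's implementation or client API: the class name, method names, and the integer "sent_day" index are all hypothetical, chosen only to illustrate attaching an index entry to an object and querying on it.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class TinyIndexedStore:
    """Toy key/value store with integer secondary indexes (illustration only)."""

    def __init__(self):
        self.objects = {}                  # primary key -> stored value
        self.indexes = defaultdict(list)   # index name -> sorted [(index value, key)]

    def put(self, key, value, **index_entries):
        # Store the object, then record each attached index entry.
        # Simplification: re-writing an existing key does not remove old entries.
        self.objects[key] = value
        for name, index_value in index_entries.items():
            insort(self.indexes[name], (index_value, key))

    def query_range(self, name, low, high):
        # Return keys whose index value lies in [low, high], in index order.
        entries = self.indexes[name]
        start = bisect_left(entries, (low,))
        end = bisect_left(entries, (high + 1,))
        return [key for _, key in entries[start:end]]

    def query_exact(self, name, index_value):
        # An exact match is a range query whose bounds coincide.
        return self.query_range(name, index_value, index_value)


store = TinyIndexedStore()
store.put("open-1", {"campaign": "winter"}, sent_day=20130105)
store.put("open-2", {"campaign": "winter"}, sent_day=20130107)
store.put("open-3", {"campaign": "spring"}, sent_day=20130120)

print(store.query_exact("sent_day", 20130107))            # ['open-2']
print(store.query_range("sent_day", 20130101, 20130110))  # ['open-1', 'open-2']
```

The point of the sketch is that the application never scans every object: it asks the index for matching keys and then fetches only those objects.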
Since launching with Riak, their cluster has never gone down and it is still as fast as ever. Based on this success, they hope to move all their email tracking statistics to Riak and eliminate MySQL entirely.
For more details on Mad Mimi’s experience with Riak, check out the case study, “Email Marketing Success with Mad Mimi and Riak.”
For more information on moving from a relational database to Riak, sign up for our webcast this Thursday, covering advantages, tradeoffs and development considerations.
January 15, 2013
enStratus is a cloud infrastructure management solution for deploying and managing enterprise-class applications. You can think of enStratus as the enterprise console to cloud computing – a unified solution for managing single or multi-cloud environments. enStratus uses Riak to store a combination of read-heavy and write-intensive data, including machine and state information, and data supporting analytics and audit control.
Previously, enStratus had relied on MySQL as its primary data store, but needed to provide a greater level of write availability and resilience to failure across multiple datacenters. Scaling writes in MySQL had become a bottleneck, and MySQL’s master/slave replication made master nodes a possible single point of failure.
Starting with customer and API data, enStratus successfully made the switch to Riak’s data model and eventually consistent approach, which favors availability over consistency in the event of node failure or network partition. “As I’ve looked at a number of problem domains from customers and our own systems, you see this pattern where a relational database has been used just because it’s the default… and the reality is that more of the world is eventually consistent than not,” said George Reese, CTO of enStratus.
At our developer conference Ricon, we were lucky to have George speak about migrating from MySQL to Riak, enStratus’ “design for failure” architecture, and how their application is built. George also talks about challenges of moving to a non-relational system, including adjusting to the data model and migration approaches. You can view the video below, or read the full case study here.
Want more info on moving from MySQL to Riak? Sign up for our webcast on Thursday, January 24 here or read our whitepaper on moving from relational to Riak.
January 15, 2013
Today we’re introducing an easier way to build Riak clusters on AWS using CloudFormation.
The project, cloudformation-riak, comes with three CloudFormation templates. These templates range from building a simple Riak cluster to building a VPC-based stack that includes: a front-end load balancer; a cluster of application servers with a Riak-powered demo application; a backend load balancer; and a Riak cluster.
Head over to the cloudformation-riak repo to get started. We also put together a screencast (below) that shows things in action.
January 14, 2013
This is the second in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on Thursday, January 24.
One critical factor in deciding which database to use is its operational profile. Many customers today are dealing with rapid data growth, intense peak loads and the imperative to maintain economies of scale across a large platform. For these customers, how the database scales up and what impact that has on operations is a huge factor in business and technical decisions around what technology to use.
The cost of scale is one reason why many of our users and customers have picked Riak over a traditional relational system. From experience, users have discovered that scaling a relational system can be expensive and error-prone, and can lead to significant and disruptive operations projects. In this post, we’ll take a look at how a relational database’s sharding approach differs from Riak’s consistent hashing approach and what that means for you as an operator.
Historically, relational databases were commonly found running in production on a single server. If capacity and availability needs require more than a single machine, relational databases address scale using a technique called sharding. Sharding breaks data into logical parts (such as alphabetically, numerically or by geographic region) that can be distributed across multiple machines. A simplified example is below.
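As a simplified illustration of the sharding described above, the Python sketch below routes records to a server by the first letter of a customer name. The shard ranges and server names are hypothetical, and a real application would carry this logic in its data access layer.

```python
# Hypothetical shard map: each shard owns an alphabetical range of names.
SHARDS = [
    ("a", "h", "db-server-1"),
    ("i", "p", "db-server-2"),
    ("q", "z", "db-server-3"),
]


def shard_for(name):
    """Return the server responsible for a customer name."""
    first = name[0].lower()
    for low, high, server in SHARDS:
        if low <= first <= high:
            return server
    raise KeyError(f"no shard covers {name!r}")


print(shard_for("Alice"))  # db-server-1
print(shard_for("Smith"))  # db-server-3
```

Note that the ranges are fixed in application code. If names starting with "s" turn out to be far more common than the rest, db-server-3 carries a disproportionate load, and changing the ranges means moving existing rows between servers by hand.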
This approach can be problematic for several reasons. First, writing and maintaining sharding logic increases the overhead of operating and developing an application on the database. Significant growth of data or traffic typically means significant, often manual, resharding projects. Determining how to intelligently split the dataset without negatively impacting performance, operations, and development presents a substantial challenge, especially when dealing with “big data”, rapid scale, or peak loads. Further, rapidly growing applications frequently outpace an existing sharding scheme. When the data in a shard grows too large, the shard must again be split. While several “auto”-sharding technologies have emerged in recent years, these methods are often imprecise and manual intervention is standard practice. Finally, sharding can often lead to “hot spots” in the database – physical machines responsible for storing and serving a disproportionately high amount of both data and requests – which can lead to unpredictable latency and degraded performance.
To avoid sharding (and the associated expenses), data in Riak is distributed across nodes using consistent hashing. Consistent hashing ensures data is evenly distributed around the cluster and new nodes can be added with automatic, minimal reshuffling of data. This significantly decreases risky “hot spots” in the database and lowers the operational burden of scaling.
How does consistent hashing work? Riak stores data using a simple key/value scheme. These keys and values are stored in a namespace called a bucket. When you add new key/value pairs to a bucket in Riak, each object’s bucket and key combination is hashed. The resulting value maps onto a 160-bit integer space. You can think of this integer space as a ring used to figure out what data to put on which physical machines.
How? Riak divides the integer space into equally sized partitions (default is 64). Each partition owns a given range of values on the ring, and is responsible for all buckets and keys that, when hashed, fall into that range. Each partition is managed by a process called a virtual node (or “vnode”). Physical machines in the cluster evenly divide responsibility for vnodes. Each physical machine thus becomes responsible for all keys represented by its vnodes.
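The mapping from bucket and key to partition to physical machine can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Riak's code: it assumes SHA-1 as the 160-bit hash, joins bucket and key into a simple string before hashing, and deals partitions out to nodes round-robin, whereas Riak's own encoding and partition assignment differ in detail. The function and node names are hypothetical.

```python
import hashlib

RING_SIZE = 2 ** 160                      # size of the 160-bit integer space
NUM_PARTITIONS = 64                       # the default mentioned above
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS


def ring_position(bucket, key):
    """Hash the bucket/key pair onto the ring (assumed encoding: 'bucket/key')."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket, key):
    """Index (0..63) of the partition whose range contains the hash."""
    return ring_position(bucket, key) // PARTITION_SIZE


def assign_partitions(nodes):
    """Simplified claim: deal partitions out to physical nodes round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(bucket, key, owners):
    """Physical node responsible for the partition that owns this key."""
    return owners[partition_for(bucket, key)]


owners = assign_partitions(["node1", "node2", "node3", "node4"])
print(partition_for("emails", "msg-42"), node_for("emails", "msg-42", owners))
```

Because the hash spreads keys uniformly over the ring and every node owns the same number of partitions, each node ends up with roughly the same share of the data without anyone choosing split points.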
When nodes are added or removed, data is rebalanced automatically without any operator intervention. New machines assume ownership of some of the partitions and existing machines hand off relevant partitions and associated data until data ownership is equal amongst nodes. Riak also has an elegant approach to making cluster changes such as adding or removing nodes, allowing you to stage up the changes, view the impact on the cluster, and then choose to commit or abort the changes. Developers and operators don’t have to deal with the underlying complexity of what data lives where as all nodes can serve and route requests. By eliminating the manual requirements of sharding and much of the potential for “hot spots,” Riak provides a much simpler operational scenario for many users that lets them add and remove machines as needed, no matter how much they grow.
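The rebalancing described above can be illustrated with a small Python sketch. It is a deliberate simplification: the new node takes partitions from over-loaded nodes until it holds a fair share. Riak's actual claim algorithm is more involved (for example, it also has to consider where replicas land), and the function and node names here are hypothetical.

```python
from collections import Counter


def add_node(owners, new_node):
    """Return a new partition -> node map after new_node joins.

    The new node takes partitions from nodes holding more than a fair
    share, so only a small fraction of partitions change owner.
    """
    owners = dict(owners)
    nodes = set(owners.values()) | {new_node}
    fair_share = len(owners) // len(nodes)
    load = Counter(owners.values())
    for partition in sorted(owners):
        if load[new_node] >= fair_share:
            break
        donor = owners[partition]
        if load[donor] > fair_share:
            owners[partition] = new_node
            load[donor] -= 1
            load[new_node] += 1
    return owners


# Four nodes each own 16 of the 64 partitions; then a fifth node joins.
before = {p: f"node{p % 4 + 1}" for p in range(64)}
after = add_node(before, "node5")

moved = sum(before[p] != after[p] for p in before)
print(moved)  # 12 of the 64 partitions change owner
```

Only the partitions handed to the new node move; everything else stays where it was. That is the property that makes adding capacity a routine operation rather than a resharding project.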
January 10, 2013
This is the first in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on January 24.
One of the biggest differences between Riak and relational systems is the focus on availability and how the underlying architecture deals with failure modes.
Most relational databases leverage a master/slave architecture to replicate data. This approach usually means the master coordinates all write operations, working with the slave nodes to update data. If the master node fails, the database will reject write operations until the failure is resolved – often involving failover or leader election – to maintain correctness. This can result in a window of write unavailability.
Conversely, Riak uses a masterless system with no single point of failure, meaning any node can serve read or write requests. If a node experiences an outage, other nodes can continue to accept read and write requests. Additionally, if a node fails or becomes unavailable to the rest of the cluster due to a network partition, a neighboring node will take over responsibilities for the unavailable node. Once this node becomes available again, the neighboring node will hand over any updates through a process called “hinted handoff.” This is another way that Riak maintains availability and resilience, even in the face of serious failure.
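The hinted handoff flow described above can be sketched as a small Python simulation. This is a toy model with hypothetical class and function names: in Riak the mechanism operates per partition across replicas, while here a single neighbor stands in for a single unavailable owner.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value for data this node owns
        self.hints = {}   # owner name -> {key: value} held on that owner's behalf


def write(key, value, owner, neighbor):
    """Write to the owner if reachable; otherwise the neighbor accepts it with a hint."""
    if owner.up:
        owner.data[key] = value
    else:
        neighbor.hints.setdefault(owner.name, {})[key] = value


def hand_off(neighbor, owner):
    """Once the owner is back, deliver the writes the neighbor accepted for it."""
    if owner.up:
        owner.data.update(neighbor.hints.pop(owner.name, {}))


node_a, node_b = Node("a"), Node("b")

node_a.up = False                          # node a fails or is partitioned away
write("cart:17", ["book"], owner=node_a, neighbor=node_b)
print(node_b.hints)                        # {'a': {'cart:17': ['book']}}

node_a.up = True                           # node a returns
hand_off(node_b, node_a)
print(node_a.data)                         # {'cart:17': ['book']}
```

The write succeeds even though its owner is down, which is the availability property the paragraph above describes; the cost is that the owner is briefly out of date until the handoff completes.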
Because Riak allows reads and writes even when multiple nodes are unavailable, and uses an eventually consistent design to maintain availability, in rare cases different replicas may contain different versions of an object. This can occur if multiple clients update the same piece of data at the same time or if nodes are down or lagging. These conflicts occur only a small fraction of the time, but are important to know about. Riak has a number of mechanisms for detecting and resolving these conflicts when they occur. For more on how Riak achieves availability and the tradeoffs involved, see our documentation on the subject.
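One common mechanism for detecting this kind of conflict is a vector clock, which records how many updates each actor has made to an object. The Python sketch below is a generic illustration of the comparison, not Riak's implementation; the function names and client identifiers are hypothetical.

```python
def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two versions of an object by their vector clocks."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "conflict"   # neither version has seen the other's update


original = {"client_x": 1}
update_1 = {"client_x": 2}                  # client_x updates the original
update_2 = {"client_x": 1, "client_y": 1}   # client_y updates the original concurrently

print(compare(update_1, original))   # a is newer
print(compare(update_1, update_2))   # conflict
```

When one version descends from the other, the older one can be discarded safely. When neither does, the two updates were concurrent, and something (the database or the application) has to decide how to reconcile them.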
For many use cases today, high availability and fault tolerance are critical to the user experience and the company’s revenue. Unavailability has a negative impact on your revenue, damages user trust and leads to a poor user experience. For use cases such as online retail, shopping carts, advertising, social and mobile platforms or anything with critical data needs, high availability is key and Riak may be the right choice.
January 9, 2013
Synacor’s TV Everywhere platform enables cable, satellite, consumer electronics and telco companies to stream content and programming to any device, anytime. TV Everywhere also provides innovative search, discovery and recommendation solutions combined with deep social media integration.
Synacor TV Everywhere uses Riak as object storage for video clips, news stories and other content. Synacor originally used a relational solution as its primary datastore, but API response times had started to slow as the team continued to add more assets. After evaluating several possible solutions, they chose to move to Riak due to its low latency and Synacor’s high availability requirements.
Riak Enterprise has been deployed in multiple Synacor datacenters and has improved the API response time significantly since its integration. Synacor now stores over 5 million assets with thousands being added daily. According to Michael Collins, Synacor’s Senior Director of Engineering, “Riak has never been the source of a bottleneck for us. It’s been great.”
For more details, check out the complete case study, “TV Everywhere with Synacor and Riak.”
January 9, 2013
Today, Microsoft Open Technologies, Inc. announced the public preview of VM Depot. Basho is pleased to participate in this launch. Starting today, you can quickly deploy a virtual machine image configured with an OSS Riak implementation from VM Depot.
Ease of deployment is a common theme we hear from the community, and ensuring Riak is available on your platform of choice is part of our purpose in supporting your deployment needs. Whether it’s quickly prototyping an internal application in the enterprise, deploying a hybrid cloud solution, or leveraging solely public cloud services, Riak is an excellent choice for solving your data-storage needs at scale.
Given that this is a public preview, installation documentation is forthcoming. When it is ready, and that will be soon, you can find it on our documentation portal. In the meantime, feel free to ask questions, or provide feedback, on the mailing list.
January 7, 2013
Riak Cloud Storage is simple, available cloud storage software built on top of Riak. It offers an S3 API, multi-tenancy and large object support for enterprises building public or private clouds. We want to make it easier to get started with Riak CS, so we’re now offering a self-service test harness. Visit riakcs.net to sign up – you can explore the functionality, test API operations, and experiment with clients and development apps. With the self-service feature, you can start playing right away.
Note that the test harness is primarily for exploring Riak CS features – if you want to do load testing and performance benchmarking, you should sign up for a developer trial that will give you access to Riak CS packages you can install and test on your own hardware.
Interested in learning more about Riak CS? All of the docs are available online.
January 3, 2013
Most teams considering Riak come from a relational database background. The slide deck below, from our webcast on moving from relational to Riak, gives an overview of Riak and covers how the architecture differs from a relational approach, the advantages for scaling and development, and what’s different about building applications and operating a database in a non-relational world. We also include a few stories of Riak users who replaced MySQL or added Riak to the mix.
Interested in learning more? Check out our overview, From Relational to Riak.