January 15, 2013
Today we’re introducing an easier way to build Riak clusters on AWS using CloudFormation.
The project, cloudformation-riak, comes with three CloudFormation templates. These templates range from building a simple Riak cluster to building a VPC-based stack that includes a front-end load balancer, a cluster of application servers with a Riak-powered demo application, a backend load balancer, and a Riak cluster.
Head over to the cloudformation-riak repo to get started. We also put together a screencast (below) that shows things in action.
January 14, 2013
This is the second in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on Thursday, January 24.
One critical factor in deciding which database to use is its operational profile. Many customers today are dealing with rapid data growth, intense peak loads and the imperative to maintain economies of scale across a large platform. For these customers, how the database scales up and what impact that has on operations is a huge factor in business and technical decisions around what technology to use.
The cost of scale is one reason why many of our users and customers have picked Riak over a traditional relational system. From experience, users have discovered that scaling a relational system can be expensive and error-prone, and can lead to significant and disruptive operations projects. In this post, we’ll take a look at how a relational database’s sharding approach differs from Riak’s consistent hashing approach and what that means for you as an operator.
Historically, relational databases were commonly found running in production on a single server. If capacity and availability needs require more than a single machine, relational databases address scale using a technique called sharding. Sharding breaks data into logical parts (such as alphabetically, numerically or by geographic region) that can be distributed across multiple machines. A simplified example is below.
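As a rough illustration of alphabetical sharding, here is a minimal Python sketch. The shard names, the letter ranges, and the `shard_for` helper are all hypothetical, made up for this example rather than taken from any particular database.

```python
# Hypothetical shard map: each shard owns an inclusive range of first letters.
SHARDS = {
    "shard-a": ("a", "h"),
    "shard-b": ("i", "q"),
    "shard-c": ("r", "z"),
}


def shard_for(username: str) -> str:
    """Return the shard that stores a user, based on the first letter."""
    first = username[0].lower()
    for shard, (low, high) in SHARDS.items():
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers names starting with {first!r}")


if __name__ == "__main__":
    for name in ["alice", "Ivan", "steve", "sam", "sarah"]:
        print(name, "->", shard_for(name))
```

Note that this routing logic lives in the application: if one range fills up (say, many names starting with "s"), someone has to change the map and move the affected rows by hand.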
This approach can be problematic for several reasons. First, writing and maintaining sharding logic increases the overhead of operating and developing an application on the database. Significant growth of data or traffic typically means significant, often manual, resharding projects. Determining how to intelligently split the dataset without negatively impacting performance, operations, and development presents a substantial challenge, especially when dealing with “big data”, rapid scale, or peak loads. Further, rapidly growing applications frequently outpace an existing sharding scheme. When the data in a shard grows too large, the shard must again be split. While several “auto”-sharding technologies have emerged in recent years, these methods are often imprecise and manual intervention is standard practice. Finally, sharding can often lead to “hot spots” in the database (physical machines responsible for storing and serving a disproportionately high share of both data and requests), which can lead to unpredictable latency and degraded performance.
To avoid sharding (and the associated expenses), data in Riak is distributed across nodes using consistent hashing. Consistent hashing ensures data is evenly distributed around the cluster and new nodes can be added with automatic, minimal reshuffling of data. This significantly decreases risky “hot spots” in the database and lowers the operational burden of scaling.
How does consistent hashing work? Riak stores data using a simple key/value scheme. These keys and values are stored in a namespace called a bucket. When you add new key/value pairs to a bucket in Riak, each object’s bucket and key combination is hashed. The resulting value maps onto a 160-bit integer space. You can think of this integer space as a ring used to figure out what data to put on which physical machines.
How? Riak divides the integer space into equally-sized partitions (default is 64). Each partition owns the given range of values on the ring, and is responsible for all buckets and keys that, when hashed, fall into that range. Each partition is managed by a process called a virtual node (or “vnode”). Physical machines in the cluster evenly divide responsibility for vnodes. Each physical machine thus becomes responsible for all keys represented by its vnodes.
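The mapping from a bucket/key pair to a partition, and from a partition to a machine, can be sketched in a few lines of Python. This is a simplified model, not Riak's implementation: SHA-1 is used because it produces a 160-bit value, while the `bucket/key` string encoding, the round-robin assignment, and the node names are assumptions made for the example.

```python
import hashlib

RING_SIZE = 2 ** 160          # size of the 160-bit integer space
NUM_PARTITIONS = 64           # default partition count mentioned above
PARTITION_SIZE = RING_SIZE // NUM_PARTITIONS


def ring_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair onto the ring (input encoding is an assumption)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket: str, key: str) -> int:
    """Return the index (0..63) of the partition whose range holds the hash."""
    return ring_position(bucket, key) // PARTITION_SIZE


def assign_partitions(nodes: list[str]) -> dict[int, str]:
    """Spread partitions (vnodes) evenly over machines, round-robin."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


if __name__ == "__main__":
    owners = assign_partitions(["node1", "node2", "node3", "node4"])
    p = partition_for("users", "alice")
    print(f"users/alice -> partition {p} -> {owners[p]}")
```

Because the hash spreads keys uniformly over the ring, each machine ends up responsible for roughly the same share of keys without any application-level routing rules.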
When nodes are added or removed, data is rebalanced automatically without any operator intervention. New machines assume ownership of some of the partitions, and existing machines hand off the relevant partitions and associated data until data ownership is equal amongst nodes. Riak also has an elegant approach to making cluster changes such as adding or removing nodes: you can stage the changes, view the impact on the cluster, and then choose to commit or abort them. Developers and operators don’t have to deal with the underlying complexity of what data lives where, as all nodes can serve and route requests. By eliminating the manual requirements of sharding and much of the potential for “hot spots,” Riak provides a much simpler operational scenario for many users that lets them add and remove machines as needed, no matter how much they grow.
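To see why adding a machine moves only a small share of the data, consider the following sketch. It is a toy rebalancing routine written for this example and is not Riak's actual ownership algorithm: the new node simply takes partitions from existing nodes until it holds its fair share, and every other partition stays where it was.

```python
from collections import Counter

NUM_PARTITIONS = 64


def initial_owners(nodes: list[str]) -> dict[int, str]:
    """Round-robin starting assignment of partitions to nodes."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(owners: dict[int, str], new_node: str) -> dict[int, str]:
    """Return a new ownership map after new_node claims its fair share."""
    updated = dict(owners)
    node_count = len(set(owners.values())) + 1
    fair_share = NUM_PARTITIONS // node_count
    load = Counter(updated.values())
    claimed = 0
    for partition in sorted(updated):
        if claimed == fair_share:
            break
        donor = updated[partition]
        # Only take from nodes that still hold more than a fair share.
        if load[donor] > fair_share:
            updated[partition] = new_node
            load[donor] -= 1
            claimed += 1
    return updated


if __name__ == "__main__":
    before = initial_owners(["node1", "node2", "node3", "node4"])
    after = add_node(before, "node5")
    moved = sum(1 for p in before if before[p] != after[p])
    print(f"partitions moved: {moved} of {NUM_PARTITIONS}")
    print(dict(Counter(after.values())))
```

In this model, growing from four to five nodes transfers 12 of the 64 partitions; the remaining 52 are untouched, so only the data in the transferred partitions has to be handed off.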
January 10, 2013
This is the first in a series of blog posts that discusses a high-level overview of the benefits and tradeoffs of Riak versus traditional relational databases. If this is relevant to your projects or applications, register for our “From Relational to Riak” webcast on January 24.
One of the biggest differences between Riak and relational systems is the focus on availability and how the underlying architecture deals with failure modes.
Most relational databases leverage a master/slave architecture to replicate data. This approach usually means the master coordinates all write operations, working with the slave nodes to update data. If the master node fails, the database will reject write operations until the failure is resolved – often involving failover or leader election – to maintain correctness. This can result in a window of write unavailability.
Conversely, Riak uses a masterless system with no single point of failure, meaning any node can serve read or write requests. If a node experiences an outage, other nodes can continue to accept read and write requests. Additionally, if a node fails or becomes unavailable to the rest of the cluster due to a network partition, a neighboring node will take over responsibilities for the unavailable node. Once the unavailable node comes back, the neighboring node will pass over any updates it accepted in the meantime through a process called “hinted handoff.” This is another way that Riak maintains availability and resilience despite serious failure.
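The idea behind hinted handoff can be modeled with a small in-memory simulation. This is a conceptual sketch only: the `Node` and `Cluster` classes, the byte-sum placement function, and the single-copy writes are simplifications invented for the example, and real Riak replicates each object to several vnodes.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}    # key -> value for keys this node owns
        self.hints = {}   # intended owner name -> {key: value} held on its behalf


class Cluster:
    def __init__(self, names: list[str]):
        self.nodes = [Node(n) for n in names]

    def owner_index(self, key: str) -> int:
        # Deterministic toy placement; stands in for the hash ring.
        return sum(key.encode("utf-8")) % len(self.nodes)

    def put(self, key: str, value: str) -> str:
        """Write to the owner, or to the next available neighbor with a hint."""
        start = self.owner_index(key)
        owner = self.nodes[start]
        if owner.up:
            owner.data[key] = value
            return owner.name
        for offset in range(1, len(self.nodes)):
            fallback = self.nodes[(start + offset) % len(self.nodes)]
            if fallback.up:
                fallback.hints.setdefault(owner.name, {})[key] = value
                return fallback.name
        raise RuntimeError("no nodes available")

    def recover(self, name: str) -> None:
        """Bring a node back and hand off everything held on its behalf."""
        target = next(n for n in self.nodes if n.name == name)
        target.up = True
        for node in self.nodes:
            target.data.update(node.hints.pop(name, {}))


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3"])
    owner = cluster.nodes[cluster.owner_index("cart:42")]
    owner.up = False
    print("write accepted by", cluster.put("cart:42", "3 items"))
    cluster.recover(owner.name)
    print(owner.name, "now holds", owner.data)
```

The write succeeds even though the owning node is down, and the owner receives the update as soon as it rejoins.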
Because Riak allows reads and writes even when multiple nodes are unavailable, and uses an eventually consistent design to maintain availability, different replicas may in rare cases contain different versions of an object. This can occur if multiple clients update the same piece of data at the same time or if nodes are down or laggy. These conflicts affect only a small fraction of operations, but are important to know about. Riak has a number of mechanisms for detecting and resolving these conflicts when they occur. For more on how Riak achieves availability and the tradeoffs involved, see our documentation on the subject.
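One of the mechanisms Riak uses to detect conflicting versions is the vector clock. The sketch below shows the general technique in Python under simplifying assumptions: clocks are plain dictionaries mapping an actor name to a counter, and the `descends` and `compare` helpers are names chosen for this example rather than part of any Riak client API.

```python
def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict[str, int], b: dict[str, int]) -> str:
    """Classify two versions as equal, ordered, or conflicting."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    return "conflict"  # concurrent updates: neither version supersedes the other


if __name__ == "__main__":
    base = {"client_x": 1}
    from_x = {"client_x": 2}                 # client_x updated again
    from_y = {"client_x": 1, "client_y": 1}  # client_y updated the same base
    print(compare(from_x, base))    # one version supersedes the other
    print(compare(from_x, from_y))  # concurrent writes
```

When two versions are ordered, the older one can be discarded safely; when they conflict, both must be kept until the application or a resolution rule picks a winner or merges them.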
For many use cases today, high availability and fault tolerance are critical to the user experience and the company’s revenue. Unavailability has a negative impact on your revenue, damages user trust and leads to a poor user experience. For use cases such as online retail, shopping carts, advertising, social and mobile platforms or anything with critical data needs, high availability is key and Riak may be the right choice.
January 9, 2013
Synacor’s TV Everywhere platform enables cable, satellite, consumer electronics and telco companies to stream content and programming to any device, anytime. TV Everywhere also provides innovative search, discovery and recommendation solutions combined with deep social media integration.
Synacor TV Everywhere uses Riak as object storage for video clips, news stories and other content. Synacor originally used a relational solution as its primary datastore, but API response times had started to slow as more assets were added. After evaluating several possible solutions, the team chose to move to Riak because of its low latency and Synacor’s high availability requirements.
Riak Enterprise has been deployed in multiple Synacor datacenters and has improved the API response time significantly since its integration. Synacor now stores over 5 million assets with thousands being added daily. According to Michael Collins, Synacor’s Senior Director of Engineering, “Riak has never been the source of a bottleneck for us. It’s been great.”
For more details, check out the complete case study, “TV Everywhere with Synacor and Riak.”
January 9, 2013
Today, Microsoft Open Technologies, Inc. announced the public preview of VM Depot. Basho is pleased to participate in this launch. Starting today, you can quickly deploy a virtual machine image configured with an OSS Riak implementation from VM Depot.
Ease of deployment is a common theme we hear from the community. Ensuring Riak is available on your platform of choice is part of our purpose in supporting your deployment needs. Whether it’s quickly prototyping an internal application in the enterprise, deploying a hybrid cloud solution, or leveraging solely public cloud services, Riak is an excellent choice for solving your data-storage needs at scale.
Given that this is a public preview, installation documentation is forthcoming. When it is ready, and that will be soon, you can find it on our documentation portal. In the meantime, feel free to ask questions or provide feedback on the mailing list.
January 7, 2013
Riak Cloud Storage is simple, available cloud storage software built on top of Riak. It offers an S3 API, multi-tenancy and large object support for enterprises building public or private clouds. We want to make it easier to get started with Riak CS, so we’re now offering a self-service test harness. Visit riakcs.net to sign up – you can explore the functionality, test API operations, and experiment with clients and development apps. With the self-service feature, you can start playing right away.
Note that the test harness is primarily for exploring Riak CS features – if you want to do load testing and performance benchmarking, you should sign up for a developer trial that will give you access to Riak CS packages you can install and test on your own hardware.
Interested in learning more about Riak CS? All of the docs are available online.
January 3, 2013
Most teams considering Riak come from a relational database background. The slide deck below, from our webcast on moving from relational to Riak, gives an overview of Riak and covers how its architecture differs from a relational approach, the advantages for scaling and development, and what’s different about building applications and operating databases in a non-relational world. We also include a few stories of Riak users who replaced MySQL or added Riak to the mix.
Interested in learning more? Check out our overview, From Relational to Riak.
December 31, 2012
Happy Holidays from all of us here at Basho. We’ve got some new code to help you ring in the new year. Ryan Zezeski and others have been hard at work on Yokozuna, the next generation of Riak Search that marries Riak with Apache Solr.
The latest pre-release, 0.2.0, was just tagged, and there’s plenty to be excited about for those of you who are interested in test-driving the code. In addition to various bug fixes, some of the new features include:
- Active Anti-Entropy Support – Automatic background processing that seeks out and rectifies divergences between data stored in Riak and indexes stored in Yokozuna.
- Benchmark Scripts – A pre-built collection of benchmarking scripts for automating performance testing.
- Sibling Support – When enabled, Yokozuna will now index all object versions. It will also handle index cleanup upon sibling resolution.
The full release notes are up on the GitHub repo.
Commits in this release came from Ryan Zezeski, Eric Redmond, and Dan Reverri. Mark Steele also reported a few issues that were fixed in this release.
Remember that this is alpha software, and won’t be officially supported by Basho until a future release. That said, Ryan and the team are actively looking for beta testers with use cases that might be appropriate for Yokozuna. If you’re in the market for scalable, distributed full-text search, join the Riak Mailing List and start asking questions.
There’s a pre-built Yokozuna AWS AMI (ami-8b8d03e2) with the latest changes that’ll make it easy to take Yokozuna for a test drive.
December 19, 2012
We work with many people building platforms and applications on Riak that have traditionally lived on relational systems. The switch to Riak can be driven by the needs of greenfield applications and growing data volumes, business requirements around scale and availability, or the desire for operational ease and multi-site replication.
Some of the most common questions we get are from teams with a primary background in MySQL, Oracle Database or other relational systems wondering about the advantages and tradeoffs of moving to Riak. To address these common questions, we’ve written up an introductory guide on moving from relational to Riak. In it, we cover:
- Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
- A look at the operational aspects of Riak and where they differ from relational approaches
- Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
- Migration considerations, including where to start when migrating existing applications to Riak
- Riak’s eventually consistent design, how it differs from a strongly consistent design, and things you need to know about handling data conflicts in Riak
- Multi-site replication options in Riak
Running Riak on AWS just got easier. Announcing Riak AMI, a ready-built virtual machine and configuration of Riak for Amazon EC2.
CAMBRIDGE, MA – December 14, 2012 – A number of our community members and customers already use Riak on AWS, and with the Riak AMI, getting up and running should be much easier. The Riak AMI helps support a growing number of hybrid implementations where businesses use both private infrastructure and public cloud services. This hybrid model can be leveraged to address burst capacity issues, tenancy/locality concerns, and simple proof-of-concept deployments, in addition to a myriad of other business challenges.
For more information, read our post here: http://basho.com/blog/technical/2012/12/14/Riak-on-Amazon-Marketplace-AMI/.