Author Archives: Alex Gutow

Developer Resources Now on Pinterest

March 7, 2013

We are excited to announce that we have a new board on our Pinterest all about getting started with Riak! We have included a variety of tools to help you start running Riak, learn about Riak’s design, and find Riak events near you. This board is also a great way to find key videos, blog posts, academic papers, and other content that can be useful when learning about Riak and distributed systems.

We will be constantly adding new material, so follow us to stay up-to-date on everything Riak. Check it out and let us know if there’s anything you’d like to see added!

Basho

Basho at the O’Reilly Strata Conference

March 6, 2013

Last week, the Basho team attended the O’Reilly Strata Conference. If you were there, we hope you stopped by our booth to learn more about the 1.3 release and get one of our brand new t-shirts!

We had a great time and met a lot of interesting people. During the conference, our own Tyler Hannan, Director of Technical Marketing, was interviewed about Riak and the conference. Check out his interview below:

Missed us at Strata? Check out our Events Page to see where Basho will be next. We’d love to chat!

Basho

Popular Use Cases for Mobile Platforms on Riak

March 5, 2013

Mobile platforms need to provide always available, low-latency experiences that can scale to millions of users and support highly concurrent access. Riak’s redundant and fault-tolerant design ensures mobile data can be served quickly and reliably, and Riak is run in production by many popular mobile applications. For a full overview, check out the whitepaper “Mobile on Riak: A Technical Introduction.” Below are a few key mobile use cases and basic approaches to modeling them in Riak:

User Data: Storing user accounts, profile information, and events is a common use case for Riak. Mobile apps often store this data in JSON documents, using a UUID or other identifier as the key. Data can be queried through Riak features such as secondary indexes, MapReduce, and full-text search.

Session Data: Since session IDs are commonly stored in cookies, or otherwise known at lookup time, they are a natural fit for Riak’s key/value model and Riak can serve these requests at predictably low-latency. Session data can also be encoded in many different ways and evolve without any administrative changes to schema.

Text & Multimedia Storage: Since Riak is content agnostic, mobile platforms can easily store a variety of different types of data, including audio, text, photos, video, etc. to power mobile experiences.

Social Authentication: Many mobile applications have users sign in via their Facebook or Twitter accounts. Riak’s key/value scheme makes it easy to store both registered accounts and the tokens that make it possible for users to authenticate with their social accounts.

Global Data Locality: Riak Enterprise’s multi-datacenter capabilities mean mobile data can be stored in physical proximity to users and served at low-latency no matter where they happen to be.

Here is a chart with possible ways these applications and services can be modeled using Riak’s key/value design. Of course, your application should be structured in a way appropriate to its access and query patterns, among other factors – this is just to get you started. For more information on designing applications with Riak, check out our documentation.

To learn more about how mobile platforms can use Riak for their data needs, check out the complete overview, “Mobile on Riak: A Technical Introduction.” For more details about Riak and the latest 1.3 release, sign up for our webcast on March 7th.

Basho

Rob Zuber on Copious and Riak

Copious is a social marketplace that, behind the scenes, blends data models at the storage level to fit specific use cases and features. Each of these comes with its own constraints and operational challenges. Activity streams, social context, product search, and purchase processing span the spectrum from absolute consistency requirements to “any response is better than no response.”

In this talk, Rob Zuber, Co-Founder at Copious, talks about why they put Riak into production and where it fits in their total data store strategy. Rob has a very healthy, humorous, and deep understanding of data stores and systems environments and it all shows in this presentation. This talk is well worth watching for anyone interested in how Riak fits social use cases and how to approach choosing the right tool for the job. He also has some one-of-a-kind insight into starting companies and building products with a small team that is not to be overlooked.

Basho

Multi-Datacenter Replication: Availability Zones and Public Cloud

February 28, 2013

In the last post, we looked at how Riak Enterprise’s multi-datacenter replication can be configured for backups and data locality. In this post, we examine two other common implementations: availability zones and public cloud use cases. For more information on Riak Enterprise architecture and configuration, download the complete whitepaper.

Availability Zones

Availability zones provide efficient multi-datacenter replication and data redundancy within a geographic region (such as a coast or a country). In this configuration, data is replicated within an availability zone’s series of datacenters. In the event that one of datacenters experiences an outage or serious failure, data can still be served from other datacenters within the same region.

One approach to setting this up is to have a “primary” site in a region where all reads and writes for specific users, applications, or data sets are directed. This primary cluster can then be replicated to one or more proximal secondary clusters. In other approaches, data can be replicated in real-time from one cluster to both another datacenter and other cold backups maintained for emergency conditions. The right approach is highly dependent on the requirements of users, availability, expense of bandwidth, and other constraints.

Public Cloud Use Cases

Riak is designed to be easy to use and operate on public clouds, and is partnered with many of the leading cloud providers, including Amazon Web Services, Microsoft Azure, and Joyent. Hosted Riak is also available from Engine Yard and Riak packages can always be manually installed on any physical or virtual provider, even if a machine image isn’t explicitly supported.

There are several use cases for Riak Enterprise’s multi-datacenter replication in the public cloud. Many enterprises want to maintain a cold or hot backup of their cluster in a public cloud for business continuity in the event of a datacenter outage in their private infrastructure. For other customers, the public cloud can provide a more cost-effective way of meeting peak loads, rather than building out private infrastructure to accommodate them year-round. For example, many retailers and media providers need to offer increased capacity over the holiday season. Riak Enterprise is used to scale out capacity on public clouds over these periods, either with full-sync or real-time sync depending on the business needs.

Finally, some enterprises run certain applications or services entirely on public clouds. Riak Enterprise allows for redundancy and data locality across public cloud availability zones for this use case, ensuring optimal performance and resiliency.

For a more in-depth look at common architectures and use cases for Riak Enterprise, download our technical overview. You can also sign up for our webcast on Thursday, March 7th.

Basho

Multi-Datacenter Replication: Backups and Data Locality

February 27, 2013

Multi-datacenter replication is a critical part of modern infrastructure, providing essential business benefits for enterprise applications, platforms and services. Riak Enterprise offers multi-datacenter replication so that data stored in Riak can be replicated to multiple sites. Over the next two posts, we will look at some common implementations, starting with configurations for backups and data locality. For more information on Riak Enterprise architecture and configuration, download the complete whitepaper.

Primary Cluster with Failover

One of the most common architectural patterns in multi-datacenter replication is maintaining a primary cluster that serves traffic and a backup cluster for emergency failover. This configuration can be an important component of compliance with regulatory requirements, ensuring business continuity and access to data even in serious failure modes.

In this configuration, a primary cluster serves as the production cluster from which all read and write operations are served. The backup cluster(s) is maintained in another datacenter. In the event of a datacenter outage or critical failure at the primary site, requests can be directed to the backup cluster either by changing DNS configuration or rules for routing via a load balancer.

For this use case, we recommend that your failover strategy be tested periodically so any potential issues can be resolved in advance of a crisis. It’s also beneficial to have your failover strategy fully defined upfront – know the conditions in which a failover mode will be invoked, decide how traffic will be directed to the backup, and document and test the failover strategy to ensure success.

Active-Active Cluster Configuration

To achieve data locality, when clients are served at low-latency by whatever datacenter is nearest to them, you can maintain two (or more) active, synced clusters that are both responsible for serving data to clients. An added benefit of this approach is that in the event of a datacenter failure where one of the clusters is hosted, all traffic can be served from the other, still-functional site for business continuity.

For data locality, requests can be load balanced across geographies, with geo-based client requests directed to the appropriate datacenter. For example, US-based requests can be served out of a US-based datacenter while EU-based requests can be served out of a regional site. For situations where not all data needs to be shared across all datacenters (or if certain data, such as user data, must only be stored in a specific geographic region to meet privacy regulations), Riak Enterprise’s multi-datacenter replication can be configured on a per-bucket basis so only shared assets, popular assets, etc. are replicated.

For a more in-depth look at common architectures and use cases for Riak Enterprise, download our technical overview. You can also sign up for our webcast on Thursday, March 7th.

Basho

Mobile on Riak: Overview and User Stories

February 26, 2013

Mobile platforms and applications pose unique infrastructure challenges for today’s companies. These applications require low-latency, always-available small object storage that can scale to millions or more users, and support highly concurrent access and traffic spikes.

Riak provides a number of benefits for these platforms, including:

  • Low-Latency Data Storage: Riak is designed to serve predictable, low-latency requests to provide a fast, available experience to all users.
  • Straightforward Data Model: Riak uses a simple key-value data model, which is ideal for storing and serving mobile content, user information, events, and session data. Riak is content agnostic, so there are no restrictions on content type.
  • Accommodates Peak Loads Gracefully: To handle increasing user data and accommodate peak loads during events, Riak makes it easy to add additional capacity and scale out quickly. Riak automatically rebalances data when new nodes are added, while its consistent hashing methodology prevents hot spots in the database.
  • Multi-Datacenter Replication: Riak Enterprise’s multi-datacenter replication allows mobile platforms to serve low-latency content to users all over world by maintaining a global data footprint.
  • For a full overview, download our new whitepaper on building mobile services with Riak

User Stories

Bump is a popular mobile app that makes it easy for users to share their contact information, photos, and other objects by simply “bumping” their smartphones. They use Riak to store user data and currently run 25 nodes of Riak storing about 3TB of data.

For more details about how Bump uses Riak and how they designed their application, check out Bump’s presentation at RICON2012, Basho’s 2012 developer conference. You can also read the complete case study for more information about why Bump chose Riak.

Voxer is a popular Walkie Talkie application for smartphones that allows users to send instant voice messages to one or more friends. They switched to Riak due to its fault-tolerance and ability to scale quickly and easily. They currently run more than 50 machines on Riak to support their huge growth and user base. For more details about how Voxer uses Riak, check out the complete case study and watch Matt Ranney’s talk at a Riak Meetup in San Francisco.

To learn more about how mobile platforms can use Riak for their data needs, check out the complete overview, “Mobile on Riak: A Technical Introduction.”

Basho

Advanced Mode Now Available for Riak Enterprise’s Multi-Datacenter Replication

February 25, 2013

This post takes an in-depth look at Riak Enterprise’s new multi-datacenter replication capabilities, available in the recent 1.3 release. Riak Enterprise’s multi-datacenter replication now ships with “advanced mode,” which features some performance and resiliency improvements that we’ve developed by working with production customers:

  • Instead of having only one TCP connection over which data is streamed from one cluster to another, this new version features multiple concurrent TCP connections (approximately one per physical node) and processes are used between sites. This prevents against possible performance bottlenecks, which can be especially common when run on nodes constrained by per-instance bandwidth limits (such as in a cloud environment).
  • Easier configuration of multi-datacenter replication. Simply use a shell command to name your clusters, then connect both clusters using an ip:port combination.
  • Better per-connection statistics for both full-sync and real-time modes.
  • New ability to tweak full-sync workers per node and per cluster, which allows customers to dial-in performance.

Details of the advanced mode capabilities are below. For more about the multi-datacenter replication upgrades and our 1.3 release, check out this recent article from GigaOM. For full technical details, check out our documentation on multi-datacenter replication. For an examination of common architectures and use cases for Riak Enterprise, including datacenter failover, active-active cluster configurations, availability zones, and cloud bursting, download our technical overview.

The new advanced mode of Riak Enterprise’s multi-datacenter replication takes the best features from the past single channel communications model and scales it up to one-to-one connections between peer nodes of clusters. With concurrent channels and the ability to constrain the maximum connections per node and per cluster, the new multi-datacenter replication allows the full capacity of the network and cluster size to scale the performance to available resources.

Simple Configuration
It begins with a much easier configuration command language and environment, with natural objects as sources, sinks, and cluster names. For example, real-time and full-sync enable/disable, start/stop, and status all use these human friendly symbols. All of the connections go through a single port, reducing network administration to a single firewall port forwarding. Riak then manages the different protocols on this port. Connections are persistent, resilient to outages, and adapt to changes in cluster names and IP addresses automatically.

Two Sync Modes
Real-time synchronization between clusters uses a new queueing mechanism that ensures maximum performance and graceful shutdown of nodes. This guarantees that there is no loss of replication data during upgrades or scheduled maintenance. It also automatically balances the load across all nodes of both the source and sink clusters and requires no configuration.

Full-synchronization between clusters uses a scheduling algorithm to maximize the transfer rate of data between peer nodes of the two clusters. Partitions are synchronized in parallel so that the maximum number of keys can be updated concurrently with the minimum overlap, which minimizes load and contention on both the source and sink clusters. Full-sync supports concurrent syncs between multiple clusters and optimizes the load dynamically, ensuring nodes never exceed their configured connectivity. This allows clusters to synchronize at maximum efficiency, without impacting their runtime performance for connected clients as they make PUT and GET requests.

We are also planning to include Secure Sockets Layer and Network Address Translation support in this advanced mode of multi-datacenter replication – it is currently only available in default mode. Additionally, future improvements will take advantage of the Active Anti-Entropy (that was introduced in Riak 1.3) to make full-sync differencing even faster. Stay tuned for more updates!

To learn more about Riak 1.3 and the new advanced mode for multi-datacenter replication, sign up for our webcast on Thursday, March 7th.

Basho

Introducing Riak 1.3: Active Anti-Entropy, a New Look for Riak Control, and Faster Multi-Datacenter Replication

February 21, 2013

Today we are excited to announce the latest version of Riak. Here is a summary of the major enhancements delivered in Riak 1.3:

  • Introduced Active Anti-Entropy. Riak now has active anti-entropy. In distributed systems, inconsistencies can arise between replicas due to failure modes, concurrent updates, and physical data loss or corruption. Pre-1.3 Riak already had several features for repairing this “entropy”, but they all required some form of user intervention. Riak 1.3 introduces automatic, self-healing properties that repair entropy on an ongoing basis.
  • Improved Riak Enterprise’s multi-datacenter replication performance. New advanced mode for multi-datacenter replication capabilities, with better performance, more TCP connections and easier configuration. Read more in this write up from GigaOM.
  • Improved graphical user experience. Riak Control, the user interface for managing and monitoring Riak, has a brand new look.
  • Expanded IPv6 support. IPv6 support in Riak now is supported by all interfaces.
  • Improved MapReduce. Riak MapReduce has improved back-pressure to reduce the risk of overwhelming endpoint processes during large tasks.
  • Simplified log management. Riak can now optionally send log messages to syslog.

Ready to get started or upgrade? Download the new release here, check out the official release notes, or read on for more details. Documentation for all products and releases is available on the documentation site. For an introduction to Riak and what’s new in Riak 1.3, sign up for our webcast on Thursday, March 7.

More on What’s in Riak 1.3

Active Anti-Entropy
A key feature of Riak is its ability to regenerate lost or corrupted data from replicated data stored on other nodes. Prior to this release, Riak provided two methods to repair data:

  • Read Repair: Riak compares the replies from all replicas during a read request, repairing any replica that is divergent or missing data. (K/V data only)
  • Repair Command via Riak Console: Introduced in Riak 1.2, the repair command enables users to trigger a repair of a specific partition. The partition is rebuilt based on a subset of data stored on adjacent nodes in the Riak ring. All data is rebuilt, not just missing or divergent data. (K/V and Search data)

Riak 1.3 introduces active anti-entropy, a continuous background process that compares and repairs any divergent, missing, or corrupted replicas (K/V data only). Unlike read repair, which is only triggered when data is read, the active anti-entropy system ensures the integrity of all data stored in Riak. This is particularly useful in clusters containing “cold data”: data that may not be read for long periods of time, potentially years. Furthermore, unlike the repair command, active anti-entropy is an automatic process, requiring no user intervention and is enabled by default in Riak 1.3.

Riak’s active anti-entropy feature is based on hash tree exchange, which enables differences between replicas to be determined with minimal exchange of information. Specifically, the amount of information exchanged in the process is proportional to the differences between two replicas, not the amount of data that they contain. Approximately the same amount of information is exchanged when there are 10 differing keys out of 1 million keys as when there are 10 differing keys out of 10 billion keys. This enables Riak to provide continuous data protection regardless of cluster size.

Additionally, Riak uses persistent, on-disk hash trees rather than purely in-memory trees, a key difference from similar implementations in other products. This allows Riak to maintain anti-entropy information for billions of keys with minimal additional memory usage, as well as allows Riak nodes to be restarted without losing any anti-entropy information. Furthermore, Riak maintains the hash trees in real time, updating the tree as new write requests come in. This reduces the time it takes Riak to detect and repair missing/divergent replicas. For added protection, Riak periodically (default: once a week) clears and regenerates all hash trees from the on-disk K/V data. This enables Riak to detect silent data corruption to the on-disk data arising from bad disks, faulty hardware components, etc.

New Look for Riak Control
Riak Control is a UI for managing and monitoring your Riak cluster. Riak Control lets you start and re-start Riak nodes, view a “health check” for your cluster, see all nodes and their current status, and have visibility into their partitions and services. Riak Control now has a brand new look and feel. Check out the Riak Control Github page to get up and running.

Expanded IPv6 Support
While Riak’s HTTP interface has always supported IPv6, not all of its interfaces have been as current. In Riak 1.3, the protocol buffers interfaces can now listen on IPv6 or IPv4 addresses. Riak handoff (which is responsible for data transfer when nodes are added or removed, and for handing off update responsibilities when nodes fail) also supports IPv6. It should also be noted that community member Tom Lanyon started the work on this feature. Thanks, Tom!

Improved Backpressure in Riak MapReduce
Riak has Javascript and Erlang MapReduce for performing aggregation and analytics tasks. Backpressure is an important aspect of the MapReduce system, keeping processes from being overwhelmed or memory consumption getting out of control. In Riak 1.3, tunable backpressure is extended to the MapReduce sink to prevent these types of problems at endpoint processes.

Riak Enterprise: Advanced Multi-Datacenter Replication Capabilities
With hundreds of companies using Riak Enterprise, a commercial extension of Riak, we’ve been lucky to work with many teams pushing the limits of multi-datacenter replication performance and resiliency. We’ve learned a lot and are excited to announce these capabilities are now available in advanced mode.

  • Previously, multi-datacenter replication had one TCP connection over which data was streamed from one cluster to another. This could create a performance bottleneck, especially when run on nodes constrained by per-instance bandwidth limits, such as in a cloud environment. In the new version of multi-datacenter replication, multiple concurrent TCP connections (approximately one per physical node) and processes are used between sites.
  • Configuration of multi-datacenter replication is easier. Use a shell command to name your clusters, then connect both clusters using a simple ip:port combination.
  • Better per-connection statistics for both full-sync and real-time modes.
  • New ability to tweak full-sync workers per node and per cluster, allowing customers to dial-in performance.

The new replication improvements are already used in production by customers and yielding significant performance improvements. For now, the new replication technology is available in advanced mode: it’s optional to turn on. It currently doesn’t have all of the features of the default mode – including SSL, NAT support and full-sync scheduling. Both default and advanced modes are available in the 1.3 release and function independently. In the future, “advanced mode” will become the default.

For more details about multi-datacenter replication, download our whitepaper, “Multi-Datacenter Replication: A Technical Overview.”

Basho

Hibernum Selects Riak for User Data Storage

February 19, 2013

Hibernum is a creator and developer of unique gaming experiences that combine the latest in social gaming, top quality visuals and animations, and cutting edge design. They use Riak to store user game information for one of their most popular social games.

Hibernum Homepage

Currently, Hibernum’s Riak installation serves thousands of requests per second to more than a million monthly active users. User data is stored in Riak as JSON objects, and Hibernum uses Riak’s HTTP interface, a perfect fit for their Node.js-based application server. As the game grows in popularity, millions of new entries are generated and stored in Riak, as well as any updates or modifications that may occur during gameplay. Mario Lefebvre, IT Specialist at Hibernum, has said that Riak is “managing this load like a charm and is a stable and rock solid solution.”

Originally, Hibernum was using a relational database, however, they found the manual sharding required to scale was operationally intensive and inefficient. They needed something that could better handle their significant growth and started looking for a cost-efficient solution that could support the large amount of requests, as well as a solution that allowed for easy scalability. After testing multiple solutions, Riak was chosen for its high availability, ability to scale to peak loads, and predictable operational cost.

To learn more about how Hibernum uses Riak, check out the complete case study.

Basho