Tag Archives: Riak Enterprise

Top Five Questions About Riak CS

May 1, 2013

This post looks at five commonly asked questions about Riak CS – simple, available, open source storage built on top of Riak. For more information, please review our full documentation, or sign up for an intro to Riak CS webcast on Friday, May 10.

What is the relationship between Riak and Riak CS?

Riak CS is built on top of Riak, exposing higher-level storage functions including large object support, an S3-compatible API, multi-tenancy, and per-user storage and access statistics. Riak itself provides the replication, availability, fault-tolerance, and underlying storage functions for the Riak CS implementation. Riak and Riak CS should both be installed on every node in your cluster. While Riak and Riak CS could be run on separate virtual or physical nodes, running them on the same machine minimizes intra-cluster bandwidth usage and is the recommended approach. As with Riak, we advise a minimum 5-node cluster.

When an object is uploaded to Riak CS, it is broken up into smaller chunks, which are then streamed, stored, and replicated in the underlying cluster. A manifest is maintained for each object; it records which blocks make up the object and is used to retrieve all of them and present them to the client on read. In addition to running Riak and Riak CS on each node, Stanchion, a request serializer, must be installed on at least one node in the cluster. This ensures that global entities, such as users and buckets, are unique in the system.
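As a rough illustration of the chunk-and-manifest idea, here is a minimal Python sketch. It is not Riak CS's implementation: the in-memory `store` dictionary, the tiny block size, and the key naming scheme are all assumptions made up for the example.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def store_object(store, bucket, key, data, block_size=BLOCK_SIZE):
    """Split data into blocks, store each block, and record them in a manifest."""
    block_keys = []
    for seq, offset in enumerate(range(0, len(data), block_size)):
        block_key = f"{bucket}/{key}/block-{seq}"  # hypothetical naming scheme
        store[block_key] = data[offset:offset + block_size]
        block_keys.append(block_key)
    manifest = {"bucket": bucket, "key": key, "size": len(data), "blocks": block_keys}
    store[f"{bucket}/{key}/manifest"] = manifest
    return manifest


def read_object(store, bucket, key):
    """Look up the manifest, then fetch and concatenate the blocks in order."""
    manifest = store[f"{bucket}/{key}/manifest"]
    return b"".join(store[block_key] for block_key in manifest["blocks"])


store = {}  # stands in for the underlying Riak cluster
manifest = store_object(store, "photos", "cat.jpg", b"0123456789")
restored = read_object(store, "photos", "cat.jpg")
```

The reader only ever needs the manifest key; the block list inside it is what allows the object to be reassembled on read.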

What use cases does Riak CS support that Riak doesn’t?

Riak CS has several features that are not provided in the standalone Riak database. One of the most obvious differences is in the size of objects supported. Riak CS exposes large object support, and includes multi-part upload so you can upload objects as a series of parts. This allows single objects in the terabyte range to be uploaded to the system. In Riak, the data model is simply key/value; in Riak CS, the key/value model provides the underlying structure for higher-level storage semantics – users, buckets and objects. The Riak CS interface is an S3-compatible HTTP API, allowing you to use existing S3 libraries and tools. In contrast, Riak exposes an HTTP and protobufs API and offers many language-specific clients. Unlike Riak, Riak CS is multi-tenant, with the concept of “users” and per-user reporting on storage and access. This makes it a fit both for private cloud scenarios with multiple internal users and as a foundation for a public cloud storage offering.

How do multi-tenancy, authentication and reporting work?

Riak CS exposes an interface for user creation, disablement and credential management. Riak CS can be set so that only administrators can create new users. Administrators also have special privileges including being able to retrieve a list of all users in the system and query the user account information of any user. Once issued credentials, users are able to authenticate, create buckets, upload and download files, retrieve account information, obtain new credentials, or disable their account through the API. Riak CS supports the standard S3 authentication scheme, with support for header and query string authorization.
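For readers unfamiliar with header authorization, the sketch below shows how an `Authorization` header is built, assuming that "the standard S3 authentication scheme" refers to AWS Signature Version 2 (HMAC-SHA1 over a canonical string). The credentials and resource path are placeholders, and `x-amz-*` headers are left out to keep the example short.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key, method, resource, date, content_md5="", content_type=""):
    """Return the base64 HMAC-SHA1 signature over the canonical string to sign."""
    # Canonical string: verb, Content-MD5, Content-Type, Date, then the resource.
    # Canonicalized x-amz-* headers would precede the resource; none are used here.
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, secret_key, method, resource, date):
    """Build the value of the Authorization header for a simple request."""
    return f"AWS {access_key}:{sign_v2(secret_key, method, resource, date)}"


# Placeholder credentials and request details, for illustration only.
header = authorization_header(
    "EXAMPLEKEY", "example-secret", "GET", "/my_bucket/my_object",
    "Tue, 27 Mar 2007 19:36:42 +0000",
)
```

In practice an existing S3 library performs this step for you; the sketch only shows what travels in the header.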

Riak CS exposes storage, usage and network statistics that support use cases like accounting, subscription, billing or multi-group utilization for public or private clouds. Riak CS will report information on how much storage a user is consuming and the network operations related to access. This data is exposed via an HTTP interface and can be queried on the default timespan “now” or as a range from start time through end time. Access statistics are reported as bytes in and bytes out for both object and bucket operations. Reporting of this information can be scheduled for a set interval or manually triggered.
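To make the time-range reporting concrete, here is a small sketch of how a billing job might total bytes in and bytes out for one user over a start/end window. The record layout (`time`, `bytes_in`, `bytes_out`) is a hypothetical shape chosen for the example, not the format Riak CS actually returns.

```python
from datetime import datetime


def summarize_access(records, start, end):
    """Total bytes in and out for records whose time falls in [start, end)."""
    totals = {"bytes_in": 0, "bytes_out": 0}
    for record in records:
        if start <= record["time"] < end:
            totals["bytes_in"] += record["bytes_in"]
            totals["bytes_out"] += record["bytes_out"]
    return totals


# Hypothetical access records for a single user.
records = [
    {"time": datetime(2013, 4, 30, 23, 0), "bytes_in": 100, "bytes_out": 10},
    {"time": datetime(2013, 5, 1, 9, 0), "bytes_in": 2048, "bytes_out": 512},
    {"time": datetime(2013, 5, 1, 17, 30), "bytes_in": 1024, "bytes_out": 4096},
]
may_first = summarize_access(records, datetime(2013, 5, 1), datetime(2013, 5, 2))
```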

What’s the difference between Riak CS and Riak CS Enterprise?

Riak CS Enterprise provides multi-datacenter replication on top of Riak CS. For multi-datacenter replication in Riak CS, global information for users, bucket information and manifests are streamed in real-time from a primary implementation to a secondary site so global state is maintained across locations. Objects can then be replicated in either full sync or real-time sync mode. The secondary site will replicate the object as in normal operations. Additional datacenters can be added in order to create availability zones or provide additional data redundancy and locality. Riak CS Enterprise can also be configured for bi-directional replication. Riak CS Enterprise also comes with 24/7, enterprise-level support. More information and pricing can be found here, and full technical information is available on our docs portal. Ready to get started? Sign up for a developer trial of Riak CS Enterprise.

What are your plans for integration of Riak CS with open source compute solutions?

Riak CS provides highly available, distributed storage, making it a natural fit for usage alongside compute solutions. We have partnered with Citrix to collaborate on the integration of Apache CloudStack and Riak CS to create a complete cloud software offering that combines compute and storage in an integrated platform. For more information on our partnership with CloudStack, check out this blog post with the latest update. API and authentication support for OpenStack is also in progress.

Ready to get started? You can download Riak CS here, and check out the Riak CS Fast Track for a hands-on getting started guide.

Riak Available On SoftLayer Platform

April 30, 2013

Today we are pleased to announce the availability of Riak and Riak Enterprise on SoftLayer’s global cloud platform. Users can now easily configure and deploy Riak environments on the SoftLayer platform with a flexible, pay-as-you-go service model. The solution makes it easy for organizations to quickly deploy scalable production-grade Riak systems on-demand. The partnership accelerates the speed of developing and launching applications with Riak, provides ease of operations with scale, and enables global multi-datacenter replication.

Features of the joint offering include:

  • Web-based tool to configure and deploy Riak on demand and at the click of a button
  • Pay-as-you-go model providing the flexibility of monthly or annual billing and no long-term contracts
  • Rapid deployment on dedicated, bare-metal servers for optimum performance

With the Riak Enterprise offering on SoftLayer, users can replicate data stored in Riak across SoftLayer’s global infrastructure. This multi-datacenter replication capability provides data locality, disaster recovery, global load balancing, and active backups. SoftLayer’s global private network allows for high-speed, secure replication between clusters.

The integrated solution provides the availability, fault tolerance, operational simplicity, and scalability of Riak combined with the flexibility, performance, and agility of SoftLayer’s on-demand infrastructure.

Bump is one of the most popular mobile apps on the market today, and is already using Riak on the SoftLayer platform. “Operational ease is key to our business success,” says Mark Smith, Operations Lead at Bump. “The combination of SoftLayer, who we already trust with our business and data, and Basho, who makes the database that we trust at scale, saves us time and effort and allows us to focus on our business, not our data infrastructure.”

For more information on how Bump uses Riak, check out the case study. For more information on Riak Enterprise, visit the product page or documentation.

SoftLayer is also sponsoring the RICON East 2013 after party. On night one of the conference, we’re renting out Hudson Terrace for a one-of-a-kind party. SoftLayer and Basho are furnishing drinks, food, and entertainment. All RICON attendees are automatically registered for the party…but, as of today, the party is open to anyone who wants to register.

Multi-Datacenter Replication: Availability Zones and Public Cloud

February 28, 2013

In the last post, we looked at how Riak Enterprise’s multi-datacenter replication can be configured for backups and data locality. In this post, we examine two other common implementations: availability zones and public cloud use cases. For more information on Riak Enterprise architecture and configuration, download the complete whitepaper.

Availability Zones

Availability zones provide efficient multi-datacenter replication and data redundancy within a geographic region (such as a coast or a country). In this configuration, data is replicated within an availability zone’s series of datacenters. In the event that one of the datacenters experiences an outage or serious failure, data can still be served from other datacenters within the same region.

One approach to setting this up is to have a “primary” site in a region where all reads and writes for specific users, applications, or data sets are directed. This primary cluster can then be replicated to one or more proximal secondary clusters. In other approaches, data can be replicated in real-time from one cluster to both another datacenter and other cold backups maintained for emergency conditions. The right approach is highly dependent on the requirements of users, availability, expense of bandwidth, and other constraints.

Public Cloud Use Cases

Riak is designed to be easy to use and operate on public clouds, and Basho partners with many of the leading cloud providers, including Amazon Web Services, Microsoft Azure, and Joyent. Hosted Riak is also available from Engine Yard, and Riak packages can always be manually installed on any physical or virtual provider, even if a machine image isn’t explicitly supported.

There are several use cases for Riak Enterprise’s multi-datacenter replication in the public cloud. Many enterprises want to maintain a cold or hot backup of their cluster in a public cloud for business continuity in the event of a datacenter outage in their private infrastructure. For other customers, the public cloud can provide a more cost-effective way of meeting peak loads, rather than building out private infrastructure to accommodate them year-round. For example, many retailers and media providers need to offer increased capacity over the holiday season. Riak Enterprise is used to scale out capacity on public clouds over these periods, either with full-sync or real-time sync depending on the business needs.

Finally, some enterprises run certain applications or services entirely on public clouds. Riak Enterprise allows for redundancy and data locality across public cloud availability zones for this use case, ensuring optimal performance and resiliency.

For a more in-depth look at common architectures and use cases for Riak Enterprise, download our technical overview. You can also sign up for our webcast on Thursday, March 7th.

Basho

Multi-Datacenter Replication: Backups and Data Locality

February 27, 2013

Multi-datacenter replication is a critical part of modern infrastructure, providing essential business benefits for enterprise applications, platforms and services. Riak Enterprise offers multi-datacenter replication so that data stored in Riak can be replicated to multiple sites. Over the next two posts, we will look at some common implementations, starting with configurations for backups and data locality. For more information on Riak Enterprise architecture and configuration, download the complete whitepaper.

Primary Cluster with Failover

One of the most common architectural patterns in multi-datacenter replication is maintaining a primary cluster that serves traffic and a backup cluster for emergency failover. This configuration can be an important component of compliance with regulatory requirements, ensuring business continuity and access to data even in serious failure modes.

In this configuration, a primary cluster serves as the production cluster from which all read and write operations are served. The backup cluster(s) is maintained in another datacenter. In the event of a datacenter outage or critical failure at the primary site, requests can be directed to the backup cluster either by changing DNS configuration or rules for routing via a load balancer.

For this use case, we recommend that your failover strategy be tested periodically so any potential issues can be resolved in advance of a crisis. It’s also beneficial to have your failover strategy fully defined upfront – know the conditions in which a failover mode will be invoked, decide how traffic will be directed to the backup, and document and test the failover strategy to ensure success.
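One way to define "the conditions in which a failover mode will be invoked" is a simple threshold on consecutive failed health checks. The sketch below assumes that policy; the class name, hostnames, and threshold are invented for illustration, and the actual traffic switch (DNS or load balancer) is outside its scope.

```python
class FailoverPolicy:
    """Choose an endpoint based on consecutive failed health checks."""

    def __init__(self, primary, backup, max_failures=3):
        self.primary = primary
        self.backup = backup
        self.max_failures = max_failures
        self.failures = 0

    def record_check(self, healthy):
        """Reset the counter on a successful check, otherwise increment it."""
        self.failures = 0 if healthy else self.failures + 1

    def active_endpoint(self):
        """Return the backup once the failure threshold has been reached."""
        if self.failures >= self.max_failures:
            return self.backup
        return self.primary


policy = FailoverPolicy("primary.example.com", "backup.example.com", max_failures=3)
for healthy in (False, False):
    policy.record_check(healthy)
after_two = policy.active_endpoint()    # still the primary
policy.record_check(False)
after_three = policy.active_endpoint()  # threshold reached, so the backup
```

Note that this sketch fails back as soon as one check succeeds; many operators prefer failback to be a manual decision.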

Active-Active Cluster Configuration

To achieve data locality, where clients are served at low latency by whichever datacenter is nearest to them, you can maintain two (or more) active, synced clusters that are both responsible for serving data to clients. An added benefit of this approach is that if the datacenter hosting one of the clusters fails, all traffic can be served from the other, still-functional site for business continuity.

For data locality, requests can be load balanced across geographies, with geo-based client requests directed to the appropriate datacenter. For example, US-based requests can be served out of a US-based datacenter while EU-based requests can be served out of a regional site. For situations where not all data needs to be shared across all datacenters (or if certain data, such as user data, must only be stored in a specific geographic region to meet privacy regulations), Riak Enterprise’s multi-datacenter replication can be configured on a per-bucket basis so only shared assets, popular assets, etc. are replicated.
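The routing and per-bucket decisions described above can be pictured with a short sketch. The region codes, hostnames, and bucket names are placeholders; in a real deployment the routing would live in a geo-aware DNS service or load balancer, and the replication choice would be set as a bucket property.

```python
# Hypothetical region-to-datacenter mapping.
DATACENTERS = {
    "US": "us.riak.example.com",
    "EU": "eu.riak.example.com",
}

# Hypothetical set of buckets that are shared across all datacenters.
REPLICATED_BUCKETS = {"shared_assets", "popular_assets"}


def route(region, default="US"):
    """Send a client to its regional datacenter, or to the default one."""
    return DATACENTERS.get(region, DATACENTERS[default])


def should_replicate(bucket):
    """Only shared buckets leave their home datacenter."""
    return bucket in REPLICATED_BUCKETS


eu_endpoint = route("EU")
unknown_endpoint = route("APAC")  # no regional site, so the default is used
```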

For a more in-depth look at common architectures and use cases for Riak Enterprise, download our technical overview. You can also sign up for our webcast on Thursday, March 7th.

Basho

Advanced Mode Now Available for Riak Enterprise’s Multi-Datacenter Replication

February 25, 2013

This post takes an in-depth look at Riak Enterprise’s new multi-datacenter replication capabilities, available in the recent 1.3 release. Riak Enterprise’s multi-datacenter replication now ships with “advanced mode,” which features some performance and resiliency improvements that we’ve developed by working with production customers:

  • Instead of streaming data from one cluster to another over a single TCP connection, this new version uses multiple concurrent TCP connections (approximately one per physical node) and processes between sites. This protects against possible performance bottlenecks, which are especially common on nodes constrained by per-instance bandwidth limits (such as in a cloud environment).
  • Easier configuration of multi-datacenter replication. Simply use a shell command to name your clusters, then connect both clusters using an ip:port combination.
  • Better per-connection statistics for both full-sync and real-time modes.
  • New ability to tweak full-sync workers per node and per cluster, which allows customers to dial in performance.

Details of the advanced mode capabilities are below. For more about the multi-datacenter replication upgrades and our 1.3 release, check out this recent article from GigaOM. For full technical details, check out our documentation on multi-datacenter replication. For an examination of common architectures and use cases for Riak Enterprise, including datacenter failover, active-active cluster configurations, availability zones, and cloud bursting, download our technical overview.

The new advanced mode of Riak Enterprise’s multi-datacenter replication takes the best features of the earlier single-channel communications model and scales them up to one-to-one connections between peer nodes of clusters. With concurrent channels and the ability to cap the maximum connections per node and per cluster, the new multi-datacenter replication lets performance scale with the available network capacity and cluster size.

Simple Configuration
It begins with a much easier configuration command language and environment, with natural objects such as sources, sinks, and cluster names. For example, real-time and full-sync enable/disable, start/stop, and status all use these human-friendly symbols. All of the connections go through a single port, reducing network administration to a single firewall port-forwarding rule. Riak then manages the different protocols on this port. Connections are persistent, resilient to outages, and adapt to changes in cluster names and IP addresses automatically.

Two Sync Modes
Real-time synchronization between clusters uses a new queueing mechanism that ensures maximum performance and graceful shutdown of nodes. This guarantees that there is no loss of replication data during upgrades or scheduled maintenance. It also automatically balances the load across all nodes of both the source and sink clusters and requires no configuration.

Full-synchronization between clusters uses a scheduling algorithm to maximize the transfer rate of data between peer nodes of the two clusters. Partitions are synchronized in parallel so that the maximum number of keys can be updated concurrently with the minimum overlap, which minimizes load and contention on both the source and sink clusters. Full-sync supports concurrent syncs between multiple clusters and optimizes the load dynamically, ensuring nodes never exceed their configured connectivity. This allows clusters to synchronize at maximum efficiency, without impacting their runtime performance for connected clients as they make PUT and GET requests.
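To give a feel for scheduling under connection limits, here is a simplified greedy sketch. It is not Riak's scheduling algorithm: the function name, the job tuples, and the limits are assumptions, and it only shows how per-node and per-cluster caps constrain which partition syncs may start.

```python
from collections import Counter


def schedule_fullsync(pending, running, max_per_node, max_per_cluster):
    """Pick pending (partition, source_node, sink_node) jobs that fit the limits."""
    load = Counter()
    for _, source, sink in running:
        load[("source", source)] += 1
        load[("sink", sink)] += 1

    started = []
    active = len(running)
    for job in pending:
        _, source, sink = job
        if active >= max_per_cluster:
            break  # the cluster-wide cap has been reached
        source_busy = load[("source", source)] >= max_per_node
        sink_busy = load[("sink", sink)] >= max_per_node
        if source_busy or sink_busy:
            continue  # one of the two nodes is already at its cap
        started.append(job)
        load[("source", source)] += 1
        load[("sink", sink)] += 1
        active += 1
    return started


pending = [(0, "a1", "b1"), (1, "a1", "b2"), (2, "a2", "b1"), (3, "a2", "b2")]
started = schedule_fullsync(pending, running=[], max_per_node=1, max_per_cluster=2)
```

With one sync allowed per node, partitions 0 and 3 start together because they touch disjoint nodes, while partitions 1 and 2 wait for a later pass.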

We are also planning to include Secure Sockets Layer and Network Address Translation support in this advanced mode of multi-datacenter replication; both are currently only available in default mode. Additionally, future improvements will take advantage of Active Anti-Entropy (introduced in Riak 1.3) to make full-sync differencing even faster. Stay tuned for more updates!

To learn more about Riak 1.3 and the new advanced mode for multi-datacenter replication, sign up for our webcast on Thursday, March 7th.

Basho

Multi-Data Center Replication in Riak Enterprise 1.2

August 8, 2012

The Replication team @Basho has been hard at work implementing new features for Multi-Data Center (MDC) Replication. These new features are the direct result of customer feedback, and are included in the release of Riak Enterprise 1.2. Riak Enterprise documentation is also now publicly available for the first time.

What is MDC Replication?

Replication is a tool available in Riak Enterprise that allows data to be copied between Riak clusters. Data can be copied on initial connection to a remote cluster, in realtime as a bucket is updated, or as a periodic full-synchronization. Although replication is uni-directional, remote clusters can be set up to replicate data back to a primary cluster, thus synchronizing bi-directionally.

These settings are all configurable alongside other Riak settings in app.config, and by using the Riak Enterprise command line tool riak-repl (in your Riak Enterprise ./bin directory).

What’s new?

SSL

As replicating sensitive data over the internet isn’t safe, we now provide encryption via OpenSSL out of the box. Certificates signed by a standard Certificate Authority (CA) such as Verisign are supported, as well as self-signed certs.

Certificate chains can be validated down to the CA, but both certificates must resolve to the same root CA. Additionally, you can configure the number of intermediate CAs allowed. Certificate common name whitelisting is also supported.

Enabling SSL is as easy as adding these four parameters to the riak_repl section of app.config:

{ssl_enabled, true},
{certfile, "/full/path/to/site1-cert.pem"},
{keyfile, "/full/path/to/site1-key.pem"},
{cacertdir, "/full/path/to/cacertsdir"}

Additional SSL configuration parameters are documented in the forthcoming Riak Enterprise Replication Operations Guide.

Per-bucket replication settings

Per-bucket replication allows for more granular control of exactly what and how things get replicated. Using this feature is as easy as setting a bucket property. Supported per-bucket replication schemes are: realtime only, full-sync only, both realtime + full-sync, and no replication.

For example, to entirely disable replication on a bucket titled “my_bucket”:

curl -v -X PUT -H "Content-Type: application/json" -d '{"props":{"repl":false}}' http://127.0.0.1:8091/riak/my_bucket

The following example only replicates data during a full-sync (skipping real-time replication) on a bucket titled “my_bucket”:

curl -v -X PUT -H "Content-Type: application/json" -d '{"props":{"repl":"fullsync"}}' http://127.0.0.1:8091/riak/my_bucket

These parameters are documented in the forthcoming Riak Enterprise Replication Operations Guide.
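The same bucket-property update can be issued from code. The sketch below builds the equivalent HTTP request with Python's standard library, using the address and the two `repl` values shown in the curl examples above; the helper name `repl_request` is made up for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request


def repl_request(bucket, mode, base_url="http://127.0.0.1:8091"):
    """Build a PUT request that sets the bucket's "repl" property."""
    body = json.dumps({"props": {"repl": mode}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/riak/{bucket}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


fullsync_only = repl_request("my_bucket", "fullsync")  # full-sync only
disabled = repl_request("my_bucket", False)            # no replication

# To actually apply a setting against a running node:
# urllib.request.urlopen(fullsync_only)
```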

Extensive documentation updates

We are excited to be releasing new and improved Riak Enterprise documentation in v1.2. This documentation is now available publicly on the Riak wiki. Additional settings have been documented which allow for greater control of replication behavior.

Support for replication over NAT

It’s typical to see Network Address Translation (NAT) in an enterprise environment, so support has been added to make this easier for our customers to use. Combining SSL + replication over NAT should take care of securely copying data over the internet.

The new command:

riak-repl add-nat-listener <nodename> <internal_ip> <internal_port> <nat_ip> <nat_port>

will allow the primary cluster (aka “listener”) to replicate data on both an internal IP/port and public IP/port.

Replication over NAT Example

Server A is the primary source of replicated data.

Server B and Server C would like to be clients of replicated data.

To configure this scenario:

Server A is set up with static NAT, configured for the IP addresses 192.168.1.10 (internal) and 50.16.238.123 (public).

Server A replication will listen on:

  • the internal IP address 192.168.1.10, port 9010
  • the public IP address 50.16.238.123, port 9011

Server B is set up with a single public IP address: 50.16.238.200

  • Server B replication will connect as a client to the public IP address 50.16.238.123, port 9011

Server C is set up with a single internal IP address: 192.168.1.20

  • Server C replication will connect as a client to the internal IP address 192.168.1.10, port 9010

Configure a listener (replication server) on Server A:

riak-repl add-nat-listener riak@192.168.1.10 192.168.1.10 9010 50.16.238.123 9011

Configure a site (replication client) on Server B:

riak-repl add-site 50.16.238.123 9011 server_a_to_b

Configure a site (replication client) on Server C:

riak-repl add-site 192.168.1.10 9010 server_a_to_c
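The choice each client makes in this example (internal address for hosts behind the NAT, public address for everyone else) can be expressed as a small helper. This sketch reuses the addresses from the scenario; the `/24` subnet boundary and the function names are assumptions, since the post does not state the internal network's size.

```python
import ipaddress

# Server A's two listener addresses, taken from the example above.
LISTENER = {
    "internal": ("192.168.1.10", 9010),
    "public": ("50.16.238.123", 9011),
}
INTERNAL_NET = ipaddress.ip_network("192.168.1.0/24")  # assumed subnet


def listener_for(client_ip):
    """Return the (ip, port) a replication client should connect to."""
    if ipaddress.ip_address(client_ip) in INTERNAL_NET:
        return LISTENER["internal"]
    return LISTENER["public"]


def add_site_command(client_ip, site_name):
    """Format the riak-repl add-site command for a given client."""
    ip, port = listener_for(client_ip)
    return f"riak-repl add-site {ip} {port} {site_name}"


server_b = add_site_command("50.16.238.200", "server_a_to_b")
server_c = add_site_command("192.168.1.20", "server_a_to_c")
```

The two resulting strings match the add-site commands shown for Server B and Server C.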


To summarize, we hope that SSL, replication over NAT, per-bucket replication settings and updated documentation will allow for better control of Riak MDC Replication in your enterprise installation.

Thanks for reading!

Dave, Andrew, and Chris

Don’t look to Big Money to fund innovation. Innovators must look to each other.

December 9, 2010

One thing that the last year has taught us is that innovation will not be constrained by an economy in the doldrums. People have big ideas and big ideas play, no matter the economy.

Ever since we started talking to select companies about Riak in early 2009, we have been overwhelmed by the creative ideas for how to put a distributed data store into production.

Flash-based ad serving, real-time search, network analytics, and single-source/multi-lingual content are just a few examples of applications that are, or have the potential to start, transforming their existing economies.

We have had a unique view into emerging ideas and we are convinced of one thing: if these companies want to use Riak, who cares how much they can pay now? Their ideas are big and they will make us better. Many are already pounding the heck out of Riak, which, not coincidentally, means their businesses are taking off.

So that is the real reason why Riak EnterpriseDS for Startups came about. Whether or not any of these companies become the next Comscore or Doubleclick doesn’t matter. Cedexis, Teleskele, even a few stealth startups — these people are smart and driven and their ideas are big. They push Riak to its limits and make us better.

The economics of the Riak EnterpriseDS for Startups program are in the end quite simple: we give you code, you push us to be better. If you like us, we ask that you share that opinion. If not, say what you will. We deserve it. But either way, we will do our damnedest to make sure you get the best code and the best support we can deliver.

Why? Because we know what it is like to passionately believe in an idea and find folks like Bob and Jameson at Mochi Media, Marty at Cedexis, Gokhan at Teleskele, and Tom Fredell who believe in us.

Don’t look to Big Money to fund innovation nowadays. Big Money is scared. Look to other innovators and entrepreneurs. Look to each other.

Let us know if we can help.

Earl

Basho Technologies Launches Riak EnterpriseDS for Startups Program

Revolutionary “pay what you can” pricing model for start-up companies with no institutional funding

CAMBRIDGE, MA – December 9, 2009 – Basho Technologies, Inc., a provider of distributed data store and analytics software, today launched its Riak EnterpriseDS for Startups program with three innovative startup companies: Cedexis, Teleskele, and a stealth Massachusetts-based startup headed by Thomas Fredell, former CTO of IntraLinks, Inc.

Designed to meet the unique needs of early-stage, innovative companies with advanced analytical, data replication, and support needs but who are self-funded or in the pre-institutional funding phase, the program provides qualified clients with licenses to Riak EnterpriseDS on a pay-what-you-can basis.

“The deal is simple: pay what you can, get great code and better service. When we are in a position to pay more we will,” said Marty Kagan, CEO and Co-founder of Cedexis, which provides a collective intelligence platform for next-generation web traffic management strategies.

Riak EnterpriseDS for Startups customers typically have three qualifications in common: they are incorporated as a business, have not received institutional funding, and are willing to participate in case studies, feature design meetings, and serve as references.

“When we say pay-what-you-can, we mean it,” says Earl Galleher, Basho’s CEO. “With this program we hope more interesting Internet startups will get to market with world class infrastructure that allows them to scale cost-effectively.”

“We needed to demonstrate to early customers that we could scale,” said Gokhan Bayraktar, General Manager of Teleskele, which provides an international, multilingual, globally distributed, content-driven ad network. “Basho didn’t ask how much we could pay. They asked us what we wanted to build and how could we build it together.”

“Without Riak, I would not be able to execute this business nearly as well,” said Thomas Fredell, CTO and co-founder of a Boston-based startup in stealth mode and previously CTO of IntraLinks. “We need to analyze and query a huge body of data as it collects and with very low latency. The simple scalability and distributed MapReduce features of Riak make that possible.”

“Venture capital funding has slowed significantly and early stage companies face difficulties getting the resources needed to get their innovative products to market,” said Mr. Galleher. “Hopefully, this leads to more jobs as these companies grow and succeed. Our reward for offering this program will come over time.”

To learn more about the Riak EnterpriseDS for Startups program you can visit the Riak Enterprise page or email startups@basho.com.

About Basho Technologies

Basho Technologies, Inc., founded in January 2008 by a core group of software architects, engineers and executive leadership from Akamai Technologies, Inc. (Nasdaq: AKAM) is headquartered in Cambridge, Massachusetts. Basho produces Riak, a distributed data store that combines high availability, easily-scalable capacity and throughput, and ease of use. Riak’s high availability data store means that applications built using Riak remain both read and write available under almost any operational conditions and without requiring intervention. Available in both an open source and a paid commercial version, Riak provides unprecedented read- and write-availability to web, mobile, and enterprise applications.

Media Contacts
Earl Galleher
CEO, Basho Technologies, Inc.
910.520.5466
earl@basho.com