Riak on AWS – Technical Guidance

June 27, 2013

Today, we are excited to share a recent whitepaper released by the Amazon team, titled “NoSQL Database in the Cloud: Riak on AWS.” This paper provides technical guidance on running Riak on the Amazon platform, including an overview of:

  • Basic Installation
  • Riak Architecture and Scale
  • Operational Considerations (including sizing and configuration)
  • AWS-specific security configuration
  • A discussion of Replication (as enabled by Riak Enterprise)

Given the number of Riak users (both open source and enterprise) who leverage public cloud environments, either as a part of their infrastructure or as the foundation of it, Basho will continue to invest in partnerships that provide deployment choice and deployment ease. Whether it’s for a hybrid cloud model – used to address burst capacity, tenancy/data locality, and proof of concept needs – or for an investment solely in public cloud, Riak will provide the operational simplicity and scalability required for your critical data.

For more information about deploying Riak on AWS, check out our posts about the Riak AMI and our other deployment options, including automated scripts and manual installation. You can also find more information about what to consider when installing Riak on AWS in our documentation.

Basho

"ZooKeeper for the Skeptical Architect" from RICON East

June 26, 2013

Camille Fournier is the VP of Technical Architecture at Rent the Runway and is an expert in distributed systems and ZooKeeper. She was also one of the speakers at RICON East, Basho’s distributed systems conference. Her talk was entitled “ZooKeeper for the Skeptical Architect.”

ZooKeeper has become quite ubiquitous, since it’s a core component of the Hadoop ecosystem and enables high availability for systems like Redis and Solr. However, as Camille points out, just because something is popular doesn’t mean you should use it. To help you decide whether ZooKeeper is right for you, she goes over the core uses of ZooKeeper in the wild and why it is suited to these use cases. She also talks about systems that don’t use ZooKeeper and why that can be the right decision. Finally, she discusses the common challenges of running ZooKeeper as a service and things to look out for when architecting a deployment. Her full talk is below:

[youtube http://www.youtube.com/watch?v=j4uwKP7WJFk&w=640&h=390]

You can also check out her slide deck here.

If you’re interested in speaking at RICON West (Oct. 29-30th in San Francisco), we are now accepting proposals through July 1st. If you’re interested in attending, you can purchase early bird tickets here.

Basho

Basho’s Partnerships Page

June 5, 2013

To help make Riak even more accessible, we have partnered with a number of different hosting providers, consulting services, system integrators, and OEMs to help you better use Riak. You can check out all of our partners at the Partnerships Page, which highlights how we are collaborating. Below is a look at some of our wonderful partners.

For those of you in need of a hosting provider for Riak, Basho is partnered with a number of great companies to help get you deployed quickly. Partners include: Amazon Web Services, Windows Azure, Joyent, SoftLayer (which was recently acquired by IBM and also offers hosting for Riak Enterprise), and Engine Yard – which is offering 500 free hours when you sign up.

For your infrastructure needs, Basho partners with Citrix and Redapt.

Companies that use Riak and Riak CS to power applications or other product offerings include Datapipe and Yahoo! Japan subsidiary IDC Frontier. ePlus and Trifork act as resellers of Riak to help expand our global reach.

Finally, Gazzang helps to ensure that your Riak environment is secure and meets all regulations related to sensitive information.

Visit our Partnerships Page to learn more about how we partner with these companies. If you’re interested in partnering, please let us know!

Basho

Register for the Intro to Riak Webcast

June 4, 2013

If you’re interested in learning more about Riak, tune in this Friday, June 7th at 11am PT/2pm ET for an Intro to Riak webcast. You can sign up for this 30-minute webcast here.

This webcast will cover:

  • Riak’s architecture, properties, and principles
  • How to build apps with Riak’s key/value data model
  • APIs and client libraries
  • Deploying Riak in the cloud with AWS, Engine Yard, Azure, and more
  • Common use cases for a variety of industries including advertising, retail, and mobile
  • Case studies from users such as Copious, Yammer, Voxer, and OpenX

We will also be available after the webcast to answer any questions that you might have. You can register for the Intro to Riak webcast here.

Basho

Check Out the New Resources Page

May 29, 2013

On our site, you may have noticed a new addition, the Resources Page. On this page, you can download or watch the latest content related to Riak, Riak CS, partnerships, key users, and the most popular verticals (such as Gaming, Retail, Advertising, and Mobile).

If you’re looking for whitepapers, case studies, videos, docs, slides, or webcasts, this is the place to find them. It will be continuously updated with the latest resources about Basho. Below is a glimpse at what you can find on the Resources Page.

  • Sign up for our upcoming webcast on May 31st
  • Get Riak CS up and running quickly by following our Fast Track documentation
  • Learn how to launch Riak quickly via our hosting partners: AWS, Azure, Engine Yard, SoftLayer, and Joyent
  • Watch Angry Birds developer, Rovio, speak on how they use Riak
  • Read about how Copious uses Riak for their eCommerce marketplace
  • Learn how to model advertising data using a key/value structure
  • Download the technical introduction to Riak specifically for mobile applications and platforms

We hope you enjoy exploring our new Resources Page! As you’re evaluating Riak as your database solution, remember to contact us with any questions.

Basho

Shopzilla Selects Riak For Core Data Services

May 28, 2013

Shopzilla allows customers to easily find, compare, and buy anything on the web. They reach over 40 million customers each month, connecting them to 100 million products from tens of thousands of retailers. Riak powers both their Keyword and Scrapbook Data Services.

Previously, Shopzilla was using large RDBMS installations as their primary data platform. This worked well for some use cases. However, it had unnecessary features and was expensive for some services, especially read-heavy use cases with large datasets. For these use cases, they wanted an open source solution that offered deployment on commodity hardware and support for high read and write throughput. With downtime having a direct impact on revenue, availability was a critical factor. Riak was evaluated alongside MongoDB, Redis, and HBase. Ultimately Riak not only fit all of their criteria, but also offered ease of management and operations, allowing their development team to focus on other areas of the business.

Shopzilla uses Riak to store data for its Keyword and Scrapbook Services. The Keyword Service serves metadata about primary keywords and requires real-time access. It stores over 1 billion keywords and can process tens of millions of new keywords at a time. They migrated this service over to a six-node Riak cluster that uses Varnish for caching. Riak is accessed via protocol buffers for high performance and uses the LevelDB backend, which is best suited to implementations with a very large number of keys. Under moderate load, they see 95% of requests complete in under 10 milliseconds.

The Scrapbook Service allows Shopzilla to aggregate product information from different web sources to add supplemental details on their product pages. This means they potentially need to store data equivalent to the scale of their product inventory. Since this product information is accessed via Product ID, it was a perfect fit for Riak’s key/value data model. To provide familiar ad-hoc SQL querying capabilities, they worked with their content team to aggregate and stage this data in an Oracle schema before bringing it into Riak. This allowed them to design a schema based on their needs that would not affect the speed at which Riak serves this data. Most of their online requests are served in well under 5 milliseconds.

Riak has been in production at Shopzilla for over a year now. According to Will Gage, Principal Software Architect at Shopzilla, “I haven’t had to worry about Riak since implementation because we haven’t had any significant problems with it. We’re confident we made the right choice with Riak. It works as it’s supposed to and its stability is great. We’ve watched it work through real-life network partitions under load and recover quickly with no intervention. In short, Riak lets us focus on things other than the database.”

For more information about why Shopzilla chose Riak, check out Gage’s talk from a recent Riak meetup in Santa Monica, below. To learn more about Riak, visit basho.com/riak.

Riak at Shopzilla

Basho

Understanding Riak's Configurable Behaviors Epilogue

May 22, 2013

Basho recently held its second distributed systems conference, RICON East, in New York City. Months of preparation led to two days of concentrated learning, with community members from academia and industry sharing where we’ve been and where we’re going.

By design, many of the presentations had little direct relationship to Riak: RICON is a marketplace for ideas, not for product. However, two of the speakers tackled topics I discussed recently in my blog series on the subtleties of Riak configuration.

This is a follow-up to that series to examine those talks. I won’t repeat earlier content in any significant detail.

Rich Hickey, Using Datomic with Riak

Datomic is a very different take on databases, more akin to a version control system than a traditional RDBMS. In Datomic, records (“facts”) are never changed in place; instead, newer facts can supersede them as needed.

The notion of immutable facts leads to a conceptually simple distributed model that allows for transactions: a view into the database is simply a checkpoint of the facts. It’s always possible that a client may be reading an old checkpoint, but the facts at that checkpoint will be consistent regardless of what further updates have been applied.

Riak is one of several backends that can be used with Datomic.

How Datomic queries Riak

Because Datomic keeps a record of all keys in the system, and because the values for those keys never change, reads can be expedited by setting R=1.

However, as you’ll recall, R=1 has an important complication: if the first vnode to respond does not have a copy of that key (perhaps there’s a sloppy quorum in play due to a node failure) the request will “successfully” complete with a notfound message.

This default behavior can be changed by setting notfound_ok=false so that the coordinating node will await an actual value before reporting it back to the client, and in fact this is how Datomic operates.
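The interaction between R=1 and notfound_ok can be sketched with a toy model. This is a simplified illustration, not Riak's implementation: the coordinate_read function, the NOTFOUND marker, and the sample replies are all hypothetical names invented for the example, and real coordinators also deal with timeouts, read repair, and basic_quorum.

```python
from typing import Optional, Sequence

NOTFOUND = None  # stands in for a replica reply that carries no value


def coordinate_read(replies: Sequence[Optional[str]],
                    r: int = 1,
                    notfound_ok: bool = True) -> Optional[str]:
    """Toy read coordinator. `replies` lists replica answers in arrival
    order. Hypothetical model for illustration only."""
    counted = 0
    value = NOTFOUND
    for reply in replies:
        if reply is NOTFOUND and not notfound_ok:
            # A missing copy does not count toward R; keep waiting.
            continue
        counted += 1
        if reply is not NOTFOUND:
            value = reply
        if counted >= r:
            return value
    return value  # all replicas answered before R was satisfied


# A fallback vnode without a copy of the key happens to answer first.
replies = [NOTFOUND, "fact-42", "fact-42"]
default_read = coordinate_read(replies, r=1, notfound_ok=True)
strict_read = coordinate_read(replies, r=1, notfound_ok=False)
```

In this model the default read completes with a notfound even though two replicas hold the value, while the read with notfound_ok=False skips the empty reply and returns the stored fact.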

Kyle Kingsbury, Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions

Kyle conducted extensive testing of various distributed databases in the face of network partitions. Specifically, he wanted to see whether writes were successful (and properly retained) during and after the partition.

His testing of Riak with allow_mult=false (the default) revealed 91% of writes were lost after the partition healed.

Riak was, however, the only database tested that retained 100% of writes through a partition, and only when allow_mult was set to true so that siblings could be resolved on the client side after the partition healed.

Without allow_mult=true, there is no way (currently) for Riak to resolve conflicting writes other than to accept the last value written.
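The difference between the two settings can be illustrated with a small simulation. It is a sketch under stated assumptions rather than Riak code: each write is modeled as a (timestamp, set of integers) pair to a single key, and the set-union merge is just one possible client-side resolution strategy. The function names and sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Write = Tuple[int, FrozenSet[int]]  # (timestamp, value written to one key)


def last_write_wins(writes: List[Write]) -> FrozenSet[int]:
    """Models allow_mult=false: only the newest value survives."""
    return max(writes, key=lambda w: w[0])[1]


def merge_siblings(writes: List[Write]) -> FrozenSet[int]:
    """Models allow_mult=true with a client that unions all siblings."""
    merged: FrozenSet[int] = frozenset()
    for _, value in writes:
        merged |= value
    return merged


# Two clients on opposite sides of a partition update the same key.
side_a = [(1, frozenset({1})), (3, frozenset({1, 3}))]
side_b = [(2, frozenset({2})), (4, frozenset({2, 4}))]
after_heal = side_a + side_b

lww_result = last_write_wins(after_heal)     # elements 1 and 3 are lost
sibling_result = merge_siblings(after_heal)  # every element is retained
```

With last write wins, everything written on the side holding the older timestamp disappears once the partition heals. Keeping siblings defers the decision to the client, which can merge them with knowledge of the data.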

Important: Riak would also do a perfectly good job of preserving all writes under the Datomic model of creating immutable key/value pairs. It may seem like all databases should handle that scenario properly, but in fact some will throw away all writes on one side of the partition.

Kyle emphasizes what I mentioned in part 1 of this series: if you can’t create immutable objects, and don’t want to handle conflict resolution via the client, CRDTs will allow for automatic resolution in the future, so long as you can make your data fit that model.
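To give a feel for what "fitting that model" means, here is a minimal sketch of one well-known CRDT, a grow-only counter. It is a generic illustration, not a Riak API: the GCounter type, the helper functions, and the node names are all assumptions made for the example.

```python
from typing import Dict

GCounter = Dict[str, int]  # node name -> increments recorded at that node


def increment(counter: GCounter, node: str, amount: int = 1) -> GCounter:
    """Return a copy of the counter with `node`'s slot increased."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


def merge(left: GCounter, right: GCounter) -> GCounter:
    """Per-node maximum: commutative, associative, and idempotent."""
    return {node: max(left.get(node, 0), right.get(node, 0))
            for node in left.keys() | right.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of all per-node slots."""
    return sum(counter.values())


base = increment({}, "node_a")       # state shared before the partition
on_a = increment(base, "node_a", 2)  # updates on one side of the partition
on_b = increment(base, "node_b", 5)  # updates on the other side
healed = merge(on_a, on_b)           # automatic resolution, no client logic
```

Because each node only ever raises its own slot and the merge takes the per-node maximum, replicas converge to the same state regardless of the order in which they exchange updates. The trade-off is that the data has to be expressible this way; an arbitrary opaque value cannot be merged like this.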

Kyle has expanded his talk into a blog series.

RICON

Basho will be hosting two more RICON conferences this year, in San Francisco and London. As was true in New York City in May and San Francisco last fall, the talks will be streamed live over the Internet and will be well worth your time.

However, speaking from personal experience, the talks are just a portion of the overall value offered by RICON. It is difficult to convey the atmosphere during and between sessions, but even the afterparty was replete with technical discussions.

If you’ve not experienced it, you can browse the #riconeast tag on Twitter to get a feel for how those present (and those not) reacted to the RICON experience, and please consider joining us next time.

RICON East videos should be available soon; the album of RICON 2012 videos is recommended.

John R. Daily

Google I/O Panel on Distributed Databases

May 21, 2013

Last week at Google I/O, Google’s popular developer conference, Tyler Hannan of Basho was invited to speak on a panel entitled “Distributed Databases Panel: An Exploration of Approaches and Best Practices.”

During this talk, Tyler and other panelists discussed how modern distributed databases provide high availability, scalability, and fault-tolerance to protect critical data across all industries. In addition to Tyler, this talk featured Julia Ferraioli (Google Compute Engine), Brian Dorsey (Google Developer Relations), Chris Ramsdale (Google Cloud Platform), Mike Miller (Cloudant), and Will Shulman (MongoLab).

You can watch the entire talk below:

Basho

Blogging Platform EklaBlog Selects Riak for Image, Music, and Document Storage

May 16, 2013

EklaBlog is a popular, easy-to-use blogging platform based out of Nantes, France. They host hundreds of thousands of blogs and see over eight million unique visitors each month. They use Riak to store and serve all of their static files, including images, music and documents.

EklaBlog is a relatively young tech company that started their business alongside other innovative startups at Company Campus, a popular coworking space in France. It wasn’t long before they began to see significant growth in the amount of data hosted on their platform. Like many startups, they began with a single server, instead of a single server with a backup. As they grew, access exceeded the bandwidth available and performance problems began to surface. With more and more users depending on the platform, it became clear their existing infrastructure would be a bottleneck for growth and present availability and performance problems.

EklaBlog needed a solution that would never lose data, scale quickly at low operational cost, and provide an easy-to-use HTTP interface. They also needed to serve files both quickly and predictably, with a consistent low latency profile. All of these attributes were critical to the consumer experience of their platform. After evaluating other solutions, including GlusterFS, MogileFS, and classic file storage, they selected Riak as their primary data store.

“We thoroughly evaluated a number of different solutions before selecting Riak,” said Godefroy de Compreignac, CEO at EklaBlog. “Now we are absolutely convinced that Riak is the most reliable and cost-efficient solution for us. It’s the perfect fit for our needs.”

They launched with Riak in the beginning of 2013 and quickly scaled their cluster up from four to five nodes, each with 1.8TB of usable storage. They use Bitcask, Riak’s default low-level storage backend. Bitcask’s write-once, append-only nature enables very high throughput and low latency. EklaBlog currently stores 5.8TB of data in Riak. They add about 15GB of new data each day, serving hundreds of queries per second.

For EklaBlog, Riak’s key/value data model provides a simple structure that’s well suited to serving large volumes of image, music, and document files. EklaBlog provides three specific platforms, and uses a Riak bucket for each. Files are stored with a unique key generated via a hash function.
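A scheme along these lines might look like the sketch below. The details are assumptions, since the post does not say which hash function EklaBlog uses or what it hashes: here the key is the SHA-256 digest of the file contents, and the platform and bucket names are hypothetical placeholders.

```python
import hashlib
from typing import Dict, Tuple

# Hypothetical platform-to-bucket mapping; the real names are not given.
BUCKETS: Dict[str, str] = {
    "platform_one": "files_platform_one",
    "platform_two": "files_platform_two",
    "platform_three": "files_platform_three",
}


def object_key(content: bytes) -> str:
    """Derive a key from the file contents (assumed: SHA-256, hex encoded)."""
    return hashlib.sha256(content).hexdigest()


def storage_location(platform: str, content: bytes) -> Tuple[str, str]:
    """Return the (bucket, key) pair under which a file would be stored."""
    return BUCKETS[platform], object_key(content)


bucket, key = storage_location("platform_one", b"example image bytes")
```

Hashing the contents means identical files map to the same key, which deduplicates storage. A scheme that needs distinct keys for identical uploads would have to mix in something else, such as an upload identifier.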

You can learn more about EklaBlog at EklaBlog.com and visit Basho.com/Riak for more information about Riak.

Basho

Riak 1.3.1 Now Available on AWS Marketplace

May 6, 2013

The free Riak AMI available on the AWS Marketplace has been updated to the latest version, Riak 1.3.1.

In Riak 1.3, we introduced:

  • Active Anti-Entropy
  • Updates to Riak Control
  • Expanded IPv6 support
  • Improved MapReduce
  • Simplified Log Management

Riak 1.3.1 includes all these features with some additional changes enumerated in the release notes.

For those of you currently using Riak on AWS, or interested in testing Riak on AWS, the AMI makes installation and configuration much easier. We see open source and Riak Enterprise users leverage AWS both as their primary infrastructure and to support hybrid implementations.

Installation instructions for the AMI are available in our docs.

Basho