Tag Archives: Riak

SoftLayer & Basho Partner Announcement

SoftLayer & Basho Partner for High-Performance, Scalable Riak In The Cloud
Turnkey Big Data Environments Available Across SoftLayer’s Global Infrastructure

Dallas, TX and Cambridge, MA — April 30, 2013 — Basho and SoftLayer Technologies today announced the availability of Riak and Riak Enterprise on SoftLayer’s global cloud platform. The integrated solution provides the availability, fault tolerance, operational simplicity, and scalability of Riak combined with the flexibility, performance, and agility of SoftLayer’s on-demand infrastructure.

SoftLayer and Basho have collaborated to make deploying Riak, an open source distributed database, more accessible and flexible through a pay-as-you-go service model. The solution enables organizations to swiftly deploy scalable production-grade systems, accelerating the deployment of big data applications and providing greater business agility. Organizations can design and deploy a complete solution set through SoftLayer’s Web-based Solution Designer, with ongoing management and provisioning available via the company’s portal, mobile apps and API.

Common use cases for Riak include fault-tolerant, low-latency storage for content, user and session information, mobile data, log files, JSON/XML documents, and more. For customers that require replication of clusters between multiple data centers, Riak Enterprise is available and also adds extended monitoring and 24×7 support. Customers use multi-datacenter replication, in two or more sites, to serve global traffic, maintain active backups, run secondary analytics clusters, or meet disaster recovery and regulatory compliance requirements.

“Customer demand for easy-to-deploy and manage big data cloud-based solutions continues to rise,” says Duke Skarda, CTO of SoftLayer. “We are seeing substantial adoption through joint customers, such as Bump. Our platform was built specifically to support the kind of Web-scale distributed applications that big data exemplifies, and our partnership with Basho is further validation of our commitment to deliver a complete suite of scalable, high-performance big data solutions.”

“Basho and SoftLayer have long catered to innovative developers building the next generation of web, social and mobile applications. Today, enterprise customers are demanding the same, an architecture that provides for zero-data loss and delivers zero-downtime,” says Bobby Patrick, executive vice president and CMO of Basho. “We believe distributed systems software, such as Riak, and distributed infrastructure is required to help customers truly achieve these ambitions. Basho is excited to partner with SoftLayer to help companies easily deploy applications that are truly distributed, scalable, and always available.”

Bump is one of the most popular mobile apps on the market today. The app makes it easy for users to share their contact information, photos, and other objects by simply “bumping” their smartphones. Bump uses Riak to store user data including events, communications sent and received, handset information and authentication tokens.

“Operational ease is key to our business success,” says Mark Smith, Operations Lead at Bump. “The combination of SoftLayer, who we already trust with our business and data, and Basho, who makes the database that we trust at scale, saves us time and effort and allows us to focus on our business, not our data infrastructure.”

Features & Benefits

  • Web-based SoftLayer Solution Designer makes it easy to configure and deploy Riak environments on demand and at the click of a button.
  • High performance and superior availability and scalability leveraging the broadest cloud infrastructure platform in the industry including dedicated bare metal servers and a broad range of storage options.
  • Global private network allows for high-speed, secure replication between clusters.
  • Optimized infrastructure and best-practice deployments based on joint insights, expertise, and experience from SoftLayer and Basho.
  • Pay-as-you-go model provides the flexibility of monthly or annual billing and no long-term contracts.

Riak Available On SoftLayer Platform

April 30, 2013

Today we are pleased to announce the availability of Riak and Riak Enterprise on SoftLayer’s global cloud platform. Users can now easily configure and deploy Riak environments on the SoftLayer platform with a flexible, pay-as-you-go service model. The solution makes it easy for organizations to quickly deploy scalable production-grade Riak systems on-demand. The partnership accelerates the speed of developing and launching applications with Riak, provides ease of operations with scale, and enables global multi-datacenter replication.

Features of the joint offering include:

  • Web-based tool to configure and deploy Riak on demand and at the click of a button
  • Pay-as-you-go model providing the flexibility of monthly or annual billing and no long-term contracts
  • Rapid deployment on dedicated, bare-metal servers for optimum performance

With the Riak Enterprise offering on SoftLayer, users can replicate data stored in Riak across SoftLayer’s global infrastructure. This multi-datacenter replication capability provides data locality, disaster recovery, global load balancing, and active backups. SoftLayer’s global private network allows for high-speed, secure replication between clusters.

The integrated solution provides the availability, fault tolerance, operational simplicity, and scalability of Riak combined with the flexibility, performance, and agility of SoftLayer’s on-demand infrastructure.

Bump is one of the most popular mobile apps on the market today, and is already using Riak on the SoftLayer platform. “Operational ease is key to our business success,” says Mark Smith, Operations Lead at Bump. “The combination of SoftLayer, who we already trust with our business and data, and Basho, who makes the database that we trust at scale, saves us time and effort and allows us to focus on our business, not our data infrastructure.”

For more information on how Bump uses Riak, check out the case study. For more information on Riak Enterprise, visit the product page or documentation.

SoftLayer is also sponsoring the RICON East 2013 after party. On night one of the conference, we’re renting out Hudson Terrace for a one-of-a-kind party. SoftLayer and Basho are furnishing drinks, food, and entertainment. All RICON attendees are automatically registered for the party…but, as of today, the party is open to anyone who wants to register.

Riak at Qeep, the Global Social Network

April 25, 2013

Qeep is a global social network, with more than 19 million registered users sending nearly five million messages each day. Qeep allows you to play games, chat with friends, send pictures, and more. They use Riak to store all user chat messages.


Qeep was founded in 2006. As they started to grow, they realized their single-instance relational database was not going to work for them anymore. Sharding was becoming a significant operational burden and high latency was preventing quick access to the users’ messages. Qeep needed a new solution that could better handle their significant growth.

They started evaluating a number of different NoSQL solutions. With over one billion keys to store and no complex querying requirements, a key/value store would provide straightforward access to user data. Ultimately, Qeep selected Riak due to its high availability, ease of scale, and predictable operational cost. Once chosen, they quickly migrated over 1.8 billion entries of legacy data from their previous relational database installation over to Riak.

Currently, Qeep uses Riak to store all user chat messages. These messages are stored as JSON objects and are accessed using the open source Riak Java Client, offering high-performance access for fetching this data. Qeep uses additional structs to model aggregation of data like inbox, outbox and chats between two users. Qeep has made some changes to the Java client to meet their unique caching needs, which can be reviewed on GitHub.

Qeep has a 12-node cluster with 48GB of RAM per node, connected via a 1GB network. Qeep uses Riak’s LevelDB backend, which is ideal for deployments with a very large number of keys.

“We love the simplicity of administration and scaling that Riak offers,” said Ingo Rockel, Engineer at Blue Lion Mobile. “We have seen huge performance gains since switching to Riak, which has been the biggest plus for us, and has been the basis for planning future feature releases.”

For more information on Riak, sign up for our introductory webcast on Wednesday, May 1 or check out the Riak documentation.


Riak at Shopzilla

April 24, 2013

Will Gage of Shopzilla presented last week on their production Riak usage at the Santa Monica Java Users’ Group. Gage, a member of the Consumer Site Engineering team, shared details on how they built various user-facing services on Riak, why it was the right tool for the job, and when you might want to use it in production. Will’s talk starts at the 49 minute mark in the video embedded below, and it’s well worth your time. In addition to offering details on data modeling for their specific use cases, he also talks about service latencies for their production applications and how the Riak community played an important role in their decision.

Mark Phillips, Basho’s Director of Technical Evangelism, also presented. His talk starts at approximately the 1:20:00 point and is entitled Riak and the Power of Distributed Systems. An excellent complement to Will’s talk, this covers Riak’s architecture at a high level, how to access it as a developer, and then ends with a few use case discussions.

If you’re interested in more talks on Riak in production and the future of Riak, make sure to grab a ticket for RICON East, happening May 13-14 in New York City. This will be two days of talks, parties, and hacking dedicated to Riak, developers, and the future of distributed systems in production.

The Basho Team


Upcoming Basho Events – April

April 23, 2013

During the rest of April, Basho will be speaking and sponsoring events around the United States and internationally. If you want to meet up with a Basho team member at one of these events, contact us to set up a time, or send us a note on Twitter. Below are some of the highlights:

NY Tech Day: Basho will be exhibiting at NY Tech Day (April 25) in New York, a massive science fair where entrepreneurs can exhibit their startups to thousands of consumers, investors, first adopters, job seekers, major companies, press and media.

NoSQL Matters: Basho is sponsoring NoSQL Matters (April 26-27) in Cologne, Germany. Additionally, Basho engineers Sean Cribbs and Eric Redmond will be speaking about Riak technologies.

RailsConf: Basho will be attending RailsConf (April 29-May 2) in Portland. It is the largest gathering of Rails developers (and most of the time, Rubyists) in the world, drawing world-class developers and companies together to see the state of the art in Rails and web development.

Meetups: This month, we are hosting meetups in Atlanta at Atlanta Tech Village on April 23rd and in Portland on the 29th at NedSpace.

Sponsored Events: Basho will be sponsoring Railsberry in Krakow, Poland (April 23-24), GOTO Chicago in Chicago (April 23-24), and ChefConf in San Francisco (April 24-26).

We hope to see you at one of these events! For a full list of events this month and in upcoming months, visit our Events Page.


Top Five Questions About Riak

April 17, 2013

This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.

What hardware should I use with Riak?

Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.

RAM is one of the most important factors – RAM availability directly affects which Riak backend you should use (see question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor. However, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
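As a rough illustration of how n_val feeds into disk sizing, the sketch below multiplies the object count, the average object size, and the replication factor, then spreads the result across the nodes. The function name, the sample workload, and the assumptions of evenly distributed data and zero backend overhead are all illustrative; this is not Basho's capacity-planning formula.

```python
def raw_disk_per_node(num_objects, avg_object_bytes, n_val=3, num_nodes=5):
    """Estimate the bytes of disk each node needs, ignoring backend overhead.

    Assumes data is spread evenly across nodes (a simplification).
    """
    total_bytes = num_objects * avg_object_bytes * n_val
    return total_bytes / num_nodes


if __name__ == "__main__":
    # Hypothetical workload: 100 million objects of 2 KiB each.
    per_node = raw_disk_per_node(100_000_000, 2048, n_val=3, num_nodes=5)
    print("{:.1f} GiB per node".format(per_node / 1024 ** 3))
```

Adding nodes lowers the per-node figure, while raising n_val increases it in proportion.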

Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.

What backend is best for my application?

Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.

Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
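To make the keydir idea concrete, here is a minimal sketch of an append-only store with an in-memory hash table that maps each key to the on-disk position of its latest value. The class name and file layout are invented for illustration; this is not Bitcask's actual file format and it omits merging, checksums, and crash recovery.

```python
import os
import tempfile


class TinyKeydirStore:
    """Toy append-only store with an in-memory key directory.

    Illustrates the idea only; it is not Bitcask's file format.
    """

    def __init__(self, path):
        self._file = open(path, "a+b")
        self._keydir = {}  # key -> (offset, length) of the latest value

    def put(self, key, value):
        # Writes only ever append; older values are left behind on disk.
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(value)
        self._file.flush()
        self._keydir[key] = (offset, len(value))

    def get(self, key):
        # One in-memory lookup, then one seek and one read on disk.
        offset, length = self._keydir[key]
        self._file.seek(offset)
        return self._file.read(length)

    def close(self):
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.log")
    store = TinyKeydirStore(path)
    store.put("user:1", b"alice")
    store.put("user:1", b"alice-v2")  # appended; the keydir now points here
    print(store.get("user:1"))
    store.close()
```

Because every key lives in the dictionary, memory use grows with the number of keys rather than with the size of the values, which is the limitation described above.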

LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend variable in the app.config file. You can find more details on LevelDB here.

Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.

How do I model data using Riak’s key/value design?

Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.

Data Type   | Key                | Value
Session     | User/Session ID    | Session Data
Content     | Title, Integer     | Document, Image, Post, Video, Text, JSON/HTML, etc.
Advertising | Campaign ID        | Ad Content
Logs        | Date               | Log File
Sensor      | Date, Date/Time    | Sensor Updates
User Data   | Login, Email, UUID | User Attributes

For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
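As one way to picture the session use case, the sketch below builds (but does not send) an HTTP PUT that stores a JSON value under a bucket and key. It assumes a node listening on Riak's default HTTP port 8098 on localhost; the bucket name, key, and session fields are made up for the example.

```python
import json
import urllib.parse
import urllib.request


def build_put_request(bucket, key, value, base_url="http://127.0.0.1:8098"):
    """Build (but do not send) an HTTP PUT storing a JSON value under bucket/key."""
    url = "{}/buckets/{}/keys/{}".format(
        base_url,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    body = json.dumps(value).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Session use case: the key is the session ID, the value is the session data.
    req = build_put_request("sessions", "session-42", {"user": "alice", "cart": [17, 23]})
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running node.
```

Since Riak is content-agnostic, the same request shape works for other content types by changing the body and the Content-Type header.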

What other options, besides strict key/value access, are there for querying Riak?

Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.

MapReduce provides non-primary key based querying that divides work across the Riak distributed database. It is useful for tasks such as filtering by tags, counting words, extracting links, analyzing log files, and aggregation tasks. Riak provides both JavaScript and Erlang MapReduce support. Jobs written in Erlang are generally more performant. You can find more details about Riak MapReduce here.
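The word-counting task mentioned above can be sketched in plain Python to show the shape of the two phases. This runs in a single process and does not use Riak's MapReduce API; the function names are illustrative only.

```python
from collections import Counter
from functools import reduce


def map_words(value):
    """Map phase: turn one stored document into partial word counts."""
    return Counter(value.lower().split())


def reduce_counts(left, right):
    """Reduce phase: merge two partial results into one."""
    return left + right


def word_count(documents):
    # In a distributed system the map step can run next to the data, in parallel.
    partials = [map_words(doc) for doc in documents]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["riak stores data", "riak replicates data"]
    print(word_count(docs))
```

The map function only sees one value at a time, which is what lets the work be divided across nodes before the partial results are combined.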

Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and Solr-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.

Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
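A small sketch of what exact and range index queries look like over Riak's HTTP interface follows. It only builds the query URLs; the host and port (localhost, 8098), the bucket, and the index names are assumptions for illustration. Index names end in _bin for string values and _int for integer values.

```python
import urllib.parse


def index_query_url(bucket, index, start, end=None, base_url="http://127.0.0.1:8098"):
    """Build a secondary-index query URL: exact match, or a range when `end` is given."""
    parts = ["buckets", bucket, "index", index, str(start)]
    if end is not None:
        parts.append(str(end))
    path = "/".join(urllib.parse.quote(p, safe="") for p in parts)
    return "{}/{}".format(base_url, path)


if __name__ == "__main__":
    # Exact match on a string index, then a range over an integer index.
    print(index_query_url("users", "email_bin", "alice@example.com"))
    print(index_query_url("users", "age_int", 21, 30))
```

A GET on either URL against a running node would return the keys of the matching objects, which can then be fetched individually.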

How does Riak differ from other databases?

We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.

Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing mean data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
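To show why consistent hashing avoids manual resharding, here is a generic hash-ring sketch. It is a textbook version of the technique with made-up node names, not Riak's actual partition and vnode claim logic. When a node joins, only the keys that land on the new node's points move; everything else stays where it was.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes, points_per_node=64):
        self._points = []  # sorted list of (hash, node)
        self._per_node = points_per_node
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._per_node):
            bisect.insort(self._points, (_hash("{}#{}".format(node, i)), node))

    def owner(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]


if __name__ == "__main__":
    keys = ["key-{}".format(i) for i in range(1000)]
    ring = HashRing(["node1", "node2", "node3"])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node4")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    # Every key that moved now belongs to the new node; the rest stay put.
    print(len(moved), all(ring.owner(k) == "node4" for k in moved))
```

Contrast this with hashing keys modulo the number of servers, where adding one server changes the placement of most keys.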

A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.

Ready to get started? You can download Riak here. For more in-depth information about Riak, we also offer Riak Workshops in New York and San Francisco. Learn more here.


Riak Workshops: Coming Soon to New York and San Francisco

April 15, 2013

On May 15-16, Basho will be hosting Riak training in New York. This two-day training will provide an in-depth look at Riak. It is designed for engineers, developers, and operations staff to learn how to run, operate, and build apps with Riak.

During this training, participants will learn how to:

  • Set up a small Riak cluster
  • Query the cluster using basic Key/Value, Links, 2i, and Map/Reduce
  • Understand deployment and performance considerations
  • Evaluate application Access Patterns
  • Consider data modeling implications in a distributed system

This training will also go over a number of topics, including:

  • Introduction to Riak
  • Basic Querying
  • Riak Under-the-Hood
  • Deployment Considerations
  • Performance Tuning
  • Monitoring/Troubleshooting
  • Application Development
  • Data Modeling
  • Distributed Systems Engineering

If you’re interested in attending, tickets can be purchased here. A 50% discount is available for those who are also attending RICON East. If you have any questions about the training, you can reach out to casey@basho.com.

For those of you on the West Coast, we will also be hosting a Riak Workshop in San Francisco from May 20-21. Tickets for the San Francisco training can be purchased here.


Getting Started with Riak in the Cloud

April 10, 2013

Earlier this year, we announced that hosted Riak is now available on the Engine Yard platform. Ines Sombra, Lead Data Engineer at Engine Yard, has put together a talk to help you get started using Riak in cloud environments. This talk introduces Riak’s overall architecture and some common use cases, and goes over some questions to consider when choosing a database. It also discusses what you need to know to run Riak in the cloud and how it differs from traditional hardware installation.

You can view the full talk below:

For more information about Riak on Engine Yard, check out our blog post or get started now with 500 hours free on their platform.


Syslog and Riak 1.3

April 8, 2013

Logs are an important part of understanding the health of any system. They provide historic records of events with information that may not be seen when running ad-hoc commands like riak-admin or reviewing the data gathered and trended from riak-admin status or http://$riakNode/stats.

When used correctly, logs can forewarn of an issue as well as provide forensic evidence to use after the event.

By default, Riak’s logging is split sensibly into different files for different types of log data: errors, crash and console. Take a look in /var/log/riak or /var/log/riak-cs to explore further.

Since you may have five to 50 nodes in your Riak cluster (we recommend that you have at least five nodes), reviewing those logs by hand would be a time-consuming and repetitive task. This makes it unlikely that you’d catch any events as they’re happening.

Gathering logs centrally for analysis is the best way to make this manageable. Riak has Syslog support through its logging framework – Lager.


Lager is another great open source project from Basho. We recommend checking out the project page for full details, but in summary its features include:

  • Custom formatting
  • Runtime changes
  • Internal log rotation (yes, sysadmin, we already have you covered)
  • AMQP support
  • SMTP support
  • Syslog Support
  • Loggly Support
  • Tracing

Configuring Lager to use Syslog is as easy as opening your /etc/riak/app.config then navigating down to Lager’s configuration section. Here, you can replace the existing handlers section with

{handlers, [
    %% Send messages at info level and above to Syslog as "riak" on facility local1
    {lager_syslog_backend, ["riak", local1, info]}
] },

Then, in your local syslog.conf you may add the following: (Please remember Syslog likes tabs between sections, not spaces. If you copy and paste, you will need to edit the line accordingly.)

local1.info             /var/log/riak_info.log

Both Riak and Syslog will require restarting to apply the changes.
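If you paste the rule from a web page, the separator often arrives as spaces. The helper below is a small sketch that rewrites a rule so the selector and action are separated by a single tab. The function name is made up, and it assumes the classic two-field syslog.conf layout.

```python
def tabify_syslog_rule(line):
    """Rewrite 'selector   action' so the two fields are separated by one tab."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # leave blank lines and comments untouched
    parts = stripped.split(None, 1)
    if len(parts) != 2:
        return line  # not a selector/action pair; leave as-is
    selector, action = parts
    return "{}\t{}".format(selector, action)


if __name__ == "__main__":
    pasted = "local1.info             /var/log/riak_info.log"
    print(repr(tabify_syslog_rule(pasted)))
```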

You can add multiple handlers for the various levels you wish to configure. You can also use a hybrid approach, leaving Lager managing some levels while Syslog manages others. Check out the Lager Syslog project page for more information.

The next step in your logging solution would be to redirect the logs to your central Syslog server. Doing so is outside the scope of this post, but there are many great guides available that describe the process.

Analysis Tools

There are some excellent tools out there for log file analysis. These include Splunk and Logstash, which can both be run locally, and Loggly and Papertrail, which are Software as a Service.

What am I seeing?

When using one of these tools, we recommend that you set up alerts for common, frequent, and important messages. Check our online documentation for descriptions of common messages to get started.
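Before wiring up a full analysis tool, a simple alert rule can be sketched in a few lines. The example below counts log lines per severity and flags when a level crosses a threshold. The line shape it expects ("date time [level] rest"), the threshold, and the sample messages in the usage example are assumptions; adjust the pattern to match your own formatter.

```python
import re
from collections import Counter

# Assumed line shape: "<date> <time> [<level>] <rest>"; adjust for your formatter.
LEVEL_RE = re.compile(r"^\S+ \S+ \[(\w+)\]")


def count_levels(lines):
    """Count log lines per severity level, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def should_alert(counts, level="error", threshold=1):
    """Return True once `level` has been seen at least `threshold` times."""
    return counts[level] >= threshold


if __name__ == "__main__":
    # Made-up sample lines for illustration.
    sample = [
        "2013-04-08 10:15:32.123 [info] <0.7.0> example informational message",
        "2013-04-08 10:15:40.456 [error] <0.204.0> example error message",
    ]
    counts = count_levels(sample)
    print(dict(counts), should_alert(counts))
```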

Further Help

If you need further assistance, please don’t hesitate to contact the team and community:

IRC – irc.freenode.net #riak

Email – http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Or, if you’re an enterprise customer, contact us through your support account.

Team Client Services

Boundary Uses Riak to Handle Massive Data Scale

April 4, 2013

Boundary provides innovative application monitoring for new IT architectures: one-second app visualization, cloud-compatible, and only a few minutes from setup to results. They use Riak to store all of their historic data.

Due to the nature of Boundary’s service, they take in a massive amount of data at a high resolution and volume. After experimenting with a number of different options, they selected Riak for historic data due to its scalability, simplicity, high availability, and transparency. According to Boundary founder Cliff Moon, “Riak’s transparency and simplicity gives us peace of mind while it’s in production. We know what it’s doing and never have to worry about it.”

Altogether, they handle nearly 3TB of writes per day, store 200-300 million keys, and have three trillion separate observations stored in their cluster at any given point.

To learn more about how Boundary uses Riak, check out the complete case study.