Tag Archives: Riak

Webinar Recap – Schema Design for Riak

July 7, 2010

Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many during the session, I’ve answered the questions below, in no particular order. If you want to review the slides from yesterday’s presentation, they’re on Slideshare.

Q: You say listing keys is expensive. How are Map phases affected? Does the number of keys in a bucket have an effect on the expense of the operation? (paraphrased)

Listing keys (for a single bucket, there is no analog for the entire system) requires traversing the entire keyspace, even examining keys that don’t belong to the requested bucket. If your Map/Reduce query uses a whole bucket as its inputs, it will be nearly as expensive as listing keys back to the client; however, Map phases are executed in parallel on the nodes where the data lives, so you get the full benefits of parallelism and data-locality when it executes. The expense of listing keys is taken before any Map phase begins.

It bears reiterating that the expense of listing keys is proportional to the total number of keys stored (regardless of bucket). If your bucket has only 10 keys and you know what they are, it will probably be more efficient to list them as the inputs to your Map/Reduce query than to use the whole bucket as an input.
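To make the difference concrete, here is a sketch of the two input styles for a Riak map/reduce job as POSTed to the HTTP `/mapred` endpoint. The bucket and key names are made up for illustration:

```python
import json

# Whole-bucket input: Riak must traverse the entire keyspace to find
# this bucket's keys before any map function can run.
whole_bucket_job = {
    "inputs": "users",
    "query": [{"map": {"language": "javascript",
                       "name": "Riak.mapValuesJson"}}],
}

# Explicit [bucket, key] pairs: no key listing is needed, so the map
# phases start immediately on the nodes holding the data.
explicit_keys_job = {
    "inputs": [["users", "alice"], ["users", "bob"]],
    "query": [{"map": {"language": "javascript",
                       "name": "Riak.mapValuesJson"}}],
}

print(json.dumps(explicit_keys_job))
```

If you know your ten keys up front, the second form skips the keyspace traversal entirely.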

Q: How do you recommend modeling relationships that require a large number of associations (thousands or millions)?

This is difficult to do, and I won’t say there’s an easy or best answer. One idea that came up in the IRC room after the webinar was building a B-tree-like data structure that could be grown to fit the number of associations. This solves the one-to-many relationship, but will require extra handling and care on the part of your application. In some cases, where you only need to know membership in the relationship, a Bloom filter might be appropriate. If you must model lots of highly-connected data, consider throwing a graph database into the mix. Riak is not going to fit all use-cases; some models will be awkward.

Q: My company provides a Java web application and analytics solution that uses JDO to persist to and query from a relational database. Where would I start in integrating with Riak?

Since I haven’t done Java in a serious way for a long time, I can’t speak to the specifics of JDO, or how you might work on migrating away from it. However, I have found that most ORMs hide things from the
developer that he/she should really be aware of — how the mapping is performed, what queries are executed, etc. You’ll likely have to look into the guts of how JDO persists and retrieves objects from the database, then step back and reevaluate what your top queries are and how Riak can help improve or simplify those operations. This is all in the theme of the webinar: Know your data!

Q: Is the source code for the example application and schema design available? (paraphrased)

No, there isn’t any sample code yet. You can play with the existing application (Lowdown) at lowdownapp.com. The other authors and I are seeking a few people to take over its development, and the initial group we contacted has indicated it will be open-sourced.

Q: Is there a way to get notified of changes in a bucket?

That’s not built into Riak. However, you could write a post-commit hook in Erlang that pushes a notification to RabbitMQ, for example, and then have the interested parties consume messages from that queue.

Q: What mechanism does Riak have to deal with the unique user issue?

Riak has neither write locks nor transactions. There is no way to absolutely guarantee uniqueness without introducing an intermediary that acts as a single arbiter (and single point of failure). However, in cases where you aren’t experiencing high write concurrency on the data in question, there are a few things you can do to simulate a uniqueness constraint:

  • Check for existence of the key before writing. In HTTP, this is as simple as a HEAD request. If the response is 404 Not Found, the object probably doesn’t exist.
  • Use a conditional PUT (in HTTP) when creating the object. The If-None-Match: * header should prevent you from blindly overwriting an existing key.
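As a concrete illustration of these two patterns, here is a small simulation (plain Python against an in-memory dict, no Riak involved) whose return values mirror the HTTP status codes described above. Keep in mind that in real Riak neither check is a hard guarantee, for the reasons discussed next:

```python
class Store:
    """In-memory stand-in for a key/value store speaking HTTP-ish codes."""

    def __init__(self):
        self.data = {}

    def head(self, key):
        # Mirrors an HTTP HEAD existence check.
        return 200 if key in self.data else 404

    def put_if_none_match(self, key, value):
        # Mirrors PUT with "If-None-Match: *": refuse to overwrite.
        if key in self.data:
            return 412  # Precondition Failed
        self.data[key] = value
        return 201  # Created

store = Store()

# Check-then-write: works when there is no concurrent writer...
if store.head("user:alice") == 404:
    print(store.put_if_none_match("user:alice", {"name": "Alice"}))  # 201

# ...and a second creation attempt is rejected by the conditional PUT.
print(store.put_if_none_match("user:alice", {"name": "Imposter"}))   # 412
```

In a distributed, eventually consistent store, two clients can still pass both checks on different replicas at nearly the same moment, which is exactly the corner case described below.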

Neither of these solutions is bullet-proof, because all operations in Riak happen asynchronously. Remember that Riak is eventually consistent: not all parts of the system may agree at all times, but they will converge on a single state over time. There will be corner cases where a key doesn’t exist when you check for it, the write via the conditional request succeeds, and you still end up creating an object in conflict. Caveat emptor.

Q: Are the intermediate results of Link and Map phases cached?

Yes, the results of both Map and Link phases are cached in a fairly naive LRU cache. The development team has plans to improve its behavior in future versions of Riak.

Q: Could you comment on commit hooks and what place they have, if any, in Riak schema design? Would it make sense to use hooks to build an index, e.g. of the keys in a bucket?

Yes, commit hooks are very useful in schema design. For example, you could use a pre-commit hook to validate the format of data before it’s stored. You could use post-commit hooks to send the data to external services (see above) or, as you suggest, build an index in another bucket. Building a secondary index reliably is complicated though, and it’s something I want to work on over the next few months.

Q: So if you have allow_mult=false, are there cases where Riak will return a 409 Conflict? Is the default that the last write wins?

Riak never returns a 409 Conflict status from the HTTP interface on writes. If you supply a conditional header (If-Match, for example) you might get a 412 Precondition Failed response if the ETag of the object to be modified doesn’t match the header. In general, it is Riak’s policy to accept writes regardless of the internal state of the object.

The “last write wins” behavior comes in two flavors: “clobbering” writes, and softer “show me the latest one” reads. The latter is the default behavior, in which siblings might occur internally (and the vector clock might grow) but are not exposed to the client; instead, Riak returns the sibling with the latest timestamp at read/GET time and “throws away” new writes that are based on older (ancestor) vclocks. The former actually ignores vector clocks for the specified bucket, providing no guarantees of causal ordering of writes. To turn this behavior on, set the last_write_wins bucket property to true. Except in the most extreme cases, where you don’t mind clobbering things that were written since the last time you read, we recommend using the default behavior. If you set allow_mult=true, conflicting writes (objects with divergent vector clocks, not traceable as descendants of one another) will be exposed to the client with a 300 Multiple Choices response.
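The difference in what the client sees can be sketched like this. (A deliberately simplified, single-node simulation: real Riak detects divergence with vector clocks, not wall-clock timestamps, and the sibling structure here is invented for illustration.)

```python
def read_default(siblings):
    """allow_mult=false: siblings may exist internally, but the client
    only ever sees the one with the latest timestamp."""
    return max(siblings, key=lambda s: s["ts"])["value"]

def read_allow_mult(siblings):
    """allow_mult=true: conflicting values are exposed to the client
    (HTTP 300 Multiple Choices) so the application can resolve them."""
    if len(siblings) > 1:
        return (300, [s["value"] for s in siblings])
    return (200, siblings[0]["value"])

# Two divergent writes to the same key:
siblings = [{"ts": 1, "value": "draft"}, {"ts": 2, "value": "final"}]

print(read_default(siblings))     # "final" -- latest timestamp wins
print(read_allow_mult(siblings))  # (300, ['draft', 'final'])
```

With allow_mult=true, resolving the conflict (picking or merging a value and writing it back) becomes the application’s job.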

Again, thanks for attending! Look for our next webinar in about two weeks.


Free Webinar – Schema Design for Riak – July 8th at 2PM Eastern

June 30, 2010

Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create map-reduce queries? Where will Riak be a natural fit and where will it be challenging?

We invite you to join us for a free webinar on Thursday, July 8 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:

  • Freeing yourself of the architectural constraints of the “relational” mindset
  • Gaining a fuller understanding of your existing schema and its queries
  • Strategies and patterns for structuring your data in Riak
  • Tradeoffs of various solutions

We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.

Fill in the form below if you want to get started building applications on top of Riak!

Sorry, registration is closed.

The Basho Team

SWAG Alert — Riak at Velocity

June 6, 2010

Velocity, the “Web Performance and Operations Conference” put on by O’Reilly, kicks off tomorrow and we here at Basho are excited. Why? Because Benjamin Black, acknowledged distributed systems expert, will be giving a 90-minute tutorial on Riak. The session is officially titled “Riak: From Design to Deploy.” If you haven’t already read it, you can get the full description of the session here.

I just got a sneak peek at what Benjamin has planned and all I can say is that if you are within 100 miles of Santa Clara, CA tomorrow and not in this session, you will regret it.

And what better to go with a hands-on Riak tutorial than some good old-fashioned SWAG? Here is a little offer to anyone attending tomorrow: post a write-up of Benjamin’s session and I’ll send you a Riak SWAG pack. It doesn’t have to be a novel; just a few thoughts will do. Post them somewhere online for all the world to see and learn from, and I’ll take care of the rest.

Enjoy Velocity. We are looking forward to your reviews!

Mark Phillips
Community Manager

Riak Fast Track Revisited

May 27, 2010

You may remember a few weeks back we posted a blog about a new feature on the Riak Wiki called The Riak Fast Track. To refresh your memory, “The Fast Track is a 30-45 minute interactive tutorial that introduces the basics of using and understanding Riak, from building a three node cluster up through MapReduce.”

This post is intended to offer some insight into what we learned from the launch and what we are aiming to do moving forward to build out the Fast Track and other similar resources.

The Numbers

The Fast Track and accompanying blog post were published on Tuesday, May 5th. After that there was a full week to send in thoughts, comments, and reviews. In that time period:

  • I received 24 responses (my hope was for >15)
  • Of those 24, 10 had never touched Riak before
  • Of those 24, 13 said they were already planning on using Riak in production or after going through the
    Fast Track were now intending to use Riak in production in some capacity

The Reviews

Most of the reviews seemed to follow a loose template: “Hey. Thanks for this! It’s a great tool and I learned a lot. That said, here is where I think you can improve…”

Putting aside the small flaws (grammar, spelling, content flow, etc.), there emerged numerous recurring topics:

  • Siblings, Vector Clocks, Conflict Resolution, Concurrent Updates…More details please. How do they work in Riak and what implications do they have?
  • Source can be a pain. Can we get a tutorial that uses the binaries?
  • Curl is great, but can we get an Erlang/Protocol Buffers/language specific tutorial?
  • I’ve heard about Links in Riak but there is nothing in the Fast Track about it. What gives!?
  • Pictures, Graphics and Diagrams would be awesome. There is all this talk of Rings, Clusters, Nodes, Vnodes, Partitions, Vector Clocks, Consistent Hashing, etc. Some basic diagrams would go a long way in helping me grasp the Riak architecture.
  • Short, concise screencasts are awesome. More, please!
  • The Basic API page is great but it seems a bit…crowded. I know they are all necessary but do we really need all this info about query parameters, headers and the like in the tutorial?

Another observation about the nature of the reviews: they were very long and very detailed. It would appear that a lot of you spent considerable time crafting thoughtful responses and, while I was expecting this to some extent, I was still impressed and pleasantly surprised.

This led me to draw two conclusions:

  1. People were excited by the idea of bettering the Fast Track for future Riak users to come
  2. Swag is a powerful motivator

Now, I’m going to be a naïve Community Manager and let myself believe that the Riak Developer Community maintains a high level of programmer altruism. The swag was just an afterthought, right?

So What Did We Change?

We have been doing the majority of the editing and enhancing on the fly. This process is still ongoing and I don’t doubt that some of you will notice elements still present that you thought needed changing. We’ll get there. I promise.

Here is a partial list of what was revised:

  • The majority of changes were small and incremental, fixing a phrase here, tweaking a sentence there. Many small fixes and tweaks go a long way!
  • The most noticeable alterations are on the MapReduce page, which we reworked to flow better and be more interactive. This continues to be improved.
  • The Basic API Operations page got some love in the form of simplification. After reading your comments, we went back and realized that we were probably throwing too much information at you too fast.
  • There are now several graphics relating to the Riak Ring and Consistent Hashing. There will be more.

And, as I said, this is still ongoing.

Thank You!

I’ve added a Thank You page to the end of the Fast Track to serve as a permanent shout-out to those who help revise and refine the Fast Track. (I hope to see this list grow, too.) Future newcomers to Riak will surely benefit from your time, effort, and input.

What is Next?

Since its release, the Fast Track tutorial has become the second most-visited page on the Riak Wiki, behind only the wiki.basho.com home page itself. This tells us here at Basho that there is a need for more tools and tutorials like this. So our intention is to expand it as far as time permits.

In the short term, we plan to add a link-walking page. This was scheduled for the original iteration of the Fast Track but was scrapped because we didn’t have time to assemble all the components. The MapReduce section is going to get more interactive, too.

Another addition will be content and graphics that demonstrate Riak’s fault-tolerance and ability to withstand node outages.

We also want to get more specific with languages. Right now, the Fast Track uses curl over HTTP. This is great, but language-specific versions make tremendous sense, and the only thing preventing us from doing this is time. The ultimate vision is to transform the Fast Track into a sort of “choose your own adventure” module, such that if a Ruby dev who prefers Debian shows up at wiki.basho.com without having ever heard of Riak, they can click a few links and arrive at a tutorial that shows them how to spin up three nodes of Riak on Debian and query it through Ripple. Erlang, Ruby, JavaScript and Java are at the top of the list.

But we have a long way to go before we get there, so stay tuned for continuous enhancements and improvements. And if you’re at all interested in helping develop and expand the Fast Track (say, perhaps, outlining an up-and-running tutorial for Riak+JavaScript), don’t hesitate to shoot an email to mark@basho.com.


Mark Phillips
Community Manager

Free Webinar – Load-Testing Riak – June 3rd @ 2PM Eastern

May 26, 2010

We frequently receive questions about how Riak will behave in specific production scenarios, and how to tune it for those workloads. The answer is, generally, that it depends — the only way to know for sure is to measure!

We invite you to join us for a free webinar on Thursday, June 3 at 2:00PM Eastern Time to discuss how to measure Riak performance and develop your own testing plan. We’ll discuss:

  • Understanding what to test, and how to test it
  • Classifying your application and its load profile
  • Configuring and using our load generation and measurement tool
  • Interpreting the results of your test and taking next steps

As part of the webinar, you will get access to and a demonstration of basho_bench, our benchmarking and load-testing tool. The presentation will last 30 to 45 minutes.

Registration will be limited to 20 parties, on a first-come, first-served basis. Fill in the form below if you’re serious about getting a scalable, predictable, cost-effective storage solution in Riak!

UPDATE: Due to overwhelming demand for this Webinar, we are expanding the number of seats available. So don’t shy away from registering if you think it’s full. We will have plenty of room for you!

The Basho Team

Webinar Registration Form

Registration has closed. We’ll hold another one soon!

Riak Search

May 21, 2010

This post is going to start by explaining how in-the-trenches experience with key/value stores, like Riak, led to the creation of Riak Search. Then it will tell you why you care, what you’ll get out of Riak Search, and why it’s worth waiting for.

A bit of history

Few people know that Basho used to develop applications for deployment on Salesforce.com. We had big goals and were thinking big to meet them. Part of that was choosing a data storage system that would give us what we needed not only to succeed and grow, but to survive: a confluence of pragmatism and idealism embodying a bulletproof operations story and a path upward, with resilience, reliability, and scalability achieved through proven science.

So, that’s what we did: we developed and used what has grown to be, and what you know today, as Riak.

Idealism can’t get you everywhere, though. While we answered hard questions with link-walking and map/reduce, there was still the desire in the back of all of our heads: sometimes you just want to ask, “What emails were sent on May 21 that included the word ‘strategy’?” without having to figure out how to walk links from an organizational chart to mailboxes to mails, and then filter over the data there. It was a pragmatic desire: we just wanted a quick answer in order to decide whether or not to spend more time chasing a path. “Less yak-shaving, please.”

The Operations Story

Then we stopped making Salesforce.com apps, and started selling Riak. We quickly found the same set of desires. Operationally, Riak is a huge win. Pragmatically, something that does indexing and search in a similar operational manner is even bigger. Thus, Riak Search was born.

The operational story is, in a nutshell, this: when you add another node to your cluster, you add capacity and compute power. That’s it, you just add another box and “it just works.” Purposefully or not, eventually a node leaves the cluster, hardware fails, whatever: Riak deals with it. If the node comes back, it’s absorbed like it never left.

We insisted on these qualities for Riak, and have continued that insistence in Riak Search. We did it with all the familiar bits: consistent hashing, hinted handoff, replication, etc.

Why Riak Search?

Now, we’ll be the first to tell you that with Riak you can get pretty far using link-walking and map/reduce, with the understanding that you know what you are going to want ahead of time, and/or are willing to wait for it.

Riak Search answers the questions that pop into your head: “find me all the blue dresses that are between $20 and $30,” “find me the document Bob referred to last week at the TPS procedures meeting,” “how can I delete all these emails from my aunt that have those stupid attachments?” “find me that comic strip with Bob,” etc.

It’s about making the sea of data in your key/value store findable. At a higher level, it’s about agility: the ability to answer questions about your business and your customers without having to consult a developer or dig through reference manuals, and without your application developers having to reinvent the wheel, with the very real possibility of doing it just well enough to convince you nothing will go wrong. It’s about a common indexing language.

Okay, now you know — thanks for bearing with us — let’s get to the technical bits.

Riak Search …

The system we have built …

  1. is an easily-scalable, fault-tolerant search and indexing system, adhering to the operational story you just read
  2. supports full-text indexing and search
  3. allows querying via the Lucene query syntax
  4. has Solr-compatible /select and /update web-services
  5. supports date and numeric indexing
  6. supports faceting
  7. automatically distributes indexes
  8. has an intermediate query language and integrated query planner
  9. supports scoring
  10. has integrated tokenizing, filtering and analysis (yes, you can
    use StandardAnalyzer!)

… and much more. Sounds pretty great, right?
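As a taste of items 3 and 4 above, a Lucene-syntax query might be issued against the Solr-compatible /select web-service roughly like this. (A hedged sketch only: the port, index name, field names, and exact URL layout are assumptions, since the released interface may differ.)

```python
from urllib.parse import urlencode

# Hypothetical Riak Search query using Lucene query syntax,
# including a numeric range clause (item 5 in the list above).
params = {
    "q": "color:blue AND price:[20 TO 30]",  # Lucene range query
    "wt": "json",                             # desired response format
    "rows": 10,                               # max results to return
}
url = "http://localhost:8098/solr/products/select?" + urlencode(params)
print(url)
```

The point is that anyone who has used Solr’s query parameters should feel at home.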

If you want to know more about the internals and the technical nitty-gritty, check out the Riak Search presentation that one of our own, Riak Search engineer John Muellerleile, gave at the San Francisco Erlang Factory this year.

So, why don’t you have it yet? The easy part.

There are still some stubs and hard-coded things in what we have. For instance, the full-text analyzer in use is just whitespace, case-normalization, and stop-word filtering. We intend to fully support the ability to specify other Lucene analyzers, including custom modules, but the code isn’t there yet.

There is also very little documentation. Without a little bit of handholding, even the brightest and most ambitious user could be forgiven for staring blankly, lost for even the first question to ask. We’re spreading the knowledge among our own team right now; that process will generate the artifacts needed for the next round of users to step in.

There are also many fiddly, finicky bits. These are largely relics of early iterations. Rather than having the interwebs be flooded with, “How do you stop this thing?” (as it was with Riak), we’re going to make things friendlier.

So, why don’t you have it yet? The not-so-easy part.

You’ve probably asked yourself, “What of integration of Riak and Riak Search?” We have many notes from discussions about how it could or should be done, as well as code showing how it can be done. But, we’re not completely satisfied with any of our implementations so far.

There is certainly no shortage of designs and ideas on how this could or should work, so we’re going to make a final pass at refining all of our ideas, using our current Riak Search system as a testbed, so that we can provide a solid, extensible system instead of one with many rough edges that would almost certainly be replaced immediately.

Furthering this sentiment, we think that our existing map/reduce framework and the functionality and features provided by Riak Search are a true power combo when used together intelligently, rather than simply as alternatives or, at worst, at odds. As a result, we’re defining exactly how Riak Search indexing and querying should be threaded into Riak map/reduce processing to bring you a combination that is undoubtedly more than the sum of its parts.

We could tease you with specifics, like generating the set of bucket/key inputs to a map phase by performing a Riak Search query, or parameterizing Search phases with map results. For now, though, amidst protest both internal (we’re chomping at the bit to get this out into the world and into your hands) and external (our favorite people continually request this exact set of technology and features), we’re going to implement the few extra details from our refined notes before forcing it on you all.

Hold on just a little longer. :)

-the Riak Search Team

Webmachine in the Data Center

May 19, 2010

While Riak is Basho’s most-heavily developed and widely distributed piece of open source software, we also hack on a whole host of other projects that are components of Riak but also have myriad standalone uses. Webmachine is one of those projects.

We recently heard from Caleb Tennis, a member of The Data Cave team, who was tasked with building out and simplifying operations in their 80,000-square-foot data center. Caleb filled us in on how he and his team are using Webmachine in their day-to-day operations to iron out the complexities that come with running such a massive facility. After some urging on our part, he was gracious enough to put together this illustrative blog post.



Community Manager

A Data Center from the Ground Up

Building a new data center from the ground up is a daunting task. While most of us are familiar with the intricacies of the end product (servers, networking gear, and cabling), there’s a whole supporting infrastructure behind the scenes that must also be carefully thought out, planned, and maintained. Needless to say, the facilities side of a data center can be extremely complex.

Having built and maintained complex facilities before, I already had both experience and a library of software in my tool belt that I had written to help manage the infrastructure of these facilities. However, I recognized that if I were to use the legacy software, some of which was over 10 years old, it would require considerable work to fit it to my current needs. And in the intervening years, many other software projects and methodologies had matured to a state where it made sense to at least consider a completely different approach.

The crux of such a project is that it involves communicating with many different pieces of equipment throughout the facility, each of which has its own protocols and specifications. Thus, the overall goal of this project is to abstract the communications behind the scenes and present a consistent, clear interface to the user so that the entire process is easier.

Take for example the act of turning on a pump. There are a number of pumps located throughout the facility that need to be turned on and off dynamically. To the end user, a simple “on/off” style control is what they are looking for. However, the actual process of turning that pump on is more complicated. The manufacturer for the pump controller has a specific way of receiving commands. Sometimes this is a proprietary serial protocol, but other times this uses open standard protocols like Modbus, Fieldnet, or Devicenet.

In the past, we had achieved this goal using a combination of open source libraries, commercial software, and in-house software. (Think along the lines of something like Facebook’s Thrift, where you define the interface and let the backend implementation be handled behind the scenes; in our case, the majority of the backend was written in C++.)

“This is what led us to examine Erlang…”

But as we were looking at the re-implementation of these ideas for our data center, we took a moment to re-examine them. The implementation we had, for the most part, was stateless, meaning that as systems would GET and SET information throughout the facility, they did so without prior knowledge and without attempting to cache the state of any of the infrastructure. This is a good thing, conceptually, but is also difficult in that congestion on the communication networks can occur if too many things need access to the same data frequently. It also suffered from the same flaws as many other projects: it was big and monolithic; changes to the API were not always easy; and, most of all, upgrading the code meant stops and restarts, so upgrading was done infrequently. As we thought about the same type of implementation in our data center, it became clear that stops and restarts in general were not acceptable at all.

This is what led us to examine Erlang, with its promise of hot code upgrades, distributed goodness, and fault-tolerance. In addition, I had been wanting to learn Erlang for a while but never really had an excuse to sit down and focus on it. This was my excuse.

While thinking about how this type of system would be implemented in Erlang, I began by writing a Modbus driver for Erlang, as a large portion of the equipment we interact with uses Modbus as part of its communications protocol. I published the fruits of these labors to GitHub (http://github.com/ctennis/erlang-modbus), in the hopes that it might inspire others to follow the same path. The library itself is a little rough (it was my first Erlang project) but it served as the catalyst for thinking about not only how to design this system in Erlang, but also how to write Erlang code in general.
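For readers unfamiliar with Modbus, the wire format such a driver deals in is refreshingly simple. Here is a sketch (in Python rather than Erlang, purely for brevity) of building a Modbus TCP read-holding-registers request per the Modbus specification; the register address and counts are made-up examples:

```python
import struct

def read_holding_registers_request(transaction_id, unit_id, start_addr, count):
    """Build a Modbus TCP ADU: MBAP header + read-holding-registers PDU.

    MBAP header: transaction id (2 bytes), protocol id (2 bytes, always 0),
    remaining length (2 bytes), unit id (1 byte).
    PDU: function code 0x03, starting address (2 bytes), register count (2 bytes).
    All fields are big-endian.
    """
    pdu = struct.pack(">BHH", 0x03, start_addr, count)
    mbap = struct.pack(">HHHB", transaction_id, 0x0000, len(pdu) + 1, unit_id)
    return mbap + pdu

# Read 2 registers starting at address 0 from unit 1:
frame = read_holding_registers_request(1, 1, 0x0000, 2)
print(frame.hex())  # 000100000006010300000002
```

The driver’s real work is in framing responses, handling timeouts, and mapping exception codes, but every request ultimately reduces to a small frame like this.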

Finding Webmachine

While working on this library, I kept thinking about the overall stateless design, and thought that perhaps a RESTful interface may be appropriate. Using REST (and HTTP) as the way to interface with the backend would simplify the frontend design greatly, as there are myriad tools already available for client side REST handling. This would eliminate the need to write a complicated API and have a complicated client interface for it. This is also when I found Webmachine.

Of course there are a number of different ways this implementation could have been achieved, Erlang or not. But the initial appeal of Webmachine was that it used all of the baked in aspects of HTTP, like the error and status codes, and made it easy to use URLs to disseminate data in an application. It is also very lightweight, and the dispatching is easy to configure.

Like all code, the end result was the product of several iterations and design changes, and may still be refactored or rewritten as we learn more about how we use the code and how it fits into the overall infrastructure picture.

Webmachine in Action

Let’s look at how we ultimately ended up using Webmachine to communicate with devices in our data center…

For the devices in the facility that communicate via modbus, we created a modbus_register_resource in Webmachine that handles that interfacing. For our chilled water pumps (First Floor, West, or “1w”), the URL dispatching looks like this:

{["cw_pump","1w",parameter],modbus_register_resource,[cw_pump, {cw_pump_1w, tcp, "", 502, 1}]}.

This correlates to the URL: http://address:8000/cw_pump/1w/PARAMETER

So we can formulate URIs something like this:

http://address:8000/cw_pump/1w/motor_speed or http://address:8000/cw_pump/1w/is_active

And by virtue of the fact that our content type is text:

content_types_provided(RD, Ctx) ->
    {[{"text/plain", to_text}], RD, Ctx}.

We use HTTP GETs to retrieve the desired result, as text. The process is diagrammed below:

This is what it looks like from the command line:

user@host:~# curl http://localhost:8000/cw_pump/1w/motor_speed


It even goes further. If the underlying piece of equipment is not available (maybe it’s powered off), we use Webmachine to send back HTTP error codes to the requesting client. This whole process is much easier than writing it in C++, compiling, distributing, and doing all of the requisite exception handling, especially across the network. Essentially, what had been developed and refined over the past 10 years as our in-house communications system was basically redone from scratch with Erlang and Webmachine in a matter of weeks.

For updating or changing values, we use HTTP POSTs to change values that are writable. For example, we can change the motor speed like this:

user@host:~# curl -X POST http://localhost:8000/cw_pump/1w/motor_speed?value=50

user@host:~# curl http://localhost:8000/cw_pump/1w/motor_speed


But we don’t stop there. While using Webmachine to communicate directly with devices is nice, there also needs to be an infrastructure that is more user-friendly and accessible for viewing and changing these values. In the past, we did this with client software, written for Windows, that communicated with the backend processes and presented a pretty overview of what was happening. Aside from the issue of having to write the software for Windows, we also had to maintain multiple copies of it actively running as failovers in case one of them went down. Additionally, we had to support people running the software remotely (from home, for example, via a VPN) who still had to be able to communicate with all of the backend systems. We felt a different approach was needed.

What was this new approach? Webmachine is its own integrated webserver, so it gave us just about everything we needed to host the client side of the software within the same infrastructure. By integrating jQuery and jQTouch into some static webpages, we built an entire web-based control system directly within Webmachine, making it completely controllable via mobile phone. Here is a screenshot:

What’s Next?

Our program is still a work in progress, as we are still learning how the infrastructure works and how best to interact with it, both from Webmachine and from the user’s perspective. We are very happy with the progress made thus far, and we feel quite confident about what we will be able to achieve with Webmachine as the front end to our very complex infrastructure.


Caleb Tennis is President of The Data Cave, a full service, Tier IV compliant data center located in Columbus, Indiana.

Planning for Eventual Consistency

May 14, 2010

You may remember that last week, we recorded a podcast with Benjamin Black all about the immense variety of databases in the NoSQL space and what your criteria should be when choosing one.

If you listened carefully, you may also remember that Benjamin and Justin Sheehy started to discuss eventual consistency. We decided to roll that into its own podcast as we thought it was a topic worthy of its own episode.

Think only a certain subset of databases are “eventually consistent”? Think again. Regardless of the database you choose, eventual consistency is something you should embrace and plan for, not fear.

Listen, Learn, and Enjoy:


If you are having problems getting the podcast to play, click here to play it in a new window, or right-click to download it.

NoSQL and Endless Variety

A Conversation about Variety and Making Choices

Benjamin Black stopped by the Basho offices recently and we had the chance to sit him down and discuss the collection of things that is “NoSQL.”

In this, the fifth installment of the Basho Riak Podcast, Benjamin and Basho’s CTO Justin Sheehy discuss the factors that they think should play the largest part in your evaluation of any database, NoSQL or otherwise. (Hint: it’s not name recognition.)

Highlights include why and when you might be best served improving your relational database architecture and when it might be better to use a NoSQL system like Cassandra or Riak to solve part of your problem, as well as why you probably don’t want to figure out which one of the NoSQL systems solves all of your problems.



If you are having problems getting the podcast to play, click here to play it in a new window, or right-click to download it.

Introducing the Riak Fast Track

May 4, 2010

Our Challenge

There is nothing easy about making software simple to learn and understand. Every potential user learns a bit differently, which makes the road to simple usage a hard one. This is especially true with Riak.

Internally at Basho, we are constantly addressing questions like, “How do we make a ‘distributed, Dynamo-inspired key/value store’ inviting and less daunting to first-time users?” and “How do we lower the barrier to adoption and usage?” Though resources like the Riak Mailing List, the Riak Wiki, and the Riak IRC channel are great, we kept asking ourselves, “What can we do to make it dead simple for those new to and interested in Riak to learn about it and how it works?”

Our answer (in part) is the Riak Fast Track.

What is the Riak Fast Track?

The Fast Track is an interactive module on the Riak Wiki that, through a combination of concise content and brief screencasts, will bring you up to speed on a) what Riak is, b) what its key features and benefits are, and c) how to use it.

As I stated above, the Fast Track is aimed at developers who may be new to Riak or those who may have heard about it in passing but haven’t spent too much time fiddling with it.

Is it exhaustive? No. Will you be a Riak expert after an hour? No. But at the end of it, you should be able to tell your friends that you performed a JavaScript MapReduce query on historical stock data distributed over a three-node Riak cluster on your local machine. If that’s not cool, then I don’t know what is!
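For the curious, a map phase in such a query is just a JavaScript function that Riak calls once per stored object. Here is a minimal sketch (the function name and the “high” field are assumptions for illustration, not taken from the Fast Track’s actual dataset):

```javascript
// A Riak-style JavaScript map function. Riak invokes it once per
// object; it returns a list of values that feed the reduce phase
// (or go straight back to the client).
function mapDailyHigh(value, keyData, arg) {
  // Riak passes the object with its contents under values[0].data.
  var day = JSON.parse(value.values[0].data);
  return [day.high];
}

// Outside Riak, we can exercise it against a mock object shaped like
// the one Riak hands to map phases:
var mockObject = {
  values: [{ data: JSON.stringify({ date: "2010-02-03", high: 41.87 }) }]
};
console.log(mapDailyHigh(mockObject, null, null)); // [ 41.87 ]
```

Inside a real cluster, the same function would be sent as the `source` of a `map` phase in a MapReduce job, and Riak would run it in parallel on the nodes holding the data.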

Your Challenge

We put a lot of time into making this, but there are undoubtedly some kinks that need to be worked out. And no matter how long we tweak and refine it, there will always be small aspects and details we won’t get right. It is for that reason that we are appealing to you, the rapidly-growing Riak Community, to help us.

So, here is the challenge: Take 45 minutes and go through the Riak Fast Track. Then, when you’re done, take five minutes to write us an email and tell us what you thought about it. That’s it.

We are looking for answers to questions like:

  • Was it effective?
  • Did you learn anything?
  • What did we get right?
  • What did we get wrong?
  • What should we add/remove?

And, to sweeten the pot, we are going to send a “Riak Swag Pack” (contents of which are top secret) to everyone who sends us their review and thoughts on the Fast Track by the close of business on Tuesday (5/11) of next week. It doesn’t have to be anything extensive (though we love details). A simple, “I liked x, y, and z, but you could have done this better” would suffice. You can send your emails to mark@basho.com. I am looking forward to hearing from you!

So, without further ado, go forth and test out the Riak Fast Track.

We hope you’ll find it useful and we’re looking forward to your thoughts on how to make it better.


Mark Phillips