Tag Archives: Riak

Riak at PyConZA

**October 03, 2012**

[PyConZA](http://za.pycon.org/) starts this Thursday in Cape Town, South Africa, and Basho is proud to be [a sponsor](http://za.pycon.org/sponsors_basho.html). As our blurb on the PyConZA site states:

> Basho is proud to sponsor PyConZA because we believe in the power and future of the South African tech community. We look forward to the valuable, lasting technologies that will be coming out of Cape Town and the surrounding areas in the coming years.

You may remember that back in August we [put out the call](http://basho.com/blog/technical/2012/08/23/Riak-ambassadors-needed-for-PyCon-ZA/) for Riak Community Ambassadors to help us out with PyConZA. As hard as it is to miss out on a chance to go to The Mother City, it wasn’t feasible for us to make it with [RICON](http://ricon2012.com) happening next week. I’m happy to report that a few fans of durability have stepped forward as Ambassadors to make sure Riak is fully represented. If you’re lucky enough to be going to PyConZA (it’s sold out), be on the lookout for the following two characters:

### Joshua Maserow

### Mike Jones

In addition to taking in the talks, Mike and Joshua will be on hand to answer your Riak questions (or tell you where to look for the answers if they don’t have them). There will also be some production Riak users in the crowd, so you won’t have to look too far if you want to get up to speed on why you should be running Riak.

Enjoy PyConZA. And thanks again to Mike and Joshua for volunteering to represent Riak in our absence.

[Mark](http://twitter.com/pharkmillups)

Webinar Recap and Q&A – Schema Design for Riak

December 8, 2010

Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many live, I’ve answered all of them below, in no particular order.

Q: Can you touch on upcoming filtering of keys prior to map reduce? Will it essentially replace the need for one to explicitly name the bucket/key in a M/R job? Does it require a bucket list-keys operation?

Key filters, in the upcoming 0.14 release, will allow you to logically select a population of keys from a bucket before running them through MapReduce. This will be faster than a full-bucket map since it only loads the objects you’re really interested in (the ones that pass the filter). It’s a great way to make use of meaningful keys that have structure to them. So yes, it does require a list-keys operation, but it doesn’t replace the need to be explicit about which keys to select; there are still many useful queries that can be done when the keys are known ahead of time.
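As a very rough sketch of what such a job might look like from Riak’s local Erlang client once 0.14 ships (the `{Bucket, Filters}` input form and the `ends_with` filter name here are assumptions about the pre-release syntax, not final documentation):

```erlang
%% Sketch only: run a map phase over just the keys in the "invoices"
%% bucket that end in "-2010". Filter syntax is assumed, not final.
{ok, Client} = riak:local_client(),
Client:mapred({<<"invoices">>, [[<<"ends_with">>, <<"-2010">>]]},
              [{map, {modfun, riak_kv_mapreduce, map_object_value},
                none, true}]).
```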

For more information on key-filters, see Kevin’s presentation on the upcoming MapReduce enhancements.

Q: How can you validate that you’ve reached a good/valid KV model when migrating a relational model?

The best way is to try out some models. The thing about schema design for Riak that turns your process on its head is that you design for optimizing queries, not for optimizing the data model. If your queries are efficient (single-key lookup as much as possible), you’ve probably reached a good model, but also weigh things like payload size, cost of updating, and difficulty manipulating the data in your application. If your design makes it substantially harder to build your application than a relational design, Riak may not be the right fit.

Q: Are there any “gotchas” when thinking of a bucket as we are used to thinking of a table?

Like tables, buckets can be used to group similar data together. However, buckets don’t automatically enforce data structure (columns with specified types, referential integrity) like relational tables do; that part is still up to your application. You can, however, add precommit hooks to buckets to perform any data validation that you’d rather not leave to your application code.
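As an illustration, a minimal precommit hook that rejects any write whose value isn’t parseable JSON might look something like the following sketch (it leans on mochijson2, which ships inside Riak, and assumes the bucket’s precommit property has been pointed at this module and function):

```erlang
-module(validate_json).
-export([precommit/1]).

%% A precommit hook receives the riak_object about to be stored and must
%% return the (possibly modified) object, or {fail, Reason} to reject it.
precommit(Object) ->
    try
        mochijson2:decode(riak_object:get_value(Object)),
        Object
    catch
        _:_ ->
            {fail, <<"value is not valid JSON">>}
    end.
```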

Q: How would you create a ‘manual index’ in Riak? Doesn’t that need to always find unique keys?

One basic way to structure a manually-created index in Riak is to have a bucket specifically for the index. Keys in this bucket correspond to the exact value you are indexing (for fuzzy or incomplete values, use Riak Search). The objects stored at those keys have links or lists of keys that refer to the original object(s). Then you can find the original simply by following the link or using MapReduce to extract and find the related keys.

The example I gave in the webinar Q&A was indexing users by email. To create the index, I would use a bucket named users_by_email. If I wanted to look up my own user object by email, I’d try to fetch the object at users_by_email/sean@basho.com, then follow the link in it (something like `</riak/users/237438-28374384-128>; riaktag="indexed"`) to find the actual data.
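In MapReduce terms, that lookup could be sketched roughly like this from the Erlang console (the `{link, Bucket, Tag, Keep}` phase is the standard link phase; the bucket and key names are just the example values above):

```erlang
%% Sketch: start at the index entry, follow its "indexed" link into the
%% users bucket, then map out the linked user object's value.
{ok, Client} = riak:local_client(),
Client:mapred([{<<"users_by_email">>, <<"sean@basho.com">>}],
              [{link, <<"users">>, <<"indexed">>, false},
               {map, {modfun, riak_kv_mapreduce, map_object_value},
                none, true}]).
```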

Whether those index values need to be unique is up to your application to design and enforce. For example, the index could be storing links to blog posts that have specific tags, in which case the index need not be unique.

To create the index, you’ll either have to perform multiple writes from your application (one for the data, one for the index), or add a commit hook to create and modify it for you.
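For the first option, the two writes might look roughly like this with the local Erlang client (a sketch; the Links metadata format shown is an assumption about riak_object’s internals, and over HTTP you would instead set a Link header on the index object as shown above):

```erlang
%% Sketch: one write for the user object, one for its index entry.
{ok, Client} = riak:local_client(),

UserJson = <<"{\"email\":\"sean@basho.com\",\"name\":\"Sean\"}">>,
User = riak_object:new(<<"users">>, <<"237438-28374384-128">>, UserJson),
ok = Client:put(User, 2),

%% The index entry lives at users_by_email/<email address> and carries a
%% link back to the user object (metadata key and format assumed here).
Links = [{{<<"users">>, <<"237438-28374384-128">>}, <<"indexed">>}],
Idx0  = riak_object:new(<<"users_by_email">>, <<"sean@basho.com">>, <<>>),
Idx   = riak_object:update_metadata(Idx0,
            dict:store(<<"Links">>, Links, dict:new())),
ok = Client:put(Idx, 2).
```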

Q: Can you compare/contrast buckets w/ Cassandra column families?

Cassandra has a very different data model from Riak, and you’ll want to consult their experts for a second opinion, but here’s what I know. Column families are a way to group related columns that you will always want to retrieve together, and they are something you design up-front (changes require restarting the cluster to take effect). They are the closest thing to a relational table that Cassandra has.

Although you do use buckets to group similar data items, Riak’s buckets, in contrast:

  1. Don’t understand or enforce any internal structure of the values,
  2. Don’t need to be created or designed ahead of time, but pop into existence when you first use them, and
  3. Don’t require a restart to be used.

Q: How would part sharing be achieved? (this is a reference to the example given in the webinar, Radiant CMS)

Radiant shares content parts only when specified by the template language, and always by inheritance from ancestor pages. So if the layout contained `<r:content part="sidebar" inherit="true" />`, then if the currently rendering page doesn’t have that content part, it will look up the hierarchy until it finds it. This is one example of why it’s so important to have an efficient way to traverse the site hierarchy, and why I presented so many options.

Q: What is the max number of links an object can have for Link Walking?

There’s no cut-and-dried answer for this. Theoretically, you are limited only by storage space (disk and RAM) and the ability to retrieve the object from the desired interface. In a practical sense, this means that the default HTTP interface limits you to around 100,000 links on a single object (based on previous discussions of the limits of HTTP packets and header lengths). Still, that many links is not going to be reasonable to deal with in your application. In some applications we’ve seen links on the order of hundreds per object negatively impact link-walking performance. If you need that many, you’ll be better off exploring other designs.

Again, thanks for attending! Look for our next webinar coming in about a month.

Sean, Developer Advocate

Introducing Riak Function Contrib

December 2, 2010

A short while ago I made it known on the Riak Mailing list that the Basho Dev team was working on getting “more resources out the door to help the community better use Riak.” Today we are pleased to announce that Riak Function Contrib, one of these resources, is now live and awaiting your usage and code contributions.

What is Riak Function Contrib?

Riak Function Contrib is a home for MapReduce, Pre- and Post-Commit, and “Other” Functions or pieces of code that developers are using or might have a need for in their applications or testing. Put another way, it’s a Riak-specific function repository and library. Riak developers are using a lot of functions in a ton of different ways. So, we built Function Contrib to promote efficient development of Riak apps and to encourage a deeper level of community interaction through the use of this code.

How do I use it?

There are two primary ways to make use of Riak Function Contrib:

  1. Find a function – If, for instance, you needed a Pre-Commit Hook to validate a JSON document before you store it in Riak, you could use or adapt this to your needs. No need to write it yourself!
  2. Contribute a Function – There are a lot of use cases for Riak. This leads to many different functions and code bits that are written to extend Riak’s functionality. If you have one (or 20) functions that you think might be of use to someone other than you, contribute them. You’ll be helping developers everywhere. Francisco Treacy and Widescript, for example, did their part when they contributed this JavaScript reduce function for Sorting by Fields.

What’s Next?

Riak Function Contrib is far from being a complete library. That’s where you come in. If you have a function, script, or some other piece of code that you think may be beneficial to someone using Riak (or Riak Search for that matter), we want it. Head over to the Riak Function Contrib Repo on GitHub and check out the README for details.

In the meantime, the Basho Dev team will continue polishing up the site and GitHub repo to make it easier to use and contribute to.

We are excited about this. We hope you are, too. As usual, thanks for being a part of Riak.

More to come…

Mark

Community Manager

Free Webinar – Schema Design for Riak – Dec 7 at 2PM Eastern

December 1, 2010

Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create MapReduce queries? Where will Riak be a natural fit and where will it be challenging?

We invite you to join us for a free webinar on Tuesday, December 7 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:

  • Freeing yourself of the architectural constraints of the “relational” mindset
  • Gaining a fuller understanding of your existing schema and its queries
  • Strategies and patterns for structuring your data in Riak
  • Tradeoffs of various solutions

We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.

If you missed the previous version of this webinar in July, here’s your chance to see it! We’ll also use a different example this time, so even if you attended last time, you’ll probably learn something new.

Fill in the form below if you want to get started building applications on top of Riak!

Sorry, registration is closed! Video of the presentation will be posted on Vimeo after the webinar has ended.

The Basho Team

Two new Erlang/OTP applications added to Riak products – 'riak_err' and 'cluster_info'

November 17, 2010

The next release of Riak will include two new Erlang/OTP applications: riak_err and cluster_info. The riak_err application will improve Riak’s runtime robustness by strictly limiting the amount of RAM that is used while processing event log messages. The cluster_info application will assist troubleshooting by automatically gathering lots of environment, configuration, and runtime statistics data into a single file.

Wait a second, what are OTP applications?

The Erlang virtual machine provides most of the services that an operating system like Linux or Windows provides: memory management, file system management, TCP/IP services, event management, and the ability to run multiple applications. Most modern operating systems allow you to run a Web browser, word processor, spreadsheet, instant messaging app, and many others. And if your email GUI app crashes, your other applications continue running without interference.

Likewise, the Erlang virtual machine supports running multiple applications. Here’s the list of applications that are running within a single Riak node — we ask the Erlang CLI to list them for us.

```erlang
(riak@127.0.0.1)6> application:which_applications().
[{cluster_info,"Cluster info/postmortem app","0.01"},
 {luwak,"luwak","1.0"},
 {skerl,"Skein hash function NIF","0.1"},
 {riak_kv,"Riak Key/Value Store","0.13.0"},
 {riak_core,"Riak Core","0.13.0"},
 {bitcask,[],"1.1.4"},
 {luke,"Map/Reduce Framework","0.2.2"},
 {webmachine,"webmachine","1.7.3"},
 {mochiweb,"MochiMedia Web Server","1.7.1"},
 {erlang_js,"Interface between BEAM and JS","0.4.1"},
 {runtime_tools,"RUNTIME_TOOLS version 1","1.8.3"},
 {crypto,"CRYPTO version 1","1.6.4"},
 {os_mon,"CPO CXC 138 46","2.2.5"},
 {riak_err,"Custom error handler","0.1.0"},
 {sasl,"SASL CXC 138 11","2.1.9"},
 {stdlib,"ERTS CXC 138 10","1.16.5"},
 {kernel,"ERTS CXC 138 10","2.13.5"}]
```

Yes, that’s 17 different applications running inside a single node. For each item in the list, we’re told the application’s name, a human-readable name, and that application’s version number. Some of the names like ERTS CXC 138 10 are names assigned by Ericsson.

Each application is, in turn, a collection of one or more processes that provide some kind of computation service. Most of these processes are arranged in a “supervisor tree”, which makes the task of managing faults (e.g., if a worker process crashes, what do you do?) extremely easy. Here is the process tree for the kernel application.

And here is the process tree for the riak_kv application.

The riak_err application

See the GitHub README for riak_err for more details.

The Erlang/OTP runtime provides a useful mechanism for managing all of the info, error, and warning events that an application might generate. However, the default handler uses some not-so-smart methods for making human-friendly message strings.

The big problem is that the representation used internally by the virtual machine is a linked list, one list element per character, to store the string. On a 64-bit machine, that’s 16 bytes of RAM per character. Furthermore, if the message contains non-printable data (i.e., not ASCII or Latin-1 characters), the data will be formatted into numeric representation. The string “Bummer” would be formatted just like that, Bummer. But if each character in that string had the constant 140 added to it, the 6-byte string would be formatted as the 23-byte string 206,257,249,249,241,254 instead (an increase of about 4x). And, in rare but annoying cases, there’s some additional expansion on top of all of that.
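You can see that cost directly from the Erlang shell, since `erts_debug:flat_size/1` reports a term’s size in machine words:

```erlang
%% One cons cell (two words) per character; with 8-byte words on a 64-bit
%% VM, the 6-character string "Bummer" costs 96 bytes: 16 bytes per character.
1> erts_debug:flat_size("Bummer") * erlang:system_info(wordsize).
96
```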

The default error handler can take a one megabyte error message and use over 1 megabyte * 16 * 4 = 64 megabytes of RAM. Why should error messages be so large? Depending on the nature of a user’s input (e.g. a large block of data from a Web client), the process’s internal state, and other factors, error messages can be much, much bigger than 1MB. And it’s really not helpful to consume half a gigabyte of RAM (or more) just to format one such message. When a system is under very heavy load and tries to format dozens of such messages, the entire virtual machine can run out of RAM and crash.

The riak_err OTP application replaces about 90% of the default Erlang/OTP info/error/warning event handling mechanism. The replacement handler places strict limits on the maximum size of a formatted message. So, if you want to limit the maximum length of an error message to 64 kilobytes, you can. The result is that it’s now much more difficult to get Riak to crash due to error message handling. It makes us happy, and we believe you’ll be happier, too.
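For example, the riak_err README describes application environment settings that cap these sizes; a sketch of what that might look like in a node’s etc/app.config (the exact parameter names and defaults may differ by release, so treat this as illustrative):

```erlang
%% Illustrative only -- check the riak_err README for your release.
{riak_err, [
    %% Cap the size of any single Erlang term pulled into a log message.
    {term_max_size, 65536},
    %% Cap the size of the final formatted message itself (64 KB here).
    {fmt_max_bytes, 65536}
]}
```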

Licensing for the riak_err application

The riak_err application was written by Basho Technologies, Inc. and is licensed under the Apache License, version 2.0. We’re very interested in bug reports and fixes: our mailbox is open 24 hours per day for GitHub comments and pull requests.

The cluster_info application

The cluster_info application is included in the packaging for the Hibari key-value store, which is also written in Erlang. It provides a flexible and easily extensible way to dump the state of a cluster of Erlang nodes.

Some of the information that the application gathers includes:

  • Date & time
  • Statistics on all Erlang processes on the node
  • Network connection details to all other Erlang nodes
  • Top CPU- and memory-hogging processes
  • Processes with large mailboxes
  • Internal memory allocator statistics
  • ETS table information
  • The names & versions of each code module loaded into the node

The app can also automatically gather all of this data from all nodes and write it into a single file. It’s about as easy as can be to take a snapshot of all nodes in a cluster. It will be a valuable tool for Basho’s support and development teams to diagnose problems in a cluster, an aid for capacity planning, and a simple way to answer a curious question like, “What’s really going on in there?”
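A sketch of what triggering a dump might look like from an attached Riak console (the function names dump_local_node/1 and dump_all_connected/1 are assumptions about the cluster_info API; double-check them against the version you’re running):

```erlang
%% Sketch: dump diagnostic data for just this node, then for every node
%% this node is connected to, into single files on disk.
(riak@127.0.0.1)1> cluster_info:dump_local_node("/tmp/this-node-info.txt").
(riak@127.0.0.1)2> cluster_info:dump_all_connected("/tmp/whole-cluster-info.txt").
```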

Over time, Basho will be adding more Riak-specific info-gathering functions. If you’ve got feature suggestions, (e.g., dump stats on how many times you woke up last night, click here to send all this data to Basho’s support team via HTTP or SMTP), we’d like to hear them. Or, if you’re writing the next game-changing app in Erlang, just go ahead and hack the code to fit your needs.

Licensing for the cluster_info application

The cluster_info application was written by Gemini Mobile Technologies, Inc. and is licensed under the Apache License, version 2.0.

Scott

Free Software Can Not Be Taken Away

November 15, 2010

Oracle didn’t (and can’t) take away your open source software.

A few weeks ago Oracle caused a lot of confusion when they changed the makeup of the MySQL product line, including a “MySQL Classic Edition” version that does not cost money and does not include InnoDB. That combination in the product chart made many people wonder if InnoDB itself had ceased to be free in either the “free beer” or “free speech” sense. The people wondering and worrying included a few users of Innostore, the InnoDB-based storage engine that can be used with Riak.

Luckily, open source software doesn’t work that way.

Oracle didn’t really even try to do what some people thought; they just released a confusing product graph which they have since updated. The MySQL that most people think of first is MySQL Community Edition and it was not one of the editions mentioned in the chart that confused people. That version of MySQL, as well as all of the GPL components included in it such as InnoDB, remain free of cost and also available under the GPL.

This confusion eventually led to a public response from Oracle, so you can read it authoritatively if you like.

Even if someone wanted to, they couldn’t “take it back” in the way that some people feared. Existing software that has been legitimately available under an open source license such as GPL or Apache cannot retroactively be made unfree. The copyright owner might choose to not license future improvements as open source, but that which is already released in such a way cannot be undone. Oracle and Innobase aren’t currently putting new effort into Embedded InnoDB, but a new project has spun up to move it forward. If the HailDB project produces improvements of value, then future versions of Innostore may switch to using that engine instead of using the original Embedded InnoDB release.

InnoDB is available under the GPL. Innostore, as a derivative work of Embedded InnoDB, is also available under the GPL. Neither Oracle nor Basho can take that away from you.

Justin

Soap people have to grow up quick: Two Weeks in the Life of a NoSQL Company

November 11, 2010

Things are moving incredibly fast in the NoSQL space. I am used to internet-fast — helping bring on 300 customers in a year at Akamai; going from adult bulletin boards and leased lines to hosting sites for twenty percent of the Fortune 500 at Digex (Verizon Business) in eighteen months. I have never seen a space explode like the NoSQL space.

Two weeks ago, Justin Sheehy stood on stage delivering a rousing and thoughtful presentation to the NoSQL East Conference that was less about Riak and more about a definition of first principles that underpinned Riak: what it REALLY means when you claim such terms as scalability (it doesn’t mean buying a bigger machine for your master DB) and fault-tolerance (it has to apply to writes and reads and is binary; you either always accept writes and serve reads or you don’t). The conference was a bit of a coming out party for Basho, which co-sponsored the event with Rackspace, Georgia Tech, and a host of other companies. We had been working on Riak for 18 months or so in relative quiet and it was nice to finally see what people thought, first hand.

There were equally interesting presentations about Pig and MongoDB and a host of other NoSQL entrants, all of which will make for engrossing viewing when they finally get posted. We were told this wasn’t quite as exciting as the NoSQL conference out West but none of us seemed to mind. Home Depot, Turner Broadcasting, Weather.com, and Comcast had all sent folks down to evaluate the technology for real, live problems and the enthusiasm in the auditorium spilled out into the Atlanta bars. Business cards were exchanged, calls set up, even a little business discussed. Clearly, NoSQL databases were maturing fast.

No sooner had we returned to Cambridge than news of Flybridge’s investment in 10Gen came out. Hooray! Someone was willing to bet $3.4 million on a company in the space. Chip Hazard, ever affable, wrote a nice blog post explaining the investment. According to him, every developer they talked to had downloaded some NoSQL database to test. Brilliant news. He said Flybridge invested in 10Gen because they liked the space and knew the team from their investment in DoubleClick, from whose loins the management team at 10Gen issued. No more felicitous reason exists for a group of persons to invest $3.4 million than that previous investments in the same team were handsomely rewarded. I would wish Chip and 10Gen the best if I had time.

Because contemporaneous with the news of Flybridge’s investment, and almost as if the world had decided NoSQL’s time had come, we began to field emails and calls from interested parties. Trials, quotes, lengthy discussions about features and uses of Riak — the week was a blur. Everyone was conducting a bakeoff: “I have a 4TB database and customers in three continents. I am evaluating Riak and two other document datastores. Tell me about your OLAP features.”

Heady times and, frankly, of somewhat dubious promise, if you ask me. Potential clients that materialize so quickly always seem to disappear just as fast. Really embracing a new technology requires trials, tests, new features, and time. Time most of all. These “bluebirds” would fly away in no time, if my experience held true.

Except, this time it didn’t happen. Contracts were exchanged. Pen nibs were sharpened. It is as if the entire world decided not to wait for everyone else to jump on the bandwagon and instead go NoSQL. Even using this last week as the sole example, I think the reason is plain — people have real pain and suddenly the word is out that they no longer have to suffer.

Devs are constrained by what they can build, rich features notwithstanding. Ask the company that had to choose between Riak and a $100K in-memory appliance to scale. And Ops is getting slaughtered — the cost of scaling poorly (and by poorly I mean pagers going off during dinner, bulk updates taking hours and failing all the time, fragmented and unmanageable indices consuming dozens of machines) is beginning to look like the cost of antiquated technology. Good Ops people are not fools. They look for ways to make life easier. Make no mistake — all the Devs and Ops folks came with a set of tough questions and a list of new features. They also came with an understanding that companies that release open source software still have a business to run. They are willing to spend on a real company. In fact, having a business behind Riak ended up mattering as much as any features.

So, I suspect, we are at the proverbial “end of the beginning.” Smart people in the NoSQL movement have succeeded in building convincingly good software and then explaining the virtues convincingly (all but one of the presentations at NoSQL East demonstrated the virtues of the respective approaches). Now these people are connecting to smart people responsible for building and running web apps, people who are decidedly unwilling to sit around hoping for Oracle or IBM to solve their problems.

In the new phase — which we will cleverly call the “beginning of the middle” — great tech will matter even more than it does now. It won’t be about selling or marketing or any of that. If our numbers are any indication of a larger trend, more people will download and install NoSQL databases in the next month than the combined total of the three months previous. More people in a buying frame of mind will evaluate NoSQL technology not in terms of its coolness but in terms of its ability to solve their real, often expensive problems. The next phase will be rigorous in a way this phase was not. People have created several entirely new ways to store and distribute data. That was the easy part.

Just as much as great tech, the people behind it will matter. That means more calls between us and Dev teams. That means more feature requests considered and, possibly, judiciously, agreed to.

That also means lots of questions answered. People care about support. They care about whether you answer their emails in a timely fashion and are polite. People want to do business with NoSQL. They want to spend money to solve problems. They need to know they are spending it with responsible, responsive, dedicated people.

Earl tweets about it all the time and I happen to agree: any NoSQL success helps all NoSQL players. I also happen to feel that any failure hurts all NoSQL players. As NoSQL rapidly ages into its adolescence, it will either be awkward and painful or exciting and characterized by incredible growth.

When I was a kid on the Navy base in Alameda, my babysitter watched soaps all afternoon, leaving me mostly to my own devices. If I stopped in, I always got roped in to hearing her explain her favorite stories. Most of all she loved how ridiculous they were, though she would never admit this exactly. Instead, adopting an attitude of gleeful incredulity, she would point out this or that attractive young actor and tell me how just a year ago, she was a little baby. “Soap people have to grow up quick, I guess,” was her single (and to her, completely satisfactory) explanation. “If they don’t, they get written out of the story.”

Indeed.

Best,

Tony Falco

A Few More Details On Why We Switched To GitHub

November 11, 2010

We announced recently on the Riak Mailing List that Basho was switching to git and GitHub for development of Riak and all other Basho software. As stated in that linked email, we did this primarily for reasons pertaining to community involvement in the development of Riak. The explanation on the Mailing List was a bit terse, so we wanted to share some more details to ensure we answered all the questions related to the switch.

Some History

Riak was initially used as the underlying data store for an application Basho was selling several years ago and, at that time, its development was exclusively internal. The team used Mercurial for internal projects, so that was the de-facto DVCS choice for the source.

When we open-sourced Riak in August 2009, being Mercurial users, we chose to use BitBucket as our canonical repository. At the time we open-sourced it, we were less concerned with community involvement in the development process than we are now. Our primary reason for open-sourcing Riak was to get it into the hands of more developers faster.

Not long after this happened, the questions about why we weren’t on GitHub started to roll in. Our response was that we were a Mercurial shop and BitBucket was a natural extension of that. Sometime towards the beginning of May we started maintaining an official mirror of our code on GitHub. This mirror was our way of acknowledging that there is more than one way to develop software collaboratively and that we weren’t ignoring the heaps of developers who were dedicated GitHub users and preferred to look at and work with code on this platform.

Some Stats

GitHub has the concept of “Watchers” (analogous to “Followers” on BitBucket). We started accumulating Watchers once this GitHub mirror was in place. “Watchers” is a useful, but not absolute, metric for measuring interest and activity in a project. They bring a tremendous amount of attention to any given project through their use of the code and their promotion of it. They also, in the best case scenario, will enhance the code in a meaningful way by finding bugs and contributing patches.

This table shows the week-on-week growth of BitBucket Followers vs. GitHub Watchers since we put the official mirror in place:

|  | BitBucket | GitHub |
| --- | --- | --- |
| Followers/Watchers at time of switch | 97 | 145 |
| Avg. week-on-week growth (%) | 0.74 | 7.2 |

Since putting the official mirror in place, the number of Watchers on the GitHub repo for Riak has grown at a steady rate, averaging just over 7% week on week. This far outpaced the less than 1% growth in Followers on the canonical BitBucket repository for Riak.

With this information it was clear that Riak on GitHub as a mirror was bringing us more attention and driving more community growth than was our canonical repo on BitBucket. So, in the interest of community development, we decided that Riak needed to live on GitHub. What they have built is truly the most collaborative and simple-to-use development platform there is (at least one well-respected software analyst has even called it “the future of open source”). Though Mercurial was deeply ingrained in our development process, we were willing to tolerate the workflow hiccups that arose during the week or so it took to get used to git in exchange for the resulting increase in attention and community contributions.

The switch is already proving fruitful. In addition to the sharp influx in Watchers for Riak, we’ve already taken some excellent code contributions via GitHub. That said, there is much left to be written. And we would love for you to join us in building something legendary in Riak, whatever your distributed version control system and platform preference may be.

So when you get a moment, go check out Riak on Github, or, if you prefer, Riak on BitBucket. And if you have any more questions, feel free to email: mark@basho.com.

Mark

Where To Find Basho This Week

October 26, 2010

Basho is hosting one event this week and participating in another. Here are the details to make sure everyone is up to speed:

A NOSQL Evening in Palo Alto

Tonight there will be a special edition of the Silicon Valley NoSQL Meetup, billed as “A NOSQL Evening in Palo Alto.” Why do I say “special”? Because this month’s event has been organized by the one and only Tim Anglade as part of his NoSQL World Tour. And this is shaping up to be one of the tour’s banner events.

Various members of the Basho Team will be in attendance and Andy Gross, our VP of Engineering, will be representing Riak on the star-studded panel.

There are almost 200 people signed up to see this discussion, which is sure to be action-packed and informative. If you’re in the area and can make it out on short notice, I would recommend you attend.

October San Francisco Riak Meetup

On Thursday night, from 7-9, we are holding the October installment of the San Francisco Riak Meetup. Like last month, the awesome team at Engine Yard has once again been gracious enough to offer us their space for the event.

We have two great talks planned for this month. The first will be Basho hacker Kevin Smith talking about a feature of Riak that he has had a major hand in writing: MapReduce. Kevin is planning to cover everything from design to new code demos to the road map. In short, this should be exceptional.

For the second half of Thursday’s meetup we are going to get more interactive than usual. Articulation of use cases and database applicability is still something largely unaddressed in our space. So we thought we would address it. We are inviting people to submit use cases in advance of the meetup with some specific information about their apps. The Basho Developers are going to do some work before the event analyzing the use cases and then, with some help from the crowd, determine if and how Riak will work for a given use case – and if Riak isn’t the right fit, we might even help you find one that is. If you are curious whether or not Riak is the right database for that Facebook-killer you’re planning to build, now is your chance to find out. We still have room for one or two more use cases, so even if you’re not going to be able to attend Thursday’s meetup, I want to hear from you. Follow the instructions on the meetup page linked above to submit a use case.

That said, if you are in the Bay Area on Thursday night and want to have some beer and pizza with a few developers who are passionate about Riak and distributed systems, RSVP for the event. You won’t be disappointed.

Hope to see you there!

Mark

Why I Am Excited About Riak Search

October 20, 2010

Last week Basho released Riak 0.13, including (among many other great improvements) the first public release of the much-anticipated Riak Search. There are a number of reasons why I am very excited by this.

The first and most obvious reason why Riak Search is exciting is that it’s an excellent piece of software for solving a large class of data retrieval needs. Of course, this isn’t the first search system that is clustered and can grow by adding servers; that idea isn’t groundbreaking or very exciting on its own. However, it is the first such search system that I know of with the powerful and robust systems model that people have come to treasure in the Riak key/value store.

Being able to grow & shrink and handle failures in an easy and predictable way is much less common in search systems than in “simpler” data management systems. After we demonstrated that we could build something (anything!) with that kind of easy scalability and availability, our friends and customers began to ask if we could apply our ideas to some richer data models. Specifically, a common pattern began to emerge: people would deploy Riak and an indexing system such as Apache Solr side by side. This was a workable solution, but it could be operationally frustrating. The systems could get out of sync, capacity planning became more complicated, and most importantly the operations story around failure management became much more challenging. If only we could make the overall collection of systems as good at availability management as Riak was, then a new class of problems would be solved.

Those conversations began a journey of exploration and experimentation. This essential phase was led by John Muellerleile, one of the most creative and resourceful programmers I know. He did all of the early work, finding places where the state of the art in indexing and search could be merged with the “Riak Way” of doing things. More recently, amazing work was done by the entire Basho Engineering team to make Riak Search move from prototype to product.

Riak Search has only been out for about a week, but users are already discovering that they can use it in more than one way: it can function as a full-text search engine, as an easy way to produce simple inverted indices over semi-structured data, and more.

That’s enough reason to be excited about Riak Search, but I have bigger reasons as well.

Riak Search is the first public demonstration that Riak Core is a meaningful base on which to build distributed systems beyond just a key/value store. By using the same central code base for distribution, dispatch, ownership, failure management, and node administration, we are able to confidently make many of the same guarantees for Search that we have made all along for Riak’s key/value storage. Even if Search itself wasn’t such a compelling product, it is exciting as a proof of the value of Riak Core.

That value hints at Riak’s future — not as a single database but as a family of distributed systems for storing, managing, and retrieving data. We’ve now gone from one to two such systems, but we’re not stopping there. The work of creating Search was really two efforts: building Search itself and also the breakout & improvement of our Core. We can (and will) use that improved Core to build new systems in the future.

The subtlest, but perhaps most important, of the exciting things about Search is that it also uses Core to show how each new Riak system is greater than the sum of its parts. Riak Search is not just a search system using the same Core codebase as KV, it is running on the same actual nodes as KV. This allows us to develop features that don’t make sense in KV alone or in Search alone, but that take advantage of the shared running elements of Core. For instance, users can issue a search/map/reduce query that runs map/reduce style parallel processing with data locality on a dataset determined by a search result. As we develop further systems on Riak Core, we expect further such connections to make each one also benefit the entire Riak family in this way.
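As a sketch of what that looks like in practice (the modfun input form shown here assumes it mirrors the documented HTTP MapReduce inputs of module riak_search and function mapred_search; treat the details as illustrative rather than definitive):

```erlang
%% Sketch: let a search query ("name:john" against the customers bucket)
%% choose the inputs, then run an ordinary map phase over the matches.
{ok, Client} = riak:local_client(),
Client:mapred({modfun, riak_search, mapred_search,
               [<<"customers">>, <<"name:john">>]},
              [{map, {modfun, riak_kv_mapreduce, map_object_value},
                none, true}]).
```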

What we have released in the recent past is exciting. The future is even more exciting.

Justin