July 21, 2010
A few members of the Basho Team are at OSCON all week. We are here to take part in the amazing talks and tutorials, but also to talk to Riak users and community members.
Yesterday I had the opportunity to have a brief chat with Andrew Harvey, a developer who hails from Sydney, Australia and works for a startup called Lexer. They are building some awesome applications around brand monitoring and analytics, and Riak is helping in that effort.
In this short clip, Andrew gives me the scoop on Lexer and shares a few details around why and how they are using Riak (and MySQL) at Lexer.
(Deepest apologies for the shakiness. I forgot the Tripod.)
July 19, 2010
Basho is growing. Fast. We are adding customers and users at a frenetic pace, and with this growth comes expansion in both team and locations. As some of you may have noticed, the Basho Team is not only becoming larger but more distributed. We now have people in six states scattered across four time zones pushing code and interacting with clients every day.
First Order of Business
To bolster this growth and expansion, we did what any self-respecting tech startup would do: we opened an office in San Francisco. Several members of the Basho Team recently moved into a space at 795 Folsom, a cozy little spot a mere five floors below Twitter. (Proximity to the Nest was a requirement when evaluating office space.) We are calling it “Basho West.” There are four of us here, and we are settling in quite nicely.
If you are in the area and want to talk Riak, Basho, open source, coffee, etc., stop in and pay us a visit any time. Seriously. If you walk through the door of Suite 1028 with a MacBook in hand and have a question about how to model your data in Riak, we’ll get out the whiteboard and help you out.
Second Order of Business
To make an immediate impact in the Bay Area, we thought it would be a great idea to get the first regularly scheduled Riak Meetup off the ground. We heard a rumor that there were a lot of people using or interested in databases out here, so we feel obliged to join the conversation. Here is the link to the San Francisco Riak Meetup group. If you’re in the Bay Area and want to meet with other like-minded developers and technologists to discuss Riak (and other database technologies) in every possible capacity, please join us.
Third Order of Business
Pop quiz: When did Basho Technologies open source Riak? We asked ourselves this the other day. As far as we can tell, it was sometime during the first week and a half of August last year. “Huh,” we thought. “Wouldn’t it be great to have a little gathering to commemorate this event?” It sure would, so that’s what we are doing.
I mentioned above that we are starting a regularly scheduled Riak Meetup. To us, it made perfect sense to combine the inaugural Meetup with the event to celebrate Riak’s One Year Anniversary of being a completely open source technology.
The date of this gathering is Monday, August 9th. The exact time and location still need to be solidified. We’ll be announcing that within the next few days. But put it on your calendar now, as you will not want to miss this. In addition to food, drink, and exceptional overall technical discussion and fireworks, here is what you can expect:
- A talk from Dr. Eric Brewer, Basho Board Member and Father of the CAP Theorem
- A few words from the team at Mochi Media about their experiences running Riak in production
- A short talk from Basho’s VP of Engineering, Andy Gross, on the state of Riak and the near term road map
If you have any other suggestions about what you would like to see at this event, just leave us a message or an idea on the Meetup page linked above.
- Come visit the new Basho Office at 795 Folsom, Suite 1028
- Join the Riak Meetup Group
- Come be a part of the Riak One Year Anniversary Celebration
And stay tuned, because things are only going to get more exciting from here.
July 16, 2010
Basho is sending some team members to Portland to take part in the two great events happening up there over the next week. Antony Falco, Mark Phillips (that’s me) and John Hornbeck will be in “Stumptown” starting today for the Community Leadership Summit and OSCON. (We’ll be landing at around 9 PST if you want to meet us at PDX with welcome signs.)
If you would like to meet up or want to say “hi,” leave a comment, message us on Twitter, or email firstname.lastname@example.org.
We’ll have shirts and stickers with us, too, so if you would like to get your hands on some Riak swag make sure to get in touch. I’ll also be staggering around with a video camera, looking to interview anyone who has used or ever thought about using Riak or any other piece of Basho software. Users beware…
See you there!
June 14, 2010
Erlang Factory London gathers Erlang pioneers from across the world—Berlin to Boston, Krakow to Cordoba, and San Francisco to Shanghai—for a two-day conference of innovative Erlang development.
The summaries below are just a small sampling of the talks at Erlang Factory London. There were three tracks running back-to-back for two days, and I often couldn’t decide which of the three to attend. Slides and videos will be released by Erlang Solutions, and can be found under individual track pages on the Erlang Factory website.
Day 1 – June 10, 2010
Francesco Cesarini (Chief Strategy Officer, Erlang Solutions Ltd.) began the conference with a warm welcome and a quick review of progress made by Erlang-based companies in the last year.
- LShift (RabbitMQ, built in Erlang) acquired by VMware
- Couchio formed and funded to develop and support CouchDB
- MochiMedia (Flash game ad network using Erlang stack), acquired by Shanda Games
- Basho received two rounds of funding to develop and support Riak
- Klarna received a round of investment from Sequoia Capital, and Michael Moritz, former board member of Google, Yahoo, and PayPal, joined the Klarna board.
The History of the Erlang Virtual Machine – Joe Armstrong, Robert Virding
Joe Armstrong and Robert Virding gave a colorful, back-and-forth history of Erlang’s birth and early years. A few notable milestones and achievements:
- Joe’s early work on reduction machines. Robert’s complete rewrite of Joe’s work. Joe’s complete rewrite of Robert’s work. (etc.)
- How Erlang was almost based on Smalltalk rather than Prolog
- The quest to make Erlang 80 times faster
- Experiments with different memory management and garbage collection schemes
- The train set used to demonstrate Erlang, now in Robert’s basement
- The addition of linked processes, distribution, OTP, and bit syntax
It’s easy to take a language like Erlang for granted and assume that its builders followed some well-known, pre-ordained path. Hearing Erlang’s history from two of its main creators provided an excellent reminder that building software is both an art and a science, uncertain and exciting like any creative process.
Riak from the Inside – Justin Sheehy
Justin Sheehy (CTO of Basho Technologies) opened his talk by introducing Riak, “a scalable, highly-available, networked, open-source key/value store.” He then very quickly announced that he wasn’t there to talk about using Riak; he was there to talk about how Riak was built using Erlang and OTP.
There are eight distinct layers involved in reading/writing Riak data:
- The Client Application using Riak
- The client-side HTTP API or Protocol Buffers API that talks to the Riak cluster
- The server-side Riak Client containing the combined backing code for both APIs
- The Dynamo Model FSMs that interact with nodes using Dynamo style quorum behavior and conflict resolution
- Riak Core, which provides the fundamental distribution of the system (not covered in the talk)
- The VNode Master that runs on every physical node, and coordinates incoming interaction with individual VNodes
- Individual VNodes (Virtual Nodes) which are treated as lightweight local abstractions over K/V storage
- The swappable Storage Engine that persists data to disk
During his talk, Justin discussed each layer’s responsibilities and interactions with the layers above and below it.
Justin’s main point was that carefully managed complexity in the middle layers allows for simplicity at the edge layers. The top three layers present a simple key/value interface, and the bottom two layers implement a simple key/value store. The middle layers (FSMs, Riak Core, and VNode Master) work together to provide scalability, replication, etc. Erlang makes this possible, and was chosen because it provides a platform that evolves in useful and relatively predictable ways (predictable evolution is good; surprising evolution is bad).
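To make the layering concrete, here is a toy sketch, in Python rather than Erlang, of how a bucket/key pair might be mapped onto a preference list of vnodes via consistent hashing. This is not Riak’s actual implementation: the ring size, the SHA-1-modulo hashing, and the round-robin claim assignment are all simplifications of Riak’s real 160-bit ring.

```python
import hashlib

RING_SIZE = 64  # number of partitions; a big simplification of Riak's 160-bit ring

def partition_for_key(bucket, key, ring_size=RING_SIZE):
    """Hash a bucket/key pair onto a position on the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).hexdigest()
    return int(digest, 16) % ring_size

def preference_list(bucket, key, nodes, n_val=3, ring_size=RING_SIZE):
    """The N vnodes responsible for a key: the home partition plus the
    next N-1 partitions walking clockwise around the ring."""
    start = partition_for_key(bucket, key, ring_size)
    return [((start + i) % ring_size, nodes[(start + i) % len(nodes)])
            for i in range(n_val)]

nodes = ["node_a", "node_b", "node_c", "node_d"]
print(preference_list("artists", "REM", nodes))
```

A get or put FSM would then contact these vnodes and wait for a quorum of replies, which is the Dynamo-style behavior the FSM layer provides.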
Mnesia for the CAPper – Ulf Wiger
Ulf Wiger (CTO of Erlang Solutions) discussed where Mnesia might fit into the changing world of databases, given the new focus on “NoSQL” solutions. Ulf gave a quick introduction to ACID properties, Brewer’s CAP theorem, and the history of Mnesia, and then dove into a feature level description/comparison of Mnesia with other databases:
- Deployed commercially for over 10 years
- Performance comparable to the current top performers in the clustered SQL space
- Scalable to 50 nodes
- Distributed transactions with loose time limits (in other words, appropriate for transactions across remote clusters)
- Built-in support for sharding (fragments)
- Incremental backup
The downsides are:
- Erlang only interface
- Tables limited to 2GB
- Deadlock prevention scales poorly
- Network partitions are not handled automatically; tables must be recombined manually
Ulf and others have done work to get around some of these limitations. Ulf showed code for an extension to Mnesia that automatically merges tables after they have split, using vector clocks.
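For illustration, here is a minimal vector-clock sketch in Python. The names are hypothetical and the real Mnesia extension is Erlang and considerably more involved, but it shows the core idea: two sides of a partition can detect whether one version descends from the other or the writes genuinely conflict.

```python
def descends(a, b):
    """True if clock a has seen every event clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def merge_clocks(a, b):
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def merge_rows(row_a, clock_a, row_b, clock_b, resolve):
    """Keep the newer row if one side descends from the other;
    otherwise the writes were concurrent and must be resolved."""
    if descends(clock_a, clock_b):
        return row_a, clock_a
    if descends(clock_b, clock_a):
        return row_b, clock_b
    return resolve(row_a, row_b), merge_clocks(clock_a, clock_b)

# Two sides of a partition updated the same row independently:
side_a = ({"qty": 5}, {"node_a": 3, "node_b": 1})
side_b = ({"qty": 7}, {"node_a": 2, "node_b": 2})
row, clock = merge_rows(*side_a, *side_b,
                        resolve=lambda a, b: max(a, b, key=lambda r: r["qty"]))
print(row, clock)
```

The interesting case is the last one: neither clock descends from the other, so an application-supplied resolver must pick (or combine) the values, and the merged clock dominates both histories.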
Riak Search – Rusty Klophaus
I presented Riak Search, a distributed indexing and full-text search engine built on (and complementary to) Riak.
Part one covered the main reason for building Riak search: clients have built applications that eventually need to find data by value, not just by key. This is difficult, if not impossible, in a key/value store.
Part two described the shape of the final solution we set out to create. The goal of Riak Search is to support the Lucene interface, with Lucene syntax support and Solr endpoints, but with the operations story of Riak. This means that Riak Search will scale easily by adding new machines, and will continue to run after machine failure.
Part three was an introduction to Inverted Indexing, which is the heart of all search systems, as well as the difference between Document-Partitioning and Term-Partitioning, an ongoing battle in the distributed search field. Part three finished with a deep-dive into parsing, planning, and executing the search query in Erlang.
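As a rough illustration of the inverted-index idea at the heart of the talk, here is a toy Python index (nothing like Riak Search’s actual Erlang implementation) that can answer the classic find-by-value question from part one:

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: each term maps to the set of documents
    containing it (its 'postings')."""
    def __init__(self):
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, *terms):
        """AND query: documents containing every term."""
        sets = [self.postings[t.lower()] for t in terms]
        return set.intersection(*sets) if sets else set()

idx = InvertedIndex()
idx.index("email_1", "quarterly strategy review")
idx.index("email_2", "lunch plans")
idx.index("email_3", "strategy for the offsite")
print(idx.search("strategy"))  # finds email_1 and email_3
```

The partitioning question is then where the postings live: document-partitioning spreads documents across nodes (so a query fans out to every partition), while term-partitioning spreads the term space (so all postings for one term live together).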
Building a Scalable E-commerce Framework – Michael Nordström and Daniel Widgren
Michael Nordström and Daniel Widgren presented an Erlang-based e-commerce framework on behalf of their project team from Uppsala University (Christian Rennerskog, Shahzad Gul, Nicklas Nordenmark,
Manzoor Ahmad Mubashir, Mikael Nordström, Kim Honkaniemi, Tanvir Ahmad, Yujuan Zou, and Daniel Widgren) and their industrial partner, Klarna AB.
The application uses a “LERN stack” (Linux, Erlang, Riak, Nitrogen), to provide a reusable web shop that can be quickly set up by clients, customized via templates and themes, and extended via plugins to support different payment providers.
The project is currently undergoing a rewrite to update to the latest versions of Riak and Nitrogen.
Clash of the Titans: Erlang Clusters and Google App Engine – Panos Papadopoulos, Jon Vlachoyiannis, Nikos Kakavoulis
Panos, Jon, and Nikos took turns describing the technical evolution of their startup, SocialCaddy, and why they were forced to move away from the Google App Engine. SocialCaddy is a tool that mines your online profiles for important events and changes, and tells you about them. For example, if a friend gets engaged, SocialCaddy will tell you about it, and assist you in sending a congratulatory note.
Google App Engine imposes a 30 second limit on requests. As SocialCaddy processed larger and larger social graphs, they bumped into this limit, which made GAE unusable as a platform. In response, the team developed Erlust, which allows you to submit jobs (written in any language) to a cluster. An Erlang application coordinates the jobs, and each job should read from a queue, process messages, and write to another queue.
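Erlust itself wasn’t shown in detail, but the queue-in/queue-out worker pattern the team described can be sketched in a few lines of Python (the names here are hypothetical, not Erlust’s API):

```python
import queue

def run_worker(in_q, out_q, handler):
    """Drain jobs from the input queue, process each one, and push
    results to the output queue; no per-request time limit applies."""
    while True:
        try:
            job = in_q.get_nowait()
        except queue.Empty:
            return
        out_q.put(handler(job))

in_q, out_q = queue.Queue(), queue.Queue()
for n in range(5):
    in_q.put(n)
run_worker(in_q, out_q, handler=lambda n: n * n)
results = [out_q.get() for _ in range(5)]
print(results)  # [0, 1, 4, 9, 16]
```

Because each job is just a message pulled from a queue, the handler can be written in any language, with the coordinating application (Erlang, in Erlust’s case) only moving messages between queues.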
Using Open-Source Trifork QuickCheck to test Erjang – Kresten Krab Thorup
Kresten Krab Thorup (CTO of Trifork) stirred up dust when he originally announced his intention to build a version of Erlang that ran on the JVM. Since then, he has made astounding progress. Erjang turns Erlang .beam files into Java .class files, now supporting a broad enough feature set to run Mnesia over distributed Erlang. Kresten claimed performance matching (or at times exceeding) that of the Erlang VM.
Erjang is still a work in progress; many BIFs still need to be ported. But if a prototype exists to prove viability, then this prototype was certainly a success. One slide showed the spawn_link function reimplemented in ~15 lines of simple Java code.
For the second half of his talk, Kresten showed off Triq (short for Trifork Quickcheck), a scaled-down, open-source QuickCheck inspired testing framework that he built in order to test Erjang. Triq supports basic generators (called domains), value picking, and shrinking. Kresten showed examples of using Triq to validate that Erjang performs binary operations with the exact same results as Erlang.
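The generate/check/shrink loop at the core of QuickCheck-style tools like Triq can be sketched in Python (a toy, nothing like Triq’s actual API):

```python
import random

def for_all(gen, prop, tries=100):
    """Generate random values; if one falsifies the property,
    shrink it toward a smaller counterexample."""
    for _ in range(tries):
        value = gen()
        if not prop(value):
            return shrink(value, prop)
    return None  # no counterexample found

def shrink(value, prop):
    """Greedy integer shrinking: halve toward zero for as long as
    the property keeps failing."""
    while True:
        candidate = value // 2
        if candidate != value and not prop(candidate):
            value = candidate
        else:
            return value

# Deliberately false property: "all generated ints are < 100".
counterexample = for_all(lambda: random.randint(0, 10_000),
                         lambda n: n < 100)
print(counterexample)
```

Shrinking is what makes this style of testing practical: instead of a huge random failing input, you are handed a small one near the boundary of the bug, which is exactly how Triq narrowed down Erjang/Erlang discrepancies in binary operations.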
More information about Erjang here: http://wiki.github.com/krestenkrab/erjang/
Day 2 – June 11, 2010
Efene: A Programming Language for the Erlang VM – Mariano Guerra
Mariano Guerra presented Efene, a new language that is translated into Erlang source code. Efene is intended to help coax developers into the world of Erlang who might otherwise be intimidated by the Prolog-inspired syntax of Erlang. We’ve heard about a number of other projects compiling into Erlang byte-code (such as Reia and Lisp-Flavored Erlang), but Efene takes a different approach in that the language is parsed and translated using Leex and Yecc into standard Erlang code, which is then compiled as normal. By doing this, Mariano manages to leave most of the heavy lifting of optimizations to the existing Erlang compiler.
In some places, Efene syntax is a bit more verbose than Erlang’s; the extra structure (in “if” and “case” statements, for example) is meant to make the language more readable. In other places, Efene requires less typing: multi-clause function definitions, for example, don’t require you to repeat the function name.
Code samples and more information: http://marianoguerra.com.ar/efene
Erlang in Embedded Systems – Gustav Simonsson, Henrik Nordh, Fredrik Andersson, Fabian Bergstrom, Niclas Axelsson and Christofer Ferm
Gustav, Henrik, Fredrik, Fabian, Niclas, and Christofer (Uppsala University), in cooperation with Erlang Solutions, worked on a project to shrink the Erlang VM (plus the Linux platform on which it runs) down to the smallest possible footprint for use on Gumstix and BeagleBoard hardware.
The team experimented with OpenEmbedded and Angstrom, using BusyBox, uClibc, and stripped .beam files to further decrease the footprint. During the presentation, they played a video showing how to install Erlang on a Gumstix single-board computer in 5 minutes using their work.
More information about Embedded Erlang here: http://embedded-erlang.org
Zotonic: Easy Content Management with Erlang’s Performance and Flexibility – Marc Worrell
Marc Worrell (WhatWebWhat) breaks CMSs into three generations:
- 1st Generation – Static text and images
- 2nd Generation – Database- and template-driven systems (covers current CMS systems)
- 3rd Generation – Highly interactive, real-time, personalized data exchanges and frameworks
Zotonic is aimed squarely at the third generation: it turns a CMS into a living, breathing thing, where modules on a page talk to each other and to other sessions via comet, and the system can be easily extended, blurring the line between CMS and application framework.
This interactivity is what motivated Marc to write the system in Erlang; at one point he compared the data flowing through the system to a telephone exchange. Zotonic uses Webmachine, Mochiweb, ErlyDTL, and a number of other Erlang libraries, with data in PostgreSQL. (Marc also mentioned Nitrogen as an early inspiration for Zotonic; parts of Zotonic are based on Nitrogen code, though much has been rewritten.)
The data model is physically simple, with emergent functionality. A site is developed in terms of objects (called pages) interlinked with other objects. In other words, from a data perspective, adding an image to a web page is the same as linking from a page to a subpage, or tagging a page with an author. Marc gave a live demo of Zotonic’s ability to easily add and change menu structures, modify content, and add and re-order images. Almost everything can be customized using ErlyDTL templates. Very polished stuff.
Marc then introduced his goal of “Elastic Zotonic”, a Zotonic that can scale in a distributed, fault-tolerant, “buzzword-compliant” way, which will involve changes to the datastore and some of the architecture.
Marc is now working with Maximonster to develop an education-oriented social network on top of Zotonic.
More information: http://zotonic.com
Francesco (CSO, Erlang Solutions, Ltd.) thanked the sponsors, presenters, and audience. He then gave a big special thanks to Frank Knight and Joanna Włodarczyk, who both worked tirelessly to organize the conference and make everything go smoothly.
Erlang is gaining momentum in the industry as a platform that enables you to solve distributed, massively concurrent problems. People aren’t flocking directly to Erlang itself, they are instead flocking to projects built in Erlang, such as RabbitMQ, ejabberd, CouchDB, and of course, Riak. At the same time, other languages are adopting some of the key features that make Erlang special, including a message-passing architecture and lightweight threads.
June 6, 2010
Velocity, the “Web Performance and Operations Conference” put on by O’Reilly, kicks off tomorrow and we here at Basho are excited. Why? Because Benjamin Black, acknowledged distributed systems expert, will be giving a 90-minute tutorial on Riak. The session is officially titled “Riak: From Design to Deploy.” If you haven’t already read it, you can get the full description of the session here.
I just got a sneak peek at what Benjamin has planned and all I can say is that if you are within 100 miles of Santa Clara, CA tomorrow and not in this session, you will regret it.
And, what better to go with a hands-on Riak tutorial than some good old-fashioned swag? Here is a little offer to anyone attending tomorrow: post a write-up of Benjamin’s session and I’ll send you a Riak swag pack. It doesn’t have to be a novel, just a few thoughts will do. Post them somewhere online for all the world to see and learn from, and I’ll take care of the rest.
Enjoy Velocity. We are looking forward to your reviews!
May 21, 2010
This post is going to start by explaining how in-the-trenches experience with key/value stores, like Riak, led to the creation of Riak Search. Then it will tell you why you care, what you’ll get out of Riak Search, and why it’s worth waiting for.
A bit of history
Few people know that Basho used to develop applications for deployment on Salesforce.com. We had big goals, and were thinking big to fill them. Part of that was choosing a data storage system that would give us what we needed not only to succeed and grow, but to survive: a bulletproof operations story and a path upward, meaning resilience, reliability, and scalability through the use of proven science.
So, that’s what we did: we developed and used what has grown to be, and what you know today, as Riak.
Idealism can’t get you everywhere, though. While we answered hard questions with link-walking and map/reduce, there was still the desire in the back of all of our heads: sometimes you just want to ask, “What emails were sent on May 21 that included the word ‘strategy’?” without having to figure out how to walk links from an organizational chart to mailboxes to mails, and then filter over the data there. It was a pragmatic desire: we just wanted a quick answer in order to decide whether or not to spend more time chasing a path. “Less yak-shaving, please.”
The Operations Story
Then we stopped making Salesforce.com apps, and started selling Riak. We quickly found the same set of desires. Operationally, Riak is a huge win. Pragmatically, something that does indexing and search in a similar operational manner is even bigger. Thus, Riak Search was born.
The operational story is, in a nutshell, this: when you add another node to your cluster, you add capacity and compute power. That’s it, you just add another box and “it just works.” Purposefully or not, eventually a node leaves the cluster, hardware fails, whatever: Riak deals with it. If the node comes back, it’s absorbed like it never left.
We insisted on these qualities for Riak, and have continued that insistence in Riak Search. We did it with all the familiar bits: consistent hashing, hinted handoff, replication, etc.
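The hinted-handoff part of that story can be illustrated with a toy Python sketch (hypothetical names, nothing like Riak’s actual Erlang implementation): a fallback node accepts writes on behalf of a downed primary, remembering who they really belong to, and hands them back when the primary returns.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # primary copies of data
        self.hints = {}   # intended owner -> data held on its behalf

def put(key, value, primary, fallback):
    """Write to the primary if it's up; otherwise a fallback node
    stores the value along with a 'hint' naming the intended owner."""
    if primary.up:
        primary.data[key] = value
    else:
        fallback.hints.setdefault(primary.name, {})[key] = value

def handoff(fallback, primary):
    """When the primary returns, hinted data is handed back to it."""
    for key, value in fallback.hints.pop(primary.name, {}).items():
        primary.data[key] = value

a, b = Node("a"), Node("b")
a.up = False
put("user:42", "some value", primary=a, fallback=b)  # a is down; b covers
a.up = True
handoff(b, a)                                        # a is back; b hands off
print(a.data)  # {'user:42': 'some value'}
```

This is why writes keep succeeding during a failure, and why a returning node is “absorbed like it never left”: nothing written in its absence is lost, it just takes a detour.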
Why Riak Search?
Now, we’ll be the first to tell you that with Riak you can get pretty far using link-walking and map/reduce, with the understanding that you know what you are going to want ahead of time, and/or are willing to wait for it.
Riak Search answers the questions that pop into your head: “find me all the blue dresses between $20 and $30,” “find me the document Bob referred to last week at the TPS procedures meeting,” “how can I delete all these emails from my aunt with those stupid attachments?”, “find me that comic strip with Bob,” and so on.
It’s about making members of the sea of data in your key/value store findable. At a higher level, it’s about agility: the ability to answer questions about your business and your customers without consulting a developer or digging through reference manuals, and without your application developers having to reinvent the wheel (with the very real possibility of doing it just right enough to convince you nothing will go wrong). It’s about a common indexing language.
Okay, now you know — thanks for bearing with us — let’s get to the technical bits.
Riak Search …
The system we have built …
- is an easily-scalable, fault-tolerant search and indexing system, adhering to the operational story you just read
- supports full-text indexing and search
- allows querying via the Lucene query syntax
- has Solr-compatible /select and /update web-services
- supports date and numeric indexing
- supports faceting
- automatically distributes indexes
- has an intermediate query language and integrated query planner
- supports scoring
- has integrated tokenizing, filtering, and analysis
… and much more. Sounds pretty great, right?
If you want to know more about the internals and the technical nitty-gritty, check out the Riak Search presentation that one of our own, engineer John Muellerleile, gave at the San Francisco Erlang Factory this year.
So, why don’t you have it yet? The easy part.
There are still some stubs and hard-coded things in what we have. For instance, the full-text analyzer in use is just whitespace, case-normalization, and stop-word filtering. We intend to fully support the ability to specify other Lucene analyzers, including custom modules, but the code isn’t there yet.
There is also very little documentation. Without a little bit of handholding, even the brightest and most ambitious user could be forgiven for staring blankly, lost for even the first question to ask. We’re spreading the knowledge among our own team right now; that process will generate the artifacts needed for the next round of users to step in.
There are also many fiddly, finicky bits. These are largely relics of early iterations. Rather than having the interwebs be flooded with, “How do you stop this thing?” (as it was with Riak), we’re going to make things friendlier.
So, why don’t you have it yet? The not-so-easy part.
You’ve probably asked yourself, “What of integration of Riak and Riak Search?” We have many notes from discussions about how it could or should be done, as well as code showing how it can be done. But, we’re not completely satisfied with any of our implementations so far.
There is certainly no shortage of designs and ideas on how this could or should work, so we’re going to make a final pass at refining all of our ideas, given our current Riak Search system to play with, so that we can provide a solid, extensible system instead of one with many rough edges that would almost certainly be replaced immediately.
Furthering this sentiment, we think our existing map/reduce framework and the functionality and features provided by Riak Search are a true power combo when used together intelligently, rather than simply as alternatives or, at worst, at odds. As a result, we’re defining exactly how Riak Search indexing and querying should be threaded into Riak map/reduce processing to bring you a combination that is undoubtedly more than the sum of its parts.
We could tease you with specifics, like generating the set of bucket/key inputs to a map phase by performing a Riak Search query, or parameterizing search phases with map results. For now, though, amidst protest both internal (we’re champing at the bit to get this out into the world and into your hands) and external (our favorite people continually request this exact set of technology and features), we’re going to implement the few remaining details from our refined notes before forcing it on you all.
Hold on just a little longer.
-the Riak Search Team
May 4, 2010
There is nothing easy about making software simple to learn and understand. Every potential user learns a little differently, and that makes for a hard road to simple usage. This is especially true with Riak.
Internally at Basho, we are constantly addressing questions like, “How do we make a ‘distributed, Dynamo-inspired key/value store’ inviting and less daunting to first time users?” and “How do we lower the barrier to adoption and usage?” Though resources like the Riak Mailing List, the Riak Wiki, and Riak IRC channel are great, we kept asking ourselves, “What can we do to make it dead simple for those new to and interested in Riak to learn about it and how it works?”
Our answer (in part) is the Riak Fast Track.
What is the Riak Fast Track?
The Fast Track is an interactive module on the Riak Wiki that, through a combination of concise content and brief screencasts, will bring you up to speed on a) what Riak is, b) what its key features and benefits are, and c) how to use it.
As I stated above, the Fast Track is aimed at developers who may be new to Riak or those who may have heard about it in passing but haven’t spent too much time fiddling with it.
We put a lot of time into making this, but there are undoubtedly some kinks that need to be worked out. And, regardless of how long we try to tweak and refine it, there will always be some small aspects and details that we aren’t going to get right. It is for that reason that we are appealing to you, the rapidly-growing Riak Community, to help us.
So, here is the challenge: Take 45 minutes and go through the Riak Fast Track. Then, when you’re done, take five minutes to write us an email and tell us what you thought about it. That’s it.
We are looking for answers to questions like:
- Was it effective?
- Did you learn anything?
- What did we get right?
- What did we get wrong?
- What should we add/remove?
And, to sweeten the pot, we are going to send a “Riak Swag Pack” (contents of which are top secret) to everyone who sends us their review and thoughts on the Fast Track by the close of business on Tuesday (5/11) of next week. It doesn’t have to be anything extensive (though we love details). A simple, “I liked x, y, and z, but you could have done this better” would suffice. You can send your emails to email@example.com. I am looking forward to hearing from you!
So, without further ado, go forth and test out the Riak Fast Track.
We hope you’ll find it useful and we’re looking forward to your thoughts on how to make it better.
A Muddle That Slows Adoption
Basho released Riak as an open source project seven months ago and began commercial service shortly thereafter. As new entrants into the loose collection of database projects, we observed two things:
- Widespread Confusion — the NoSQL rubric, and the decisions of most projects to self-identify under it, have created a false perception of overlap and similarity between projects differing not just technically but in approaches to licensing and distribution, leading to…
- Needless Competition — driving the confusion, many projects (us included, for sure) competed passionately (even acrimoniously) for attention as putative leaders of NoSQL, a fool’s errand as it turns out. One might as well claim leadership of all tools called wrenches.
The optimal use cases, architectures, and methods of software development differ so starkly even among superficially similar projects that to compete is to demonstrate a (likely pathological) lack of both knowledge of user needs and self-knowledge.
This confusion and wasted energy — in other words, the market inefficiency — has been the fault of anyone who has laid claim to, or professed designs on, the NoSQL crown.
- Adoption suffers — Users either make decisions based on muddled information or, worse, do not make any decision whatsoever.
- Energy is wasted — At Basho we spent too much time from September to December answering the question posed without fail by investors and prospective users and clients: “Why will you ‘win’ NoSQL?”
With the vigor of fools, we answered this question, even though we rarely if ever encountered another project’s software in a “head-to-head” competition. (In fact, in the few cases where we have been pitted head-to-head against another project, we have won or lost so quickly that we cannot help but conclude the evaluation could have been avoided altogether.)
The investors and users merely behaved as rational (though often bemused) market actors. Having accepted the general claim that NoSQL was a monolithic category, both sought to make a bet.
Clearly what is needed is objective information presented in an environment of cooperation driven by mutual self-interest.
This information, shaped not by any one person’s necessarily imperfect understanding of the growing collection of data storage projects but rather by all the participants themselves, would go a long way to remedying the inefficiencies discussed above.
Demystification through data, not marketing claims
We have spoken to representatives of many of the emerging database projects. They have enthusiastically agreed to participate in a project to disclose data about each project. Disclosure will start with the following: a common benchmark framework and benchmarks/load tests modeling common use cases.
- A Common Benchmark Framework — For this collaboration to succeed, no single aspect will impact success or failure more than arriving at a common benchmark framework.
At Basho we have observed the proliferation of “microbenchmarks,” or benchmarks that do not reflect the conditions of a production environment. Benchmarks that use a small data set, that do not store to disk, or that run for short (less than 12 hours) durations do more to confuse the issue for end users than any other single factor. Participants will agree on benchmark methods, tools, and applicability to use cases, and will make all benchmarks reproducible.
Compounding the confusion, benchmarks of different use cases, or runs on different hardware, are often compared head-to-head as if the tests or systems were identical. We will help participants run equivalent tests on the various databases, and we will not publish benchmark results that do not profile the hardware and configuration of the systems.
- Benchmarks That Support Use Cases — Participants agree to benchmark their software under conditions and load tests reflective of the use cases they commonly see in their user bases, or for which they think their software is best suited.
- Dissemination to Third Parties — Providing easy-to-find data to any party interested in posting results.
- Honestly exposing disagreement — Where agreement cannot be reached on any of the elements of the common benchmarking efforts, participants will respectfully expose the rationales for their divergent positions, thus still providing users with information upon which to base decisions.
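To make the criteria above concrete, here is a minimal sketch of the kind of load test the framework would standardize: a sustained, mixed read/write workload measured in latency percentiles rather than a short burst against a toy data set. The `get`/`put` client interface, the 80/20 read/write mix, and all parameters are illustrative assumptions, not any project’s actual API:

```python
import random
import statistics
import time

def run_load_test(store, duration_s=3600, value_size=1024, keyspace=1_000_000):
    """Drive a key-value store with a sustained mixed read/write load
    and report latency percentiles. `store` is any object exposing
    get(key) and put(key, value) — a hypothetical client interface;
    substitute your own database driver."""
    payload = b"x" * value_size
    latencies = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        key = str(random.randrange(keyspace))
        start = time.monotonic()
        if random.random() < 0.8:   # assumed 80/20 read/write mix
            store.get(key)
        else:
            store.put(key, payload)
        latencies.append(time.monotonic() - start)
    latencies.sort()
    return {
        "ops": len(latencies),
        "median_ms": statistics.median(latencies) * 1000,
        "p99_ms": latencies[int(len(latencies) * 0.99)] * 1000,
    }
```

A production-grade run would additionally fix and record the hardware profile, use a data set larger than RAM so disks are actually exercised, and sustain the load for the long durations discussed above — the point of the common framework is that every participant reports those same conditions.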
There is more work to be done but all participants should begin to see the benefits: faster, better decisions by users.
We invite others to join, once we are underway. We, and our counterparts at other projects, believe this approach will go a long way to removing the inefficiencies hindering adoption of our software.
CAMBRIDGE, MA – March 30 – Erlang Solutions, Ltd., the leading provider of Erlang services, and Basho Technologies, the makers of Riak, a high-availability data store written in Erlang, today announced they have entered into a multi-faceted partnership to deliver scalable, fault-tolerant applications built on Riak to the global market. Erlang/OTP, an open source technology from Ericsson, has found application in a new generation of fault-tolerant systems at companies like Facebook and Amazon.
“Erlang Solutions constantly seeks out new technologies and services to bring to its clients,” said Marcus Taylor, CEO of Erlang Solutions. “With Basho, we add not just the Riak software, but a partner to help us build an ecosystem of Erlang-based high-availability applications.”
The partnership has three major components: 1) both companies develop and support training and certification, 2) Erlang Solutions architects, load tests, and builds Riak-based applications for clients, and 3) Erlang Solutions provides Basho clients with training, systems design, prototyping, and development services.
“Erlang Solutions is globally recognized as the business thought leaders in the Erlang community,” said Earl Galleher, Basho Technologies Chairman and CEO. “Our current and future customers now have access to a new tier of professional services and we help Erlang Solutions push Erlang further into the mainstream market.”
With this partnership, Erlang Solutions now represents Basho Technologies in Europe for services and distribution. The teams will focus on high growth markets like mobile telephony, social media and e-commerce applications where current Relational Database Systems (RDBMS) solutions are struggling to keep up.
“Look at the explosive growth of SMS IM traffic,” said Francesco Cesarini, founder of Erlang Solutions, “and the cost to scale traditional infrastructure. Basho’s Riak helps clients contain these costs while increasing reliability. An ecosystem of high-availability solutions is forming and the relationship between Erlang Solutions and Basho Technologies will soon expand to include other partners and richer solutions.”
About Erlang Solutions Ltd
Founded in 1999, Erlang Solutions Ltd. (www.erlang-solutions.com) is an international company specialising in the open source language Erlang and its middleware, OTP. Erlang Solutions addresses all your Erlang needs: training, certification, consulting, contracting, system development, support, and conferences. Erlang Solutions’ expert and certified consultants are the most experienced anywhere, with many having used Erlang since its early days. With offices in the UK, Sweden, and Poland and clients on six continents, Erlang Solutions is available for short- and long-term opportunities worldwide.
About Basho Technologies
Basho Technologies, Inc., founded in January 2008 by a core group of software architects, engineers, and executive leadership from Akamai Technologies, Inc. (Nasdaq: AKAM), is headquartered in Cambridge, Massachusetts. Basho produces Riak, a distributed data store that combines extreme fault tolerance, rapid scalability, and ease of use. Designed from the ground up to work with applications that run on the Internet and mobile networks, Riak is particularly well-suited for users of cloud infrastructure such as Amazon’s AWS and Joyent’s Smart platform and is available in both an open source and a paid commercial version. Current customers of Riak include Comcast Corporation, MIG-CAN, and Mochi Media.
Just a few days ago, we did something a bit new at Basho. We posted the beginning of a public discussion to explore and document some differences between various NoSQL systems. Some people have attempted such comparisons before, but generally from an external observer’s point of view. When something like this comes from a producer of one of the systems in question it necessarily changes the tone.
If you weren’t really paying attention, you could choose to see this as aggressive competition on our part, but the people who have chosen to engage with us have hopefully seen that it’s the opposite: an overt attempt at collaboration. While the initial comparisons were definitely not complete (for instance, in some cases they reflected the current self-documented state of systems instead of the current state of their code), they nonetheless very much had the desired effect.
That effect was to create productive conversation, resulting both in improvement of the comparison documents and in richer ongoing communication between the various projects out there. Our comparisons have already improved a great deal as a result of this and will continue to do so. I attribute this to the constructive attention that they have received from people deeply in the trenches with the various systems being discussed. That attention has also, I hope, given us a concrete context in which to strengthen our relationships with those people and projects.
Some of the attention we received was from people that are generally unhelpful; there are plenty of trolls on the internet who are more interested in throwing stones than in useful conversation. There’s not much point in wading into that kind of a mess as everyone gets dirty and nothing improves as a result. Luckily, we also received attention from people who actually build things. Those sorts of people tend to be much more interested in productive collaboration, and that was certainly the case this time. Though they’re by no means the only ones we’ve been happy to talk to, we can explicitly thank Greg Arnette, Jonathan Ellis, Benjamin Black, Mike Dirolf, Joe Stump, and Peter Neubauer for being among those spending their valuable time talking with us recently.
It’s easy to claim that any attempt to describe others that isn’t immediately perfect is just FUD, but our goal here is to help remove the fear, uncertainty, and doubt that people outside this fledgling community already have about all of this weird NoSQL stuff. By engaging each other in direct, honest, open, civil conversations we can all improve our knowledge as well as the words we use to describe each others’ work.
The people behind the various NoSQL systems today have a lot in common with the American craft brewers of the 1980s and 1990s. (Yes, I’m talking about beer.) You might casually imagine that people trying to sell similar products to the same market would be cutthroat competitors, but you’d be wrong. When you are disrupting a much larger established industry, aggression amongst peers isn’t a good route to success.
The friend who convinced me to try a Sam Adams in 1993 wasn’t striking a blow against Sierra Nevada or any of the other craft brewers at the time. In fact, he was helping all of those supposed “competitors” by opening up one more pair of eyes to the richness of choices that are available to all. People who enjoy good beer will happily talk about the differences between their brews all day, but in the end what matters is that when they walk into a bar they will look to see what choices they have at the tap instead of just ordering the same old Bud without a second thought.
Understanding that “beer” doesn’t always mean exactly the same beverage is the key realization; likewise, with NoSQL, the most important thing people outside the community can realize is that not all data problems are shaped like a typical RDBMS.
Of course, any brewer will talk about their own product more than anything else, but will also know that good conversations lead to improvements by all and to the potential greater success of the entire community they exist in. Sometimes the way to start a good conversation is to talk about what you know best with people who you know will have a different point of view from your own. From there, everyone’s knowledge, perspective, and understanding can improve.
At Basho we’re not just going to keep doing what we’ve already done in terms of communication. We’re going to keep finding new and better ways of communicating, and do it more often.
In addition to continuing to work with others on finding the right ways to publicly discuss our differences and similarities on fundamentals, we will also do so in specific areas such as performance testing, reliability under duress, and more. This will remain tricky, because it is easy for people to get confused by superficial issues and distracted from the more interesting ones — and opinions will vary on which are which. In discussing those opinions, we will all become more capable practitioners and advocates of the craft that binds us together.
Ruffling a few feathers is a small price to pay if better conversations occur. This is especially true if the people truly creating value by building systems learn how to work better together in the process.