
Erlang Factory London Recap

June 14, 2010

This was originally posted by @rklophaus on his blog, rklophaus.com.

Erlang Factory London gathers Erlang pioneers from across the world—Berlin to Boston, Krakow to Cordoba, and San Francisco to Shanghai—for a two-day conference of innovative Erlang development.

The summaries below are just a small sampling of the talks at Erlang Factory London. There were three tracks running back-to-back for two days, and I often couldn’t decide which of the three to attend. Slides and videos will be released by Erlang Solutions, and can be found under individual track pages on the Erlang Factory website.

Day 1 – June 10, 2010

Opening Session

Francesco Cesarini (Chief Strategy Officer, Erlang Solutions Ltd.) began the conference with a warm welcome and a quick review of the progress made by Erlang-based companies in the last year.

Some highlights:

The History of the Erlang Virtual Machine – Joe Armstrong, Robert Virding

Joe Armstrong and Robert Virding gave a colorful, back-and-forth history of Erlang’s birth and early years. A few notable milestones and achievements:

  • Joe’s early work on reduction machines. Robert’s complete rewrite of Joe’s work. Joe’s complete rewrite of Robert’s work. (etc.)
  • How Erlang was almost based on Smalltalk rather than Prolog
  • The quest to make Erlang 1.0x 80 times faster
  • Experiments with different memory management and garbage collection schemes
  • The train set used to demonstrate Erlang, now in Robert’s basement
  • The addition of linked processes, distribution, OTP, and bit syntax

It’s easy to take a language like Erlang for granted and assume that its builders followed some well-known, pre-ordained path. Hearing Erlang’s history from two of its main creators provided an excellent reminder that building software is both an art and a science, uncertain and exciting like any creative process.

Riak from the Inside – Justin Sheehy

Justin Sheehy (CTO of Basho Technologies) opened his talk by introducing Riak, “a scalable, highly-available, networked, open-source key/value store.” He then very quickly announced that he wasn’t there to talk about using Riak; he was there to talk about how Riak was built using Erlang and OTP.

There are eight distinct layers involved in reading/writing Riak data:

  • The Client Application using Riak
  • The client-side HTTP API or Protocol Buffers API that talks to the Riak cluster
  • The server-side Riak Client containing the combined backing code for both APIs
  • The Dynamo Model FSMs that interact with nodes using Dynamo style quorum behavior and conflict resolution
  • Riak Core, which provides the fundamental distribution of the system (not covered in the talk)
  • The VNode Master that runs on every physical node, and coordinates incoming interaction with individual VNodes
  • Individual VNodes (Virtual Nodes) which are treated as lightweight local abstractions over K/V storage
  • The swappable Storage Engine that persists data to disk

During his talk, Justin discussed each layer’s responsibilities and interactions with the layers above and below it.

Justin’s main point was that carefully managed complexity in the middle layers allows for simplicity at the edge layers. The top three layers present a simple key/value interface, and the bottom two layers implement a simple key/value store. The middle layers (the FSMs, Riak Core, and the VNode Master) work together to provide scalability, replication, and so on. Erlang makes this possible, and was chosen because it provides a platform that evolves in useful and relatively predictable ways (a good thing; surprising evolution is bad).
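
The middle-layer FSMs implement the Dynamo-style quorum behavior mentioned above. As a toy illustration only (this is not Basho’s actual FSM code), a quorum read might be coordinated like this, with vnode_get/2 and resolve/1 as placeholders:

```erlang
%% Toy illustration of Dynamo-style quorum gathering: ask a set of
%% vnodes for a key and return once R replies have arrived.
-module(quorum_get).
-export([get/3]).

get(Key, Vnodes, R) ->
    Parent = self(),
    [spawn(fun() -> Parent ! {reply, vnode_get(V, Key)} end) || V <- Vnodes],
    wait_for_quorum(R, []).

wait_for_quorum(0, Replies) ->
    resolve(Replies);                  % reconcile divergent values
wait_for_quorum(R, Replies) ->
    receive
        {reply, Value} -> wait_for_quorum(R - 1, [Value | Replies])
    after 5000 ->
        {error, timeout}               % quorum not reached in time
    end.

vnode_get(_Vnode, _Key) -> placeholder_value.  % a real vnode reads storage
resolve([Latest | _]) -> {ok, Latest}.         % a real resolver uses vector clocks
```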

Mnesia for the CAPper – Ulf Wiger

Ulf Wiger (CTO of Erlang Solutions) discussed where Mnesia might fit into the changing world of databases, given the new focus on “NoSQL” solutions. Ulf gave a quick introduction to ACID properties, Brewer’s CAP theorem, and the history of Mnesia, and then dove into a feature-level description and comparison of Mnesia with other databases:

  • Deployed commercially for over 10 years
  • Comparable performance to current top performers in the clustered SQL space
  • Scalable to 50 nodes
  • Distributed transactions with loose time limits (in other words, appropriate for transactions across remote clusters)
  • Built-in support for sharding (fragments)
  • Incremental backup

The downsides are:

  • Erlang only interface
  • Tables limited to 2GB
  • Deadlock prevention scales poorly
  • Network partitions are not automatically handled; tables must be recombined manually

Ulf and others have done work to get around some of these limitations. Ulf showed code for an extension to Mnesia that automatically merges tables after they have split, using vector clocks.
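
For reference, Mnesia’s built-in sharding is configured through frag_properties at table-creation time; a minimal sketch, with a hypothetical table and node list:

```erlang
%% A minimal sketch of Mnesia's built-in sharding ("fragments");
%% the table name and node list are hypothetical.
-module(frag_demo).
-export([create_fragmented_table/1, write_session/1]).

-record(session, {id, data}).

create_fragmented_table(Nodes) ->
    mnesia:create_table(session,
        [{attributes, record_info(fields, session)},
         {frag_properties,
          [{n_fragments, 8},           % split the table into 8 fragments
           {node_pool, Nodes},         % nodes eligible to hold fragments
           {n_disc_copies, 2}]}]).     % two disc replicas per fragment

%% Reads and writes go through the mnesia_frag access module:
write_session(Session) ->
    mnesia:activity(transaction,
                    fun() -> mnesia:write(Session) end,
                    [], mnesia_frag).
```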

Riak Search – Rusty Klophaus

I presented Riak Search, a distributed indexing and full-text search engine built on (and complementary to) Riak.

Part one covered the main reason for building Riak Search: clients have built applications that eventually need to find data by value, not just by key. This is difficult, if not impossible, in a key/value store.

Part two described the shape of the final solution we set out to create. The goal of Riak Search is to support the Lucene interface, with Lucene syntax support and Solr endpoints, but with the operations story of Riak. This means that Riak Search will scale easily by adding new machines, and will continue to run after machine failure.

Part three was an introduction to Inverted Indexing, which is at the heart of all search systems, as well as the difference between Document Partitioning and Term Partitioning, the subject of an ongoing battle in the distributed search field. Part three continued with a deep dive into parsing, planning, and executing the search query in Erlang.
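
As a toy illustration of the inverted index at the heart of such systems (not Riak Search’s actual implementation), here is one in plain Erlang, mapping each term to the documents that contain it:

```erlang
%% Toy inverted index: map each lower-cased term to the list of
%% document ids that contain it.
-module(inverted_index).
-export([build/1, lookup/2]).

%% Docs is a list of {DocId, Text} pairs.
build(Docs) ->
    lists:foldl(fun({DocId, Text}, Index) ->
        Terms = string:tokens(string:to_lower(Text), " "),
        lists:foldl(fun(Term, Acc) ->
            dict:append(Term, DocId, Acc)   % Term -> [DocId, ...]
        end, Index, Terms)
    end, dict:new(), Docs).

lookup(Term, Index) ->
    case dict:find(string:to_lower(Term), Index) of
        {ok, DocIds} -> DocIds;
        error        -> []
    end.
```

For example, `lookup("fox", build([{1, "the quick fox"}, {2, "lazy dog"}]))` returns `[1]`.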

Slides: http://www.slideshare.net/rklophaus/riak-search-erlang-factory-london-2010

Building a Scalable E-commerce Framework – Michael Nordström and Daniel Widgren

Michael Nordström and Daniel Widgren presented an Erlang-based e-commerce framework on behalf of their project team from Uppsala University (Christian Rennerskog, Shahzad Gul, Nicklas Nordenmark, Manzoor Ahmad Mubashir, Mikael Nordström, Kim Honkaniemi, Tanvir Ahmad, Yujuan Zou, and Daniel Widgren) and their industrial partner, Klarna AB.

The application uses a “LERN stack” (Linux, Erlang, Riak, Nitrogen) to provide a reusable web shop that can be quickly set up by clients, customized via templates and themes, and extended via plugins to support different payment providers.
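
The talk summary doesn’t show CookieCart’s actual plugin API, but a payment-provider plugin contract in the Erlang of this era would typically be expressed as a behaviour; a hypothetical sketch:

```erlang
%% Hypothetical sketch of a payment-provider plugin contract; the
%% real CookieCart API may differ. In pre-R15 Erlang, a behaviour is
%% defined by exporting behaviour_info/1.
-module(payment_provider).
-export([behaviour_info/1]).

behaviour_info(callbacks) ->
    [{init, 1},      % init(Config) -> {ok, State}
     {charge, 3},    % charge(Amount, Card, State) -> {ok, Ref} | {error, Why}
     {refund, 2}];   % refund(Ref, State) -> ok | {error, Why}
behaviour_info(_) ->
    undefined.

%% A concrete plugin would then declare:
%%   -module(some_provider).
%%   -behaviour(payment_provider).
```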

The project is currently undergoing a rewrite to update it to the latest versions of Riak and Nitrogen.

GitHub: http://github.com/mino4071/CookieCart-2.0

Twitter: @Cookie_Cart

Clash of the Titans: Erlang Clusters and Google App Engine – Panos Papadopoulos, Jon Vlachoyiannis, Nikos Kakavoulis

Panos, Jon, and Nikos took turns describing the technical evolution of their startup, SocialCaddy, and why they were forced to move away from the Google App Engine. SocialCaddy is a tool that mines your online profiles for important events and changes, and tells you about them. For example, if a friend gets engaged, SocialCaddy will tell you about it, and assist you in sending a congratulatory note.

Google App Engine imposes a 30 second limit on requests. As SocialCaddy processed larger and larger social graphs, they bumped into this limit, which made GAE unusable as a platform. In response, the team developed Erlust, which allows you to submit jobs (written in any language) to a cluster. An Erlang application coordinates the jobs, and each job should read from a queue, process messages, and write to another queue.
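
Erlust’s internals weren’t shown, but the job shape described above (read from one queue, process, write to another) reduces to a small Erlang loop; a sketch with placeholder queue functions:

```erlang
%% Minimal sketch of the read/process/write job shape described
%% above; Erlust's real API isn't public in this summary, so the
%% queue functions are placeholders.
-module(job_worker).
-export([run/3]).

run(InQ, OutQ, ProcessFun) ->
    case queue_read(InQ) of
        empty ->
            timer:sleep(1000),                  % back off, then poll again
            run(InQ, OutQ, ProcessFun);
        {ok, Msg} ->
            queue_write(OutQ, ProcessFun(Msg)), % hand result to the next stage
            run(InQ, OutQ, ProcessFun)
    end.

queue_read(_Q) -> empty.                        % placeholder
queue_write(_Q, _Msg) -> ok.                    % placeholder
```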

Using Open-Source Trifork QuickCheck to test Erjang – Kresten Krab Thorup

Kresten Krab Thorup (CTO of Trifork) stirred up dust when he originally announced his intention to build a version of Erlang that ran on the JVM. Since then, he has made astounding progress. Erjang turns Erlang .beam files into Java .class files, now supporting a broad enough feature set to run Mnesia over distributed Erlang. Kresten claimed performance matching (or at times exceeding) that of the Erlang VM.

Erjang is still a work in progress; there are many BIFs that still need to be ported. But if a prototype exists to prove viability, then this prototype was certainly a success. One slide showed the spawn_link function reimplemented in roughly 15 lines of simple Java code.

For the second half of his talk, Kresten showed off Triq (short for Trifork Quickcheck), a scaled-down, open-source QuickCheck inspired testing framework that he built in order to test Erjang. Triq supports basic generators (called domains), value picking, and shrinking. Kresten showed examples of using Triq to validate that Erjang performs binary operations with the exact same results as Erlang.
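
For a flavor of what a Triq property looks like (this example is mine, not Kresten’s, and checks a simple binary round-trip rather than an Erjang-versus-Erlang comparison):

```erlang
%% A small Triq property: generate random inputs, check an invariant,
%% and let Triq shrink any failing case to a minimal counterexample.
-module(triq_demo).
-include_lib("triq/include/triq.hrl").
-export([check/0]).

prop_binary_roundtrip() ->
    ?FORALL(Xs, list(int()),
            begin
                Bytes = [X band 255 || X <- Xs],   % clamp to valid bytes
                binary_to_list(list_to_binary(Bytes)) =:= Bytes
            end).

check() ->
    triq:check(prop_binary_roundtrip()).
```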

More information about Erjang here: http://wiki.github.com/krestenkrab/erjang/

Day 2 – June 11, 2010

Efene: A Programming Language for the Erlang VM – Mariano Guerra

Mariano Guerra presented Efene, a new language that is translated into Erlang source code. Efene is intended to coax developers who might otherwise be intimidated by Erlang’s Prolog-inspired syntax into the world of Erlang. We’ve heard about a number of other projects compiling into Erlang byte-code (such as Reia and Lisp-Flavored Erlang), but Efene takes a different approach: the language is parsed and translated using Leex and Yecc into standard Erlang code, which is then compiled as normal. By doing this, Mariano leaves most of the heavy lifting of optimization to the existing Erlang compiler.

Efene actually supports two different syntax flavors, one with curly brackets and one without, leading to a syntax that feels vaguely like JavaScript or Python, respectively. (The syntax without curly brackets is called Ifene, for “Indented Efene”, and is otherwise identical to Efene.)

In some places, Efene syntax is a bit more verbose than Erlang’s; this is done to make the language more readable (“if” and “case” statements have more structure in Efene than in Erlang). In other places, Efene requires less typing; multi-clause function definitions, for example, don’t require you to repeat the function name.

Code samples and more information: http://marianoguerra.com.ar/efene

Erlang in Embedded Systems – Gustav Simonsson, Henrik Nordh, Fredrik Andersson, Fabian Bergstrom, Niclas Axelsson and Christofer Ferm

Gustav, Henrik, Fredrik, Fabian, Niclas, and Christofer (Uppsala University), in cooperation with Erlang Solutions, worked on a project to shrink the Erlang VM (plus the Linux platform on which it runs) down to the smallest possible footprint for use on Gumstix and BeagleBoard hardware.

The team experimented with OpenEmbedded and Angstrom, using BusyBox, uClibc, and stripped .beam files to further decrease the footprint. During the presentation, they played a video showing how to install Erlang on a Gumstix single-board computer in 5 minutes using their work.

More information about Embedded Erlang here: http://embedded-erlang.org

Zotonic: Easy Content Management with Erlang’s Performance and Flexibility – Marc Worrell

Marc Worrell (WhatWebWhat) breaks CMSs into:

  • 1st Generation – Static text and images
  • 2nd Generation – Database- and template-driven systems (most current CMS systems)
  • 3rd Generation – Highly interactive, real-time, personalized data exchanges and frameworks

Zotonic is aimed squarely at the third generation: it turns the CMS into a living, breathing thing, where modules on a page talk to each other and to other sessions via comet, and where the system can be easily extended, blurring the line between CMS and application framework.

This interactivity is what motivated Marc to write the system in Erlang; at one point he compared the data flowing through the system to a telephone exchange. Zotonic uses Webmachine, Mochiweb, ErlyDTL, and a number of other Erlang libraries, with data in PostgreSQL. (Marc also mentioned Nitrogen as an early inspiration for Zotonic; parts of Zotonic are based on Nitrogen code, though much has been rewritten.)

The data model is physically simple, with emergent functionality. A site is developed in terms of objects (called pages) interlinked with other objects. In other words, from a data perspective, adding an image to a web page is the same as linking from a page to a subpage, or tagging a page with an author. Marc gave a live demo of Zotonic’s ability to easily add and change menu structures, modify content, and add and re-order images. Almost everything can be customized using ErlyDTL templates. Very polished stuff.
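
To picture that model, every relationship, whether image, author, or subpage, can be thought of as a labeled edge between two pages. An illustration in Erlang terms (not Zotonic’s actual schema):

```erlang
%% Illustration only (not Zotonic's actual schema): every
%% relationship is an edge from one page to another, labeled with a
%% predicate, so "add an image" and "tag an author" are the same
%% operation: insert one more edge.
-module(edge_demo).
-export([example_edges/0]).

-record(edge, {subject, predicate, object}).

example_edges() ->
    [#edge{subject = home_page, predicate = depiction, object = hero_image},
     #edge{subject = home_page, predicate = author,    object = marc},
     #edge{subject = home_page, predicate = relation,  object = about_page}].
```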

Marc then introduced his goal of “Elastic Zotonic”, a Zotonic that can scale in a distributed, fault-tolerant, “buzzword-compliant” way, which will involve changes to the datastore and some of the architecture.

Marc is now working with Maximonster to develop an education-oriented social network on top of Zotonic.

More information: http://zotonic.com

Closing Session

Francesco (CSO, Erlang Solutions, Ltd.) thanked the sponsors, presenters, and audience. He then gave a big special thanks to Frank Knight and Joanna Włodarczyk, who both worked tirelessly to organize the conference and make everything go smoothly.

Final Thoughts

Erlang is gaining momentum in the industry as a platform for solving distributed, massively concurrent problems. People aren’t flocking directly to Erlang itself; instead, they are flocking to projects built in Erlang, such as RabbitMQ, ejabberd, CouchDB, and, of course, Riak. At the same time, other languages are adopting some of the key features that make Erlang special, including a message-passing architecture and lightweight threads.

SWAG Alert — Riak at Velocity

June 6, 2010

Velocity, the “Web Performance and Operations Conference” put on by O’Reilly, kicks off tomorrow, and we here at Basho are excited. Why? Because Benjamin Black, an acknowledged distributed systems expert, will be giving a 90-minute tutorial on Riak. The session’s official name is “Riak: From Design to Deploy.” If you haven’t already read it, you can get the full description of the session here.

I just got a sneak peek at what Benjamin has planned and all I can say is that if you are within 100 miles of Santa Clara, CA tomorrow and not in this session, you will regret it.

And, what better to go with a hands-on Riak tutorial than some good old-fashioned SWAG? Here is a little offer for anyone attending tomorrow: post a write-up of Benjamin’s session and I’ll send you a Riak SWAG pack. It doesn’t have to be a novel; just a few thoughts will do. Post them somewhere online for all the world to see and learn from, and I’ll take care of the rest.

Enjoy Velocity. We are looking forward to your reviews!

Mark Phillips
Community Manager

Riak Search

May 21, 2010

This post is going to start by explaining how in-the-trenches experience with key/value stores, like Riak, led to the creation of Riak Search. Then it will tell you why you care, what you’ll get out of Riak Search, and why it’s worth waiting for.

A bit of history

Few people know that Basho used to develop applications for deployment on Salesforce.com. We had big goals, and were thinking big to meet them, and part of that was choosing a data storage system that would give us what we needed not only to succeed and grow, but to survive: a confluence of pragmatism and idealism that embodied a bulletproof operations story and a path upward, with resilience, reliability, and scalability built on proven science.

So, that’s what we did: we developed and used what has grown to be, and what you know today, as Riak.

Idealism can’t get you everywhere, though. While we answered hard questions with link-walking and map/reduce, there was still the desire in the back of all of our heads: sometimes you just want to ask, “What emails were sent on May 21 that included the word ‘strategy’?” without having to figure out how to walk links from an organizational chart to mailboxes to mails, and then filter over the data there. It was a pragmatic desire: we just wanted a quick answer in order to decide whether or not to spend more time chasing a path. “Less yak-shaving, please.”

The Operations Story

Then we stopped making Salesforce.com apps, and started selling Riak. We quickly found the same set of desires. Operationally, Riak is a huge win. Pragmatically, something that does indexing and search in a similar operational manner is even bigger. Thus, Riak Search was born.

The operational story is, in a nutshell, this: when you add another node to your cluster, you add capacity and compute power. That’s it, you just add another box and “it just works.” Purposefully or not, eventually a node leaves the cluster, hardware fails, whatever: Riak deals with it. If the node comes back, it’s absorbed like it never left.

We insisted on these qualities for Riak, and have continued that insistence in Riak Search. We did it with all the familiar bits: consistent hashing, hinted handoff, replication, etc.
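
For the curious, consistent hashing is the piece that makes “just add another box” work: keys hash onto a fixed ring of partitions, and nodes claim partitions, so adding or losing a node only moves a bounded slice of the data. A deliberately simplified sketch (the real ring lives in riak_core and uses SHA-1 with a claim algorithm, not a modulo):

```erlang
%% Toy consistent-hashing lookup: hash the key onto a fixed ring of
%% partitions, then map the partition to a node.
-module(ring_demo).
-export([owner/2]).

-define(RING_SIZE, 64).   % number of partitions on the ring

owner(Key, Nodes) ->
    Partition = erlang:phash2(Key, ?RING_SIZE),          % key -> partition
    lists:nth(1 + (Partition rem length(Nodes)), Nodes). % partition -> node
```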

Why Riak Search?

Now, we’ll be the first to tell you that with Riak you can get pretty far using link-walking and map/reduce, with the understanding that you know what you are going to want ahead of time, and/or are willing to wait for it.

Riak Search answers the questions that pop into your head: “find me all the blue dresses that are between $20 and $30,” “find me the document Bob referred to last week at the TPS procedures meeting,” “how can I delete all these emails from my aunt that have those stupid attachments?” “find me that comic strip with Bob,” etc.

It’s about making the sea of data in your key/value store findable. At a higher level, it’s about agility: the ability to answer questions you have about your business and your customers without having to consult a developer or dig through reference manuals, and without your application developers having to reinvent the wheel, with the very real possibility of getting it just right enough to convince you nothing will go wrong. It’s about a common indexing language.

Okay, now you know — thanks for bearing with us — let’s get to the technical bits.

Riak Search …

The system we have built …

  1. is an easily-scalable, fault-tolerant search and indexing system, adhering to the operational story you just read
  2. supports full-text indexing and search
  3. allows querying via the Lucene query syntax
  4. has Solr-compatible /select and /update web-services (see the example below)
  5. supports date and numeric indexing
  6. supports faceting
  7. automatically distributes indexes
  8. has an intermediate query language and integrated query planner
  9. supports scoring
  10. has integrated tokenizing, filtering, and analysis (yes, you can use StandardAnalyzer!)

… and much more. Sounds pretty great, right?
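
Since the /select endpoint is Solr-compatible, any HTTP client can query it; for example, from Erlang with inets (the host, port, and index name below are illustrative assumptions, not documented defaults):

```erlang
%% Querying the Solr-compatible /select interface from Erlang; the
%% host, port, and index name are made up for illustration.
-module(search_client).
-export([select/2]).

select(Index, Query) ->   % Query must already be URL-encoded
    inets:start(),
    Url = "http://127.0.0.1:8098/solr/" ++ Index ++ "/select?q=" ++ Query,
    httpc:request(Url).
```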

If you want to know more about the internals and the technical nitty-gritty, check out the Riak Search presentation that one of our own, Riak Search engineer John Muellerleile, gave at the San Francisco Erlang Factory this year.

So, why don’t you have it yet? The easy part.

There are still some stubs and hard-coded things in what we have. For instance, the only full-text analysis currently in place is whitespace tokenizing, case normalization, and stop-word filtering. We intend to fully support the ability to specify other Lucene analyzers, including custom modules, but the code isn’t there yet.
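
That default pipeline is easy to picture; a sketch in plain Erlang (not Riak Search’s actual analyzer module, and the stop-word list is illustrative):

```erlang
%% Sketch of the default analysis pipeline described above:
%% whitespace tokenizing, case normalization, stop-word filtering.
-module(analyzer_demo).
-export([analyze/1]).

-define(STOPWORDS, ["a", "an", "and", "of", "the"]).

analyze(Text) ->
    Tokens = string:tokens(Text, " \t\n"),              % whitespace tokenizer
    Lower  = [string:to_lower(T) || T <- Tokens],       % case normalization
    [T || T <- Lower, not lists:member(T, ?STOPWORDS)]. % stop-word filter
```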

There is also very little documentation. Without a little bit of handholding, even the brightest and most ambitious user could be forgiven for staring blankly, lost for even the first question to ask. We’re spreading the knowledge among our own team right now; that process will generate the artifacts needed for the next round of users to step in.

There are also many fiddly, finicky bits. These are largely relics of early iterations. Rather than having the interwebs be flooded with, “How do you stop this thing?” (as it was with Riak), we’re going to make things friendlier.

So, why don’t you have it yet? The not-so-easy part.

You’ve probably asked yourself, “What of integration of Riak and Riak Search?” We have many notes from discussions about how it could or should be done, as well as code showing how it can be done. But, we’re not completely satisfied with any of our implementations so far.

There is certainly no shortage of designs and ideas on how this could or should work, so we’re going to make a final pass at refining all of our ideas, given our current Riak Search system to play with, so that we can provide a solid, extensible system instead of one with many rough edges that would almost certainly be replaced immediately.

Furthering this sentiment, we think that our existing map/reduce framework and the functionality and features provided by Riak Search are a true power combo when used together intelligently, rather than simply as alternatives or, at worst, at odds. As a result, we’re defining exactly how Riak Search indexing and querying should be threaded into Riak map/reduce processing to bring you a combination that is undoubtedly more than the sum of its parts.

We could tease you with specifics, like generating the set of bucket/key inputs to a map phase by performing a Riak Search query, or parameterizing search phases with map results. For now, though, amidst protest both internal (we’re chomping at the bit to get this out into the world and into your hands) and external (our favorite people continually request this exact set of technology and features), we’re going to implement the few extra details from our refined notes before forcing it on you all.

Hold on just a little longer. :)

-the Riak Search Team

Introducing the Riak Fast Track

May 4, 2010

Our Challenge

There is nothing easy about making software simple to learn and understand. Every potential user brings different nuances to their learning style, and this makes for a hard road to simple usage. This is especially true with Riak.

Internally at Basho, we are constantly addressing questions like, “How do we make a ‘distributed, Dynamo-inspired key/value store’ inviting and less daunting to first time users?” and “How do we lower the barrier to adoption and usage?” Though resources like the Riak Mailing List, the Riak Wiki, and Riak IRC channel are great, we kept asking ourselves, “What can we do to make it dead simple for those new to and interested in Riak to learn about it and how it works?”

Our answer (in part) is the Riak Fast Track.

What is the Riak Fast Track?

The Fast Track is an interactive module on the Riak Wiki that, through a combination of concise content and brief screencasts, will bring you up to speed on a) what Riak is, b) what its key features and benefits are, and c) how to use it.

As I stated above, the Fast Track is aimed at developers who may be new to Riak or those who may have heard about it in passing but haven’t spent too much time fiddling with it.

Is it exhaustive? No. Will you be a Riak expert after an hour? No. But, at the end of it, you should be able to tell your friends that you performed a JavaScript MapReduce query on historical stock data distributed over a three-node Riak cluster on your local machine. If that’s not cool, then I don’t know what is!
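
For a taste of that exercise, here is roughly what a JavaScript MapReduce job looks like when submitted through Riak’s Erlang client of this era; the bucket and key below are hypothetical, and the phase-spec details may differ between releases:

```erlang
%% Sketch of a JavaScript MapReduce job submitted from an attached
%% shell on a Riak node; bucket/key are hypothetical.
-module(fast_track_demo).
-export([run/0]).

run() ->
    {ok, Client} = riak:local_client(),
    Map = <<"function(v) { return [JSON.parse(v.values[0].data)]; }">>,
    Client:mapred([{<<"goog">>, <<"2010-05-03">>}],      % input bucket/key
                  [{map, {jsanon, Map}, none, true}]).   % anonymous JS map phase
```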

Your Challenge

We put a lot of time into making this, but there are undoubtedly some kinks that need to be worked out. And, regardless of how long we try to tweak and refine it, there will always be some small aspects and details that we aren’t going to get right. It is for that reason that we are appealing to you, the rapidly-growing Riak Community, to help us.

So, here is the challenge: Take 45 minutes and go through the Riak Fast Track. Then, when you’re done, take five minutes to write us an email and tell us what you thought about it. That’s it.

We are looking for answers to questions like:

  • Was it effective?
  • Did you learn anything?
  • What did we get right?
  • What did we get wrong?
  • What should we add/remove?

And, to sweeten the pot, we are going to send a “Riak Swag Pack” (contents of which are top secret) to everyone who sends us their review and thoughts on the Fast Track by the close of business on Tuesday (5/11) of next week. It doesn’t have to be anything extensive (though we love details). A simple, “I liked x, y, and z, but you could have done this better” would suffice. You can send your emails to mark@basho.com. I am looking forward to hearing from you!

So, without further ado, go forth and test out the Riak Fast Track.

We hope you’ll find it useful and we’re looking forward to your thoughts on how to make it better.

Best,

Mark Phillips

Toward A Consistent, Fact-based Comparison of Emerging Database Technologies

A Muddle That Slows Adoption

Basho released Riak as an open source project seven months ago and began commercial service shortly thereafter. As new entrants into the loose collection of database projects, we observed two things:

  1. Widespread Confusion — the NoSQL rubric, and the decisions of most projects to self-identify under it, have created a false perception of overlap and similarity between projects differing not just technically but in their approaches to licensing and distribution, leading to…
  2. Needless Competition — driving the confusion, many projects (us included, for sure) competed passionately (even acrimoniously) for attention as putative leaders of NoSQL, a fool’s errand as it turns out. One might as well claim leadership of all tools called wrenches.

The optimal use cases, architectures, and methods of software development differ so starkly even among superficially similar projects that to compete is to demonstrate a (likely pathological) lack of understanding of both user needs and one’s own project.

This confusion and wasted energy — in other words, the market inefficiency — has been the fault of anyone who has laid claim to, or professed designs on, the NoSQL crown.

  1. Adoption suffers — Users either make decisions based on muddled information or, worse, do not make any decision whatsoever.
  2. Energy is wasted — At Basho we spent too much time from September to December answering the question posed without fail by investors and prospective users and clients: “Why will you ‘win’ NoSQL?”

With the vigor of fools, we answered this question, even though we rarely if ever encountered another project’s software in a “head-to-head” competition. (In fact, in the few cases where we have been pitted head-to-head against another project, we have won or lost so quickly that we cannot help but conclude the evaluation could have been avoided altogether.)

The investors and users merely behaved as rational (though often bemused) market actors. Having accepted the general claim that NoSQL was a monolithic category, both sought to make a bet.

Clearly what is needed is objective information presented in an environment of cooperation driven by mutual self-interest.

This information, shaped not by any one person’s necessarily imperfect understanding of the growing collection of data storage projects but rather by all the participants themselves, would go a long way to remedying the inefficiencies discussed above.

Demystification through data, not marketing claims

We have spoken to representatives of many of the emerging database projects. They have enthusiastically agreed to participate in a project to disclose data about each project. Disclosure will start with the following: a common benchmark framework and benchmarks/load tests modeling common use cases.

  1. A Common Benchmark Framework — For this collaboration to succeed, no single aspect will impact success or failure more than arriving at a common benchmark framework.

At Basho we have observed the proliferation of “microbenchmarks,” or benchmarks that do not reflect the conditions of a production environment. Benchmarks that use a small data set, that do not store to disk, or that run for short (less than 12 hours) durations do more to confuse the issue for end users than any other single factor. Participants will agree on benchmark methods, tools, and applicability to use cases, and will make all benchmarks reproducible.
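
As a concrete example of the kind of framework meant here, Basho’s own open source load-testing tool, basho_bench, drives a database from a small file of Erlang terms; the values below are illustrative, chosen to reflect the concerns above (long runs, a realistic key space, on-disk storage):

```erlang
%% Illustrative basho_bench-style configuration (values made up).
{mode, max}.                                  % drive as hard as possible
{duration, 720}.                              % minutes; run well past cache warm-up
{concurrent, 50}.                             % simultaneous workers
{driver, basho_bench_driver_http_raw}.        % Riak HTTP driver
{key_generator, {uniform_int, 10000000}}.     % large key space
{value_generator, {fixed_bin, 10240}}.        % 10 KB values
{operations, [{get, 4}, {update, 1}]}.        % 4:1 read/write mix
```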

Compounding the confusion, benchmarks are sometimes run for different use cases or on different hardware, yet compared head-to-head as if the tests or systems were identical. We will seek to help participants run equivalent tests on the various databases, and we will not publish benchmark results that do not profile the hardware and configuration of the systems.

  2. Benchmarks That Support Use Cases — Participants agree to benchmark their software under the conditions and with load tests reflective of use cases they commonly see in their user base or for which they think their software is best suited.
  3. Dissemination to Third Parties — Providing easy-to-find data to any party interested in posting results.
  4. Honestly Exposing Disagreement — Where agreement cannot be reached on any of the elements of the common benchmarking efforts, participants will respectfully expose the rationales for their divergent positions, thus still providing users with information upon which to base decisions.

There is more work to be done but all participants should begin to see the benefits: faster, better decisions by users.

We invite others to join, once we are underway. We, and our counterparts at other projects, believe this approach will go a long way to removing the inefficiencies hindering adoption of our software.

Tony and Justin

Enormous opportunity as relational database model begins to fail under the strain of Big Data and the Internet

CAMBRIDGE, MA – March 30 – Erlang Solutions, Ltd., the leading provider of Erlang services, and Basho Technologies, the makers of Riak, a high-availability data store written in Erlang, today announced that they have entered into a multi-faceted partnership to deliver scalable, fault-tolerant applications built on Riak to the global market. Erlang OTP, an Ericsson open source technology, has found application in a new generation of fault-tolerant systems at companies like Facebook and Amazon.

“Erlang Solutions constantly seeks out new technologies and services to bring to its clients,” said Marcus Taylor, CEO of Erlang Solutions. “With Basho, we add not just the Riak software, but a partner to help us build an ecosystem of Erlang-based high-availability applications.”

The partnership has three major components: 1) both companies develop and support training and certification, 2) Erlang Solutions architects, load tests, and builds Riak-based applications for clients, and 3) Erlang Solutions provides Basho clients with training, systems design, prototyping, and development services.

“Erlang Solutions is globally recognized as the business thought leaders in the Erlang community,” said Earl Galleher, Basho Technologies Chairman and CEO. “Our current and future customers now have access to a new tier of professional services and we help Erlang Solutions push Erlang further into the mainstream market.”

With this partnership, Erlang Solutions now represents Basho Technologies in Europe for services and distribution. The teams will focus on high-growth markets like mobile telephony, social media, and e-commerce applications, where current relational database management system (RDBMS) solutions are struggling to keep up.

“Look at the explosive growth of SMS IM traffic,” said Francesco Cesarini, founder of Erlang Solutions, “and the cost to scale traditional infrastructure. Basho’s Riak helps clients contain these costs while increasing reliability. An ecosystem of high-availability solutions is forming and the relationship between Erlang Solutions and Basho Technologies will soon expand to include other partners and richer solutions.”

About Erlang Solutions Ltd

Founded in 1999, Erlang Solutions Ltd. (www.erlang-solutions.com) is an international company specialised in the open source language Erlang and its middleware, OTP. Erlang Solutions serves all your Erlang needs: training, certification, consulting, contracting, system development, support, and conferences. Erlang Solutions’ expert and certified consultants are the most experienced anywhere, with many having used Erlang since its early days. With offices in the UK, Sweden, and Poland, and clients on six continents, Erlang Solutions is available for short- and long-term opportunities worldwide.

About Basho Technologies

Basho Technologies, Inc., founded in January 2008 by a core group of software architects, engineers, and executive leadership from Akamai Technologies, Inc. (Nasdaq: AKAM), is headquartered in Cambridge, Massachusetts. Basho produces Riak, a distributed data store that combines extreme fault tolerance, rapid scalability, and ease of use. Designed from the ground up to work with applications that run on the Internet and mobile networks, Riak is particularly well-suited for users of cloud infrastructure such as Amazon’s AWS and Joyent’s Smart platform and is available in both an open source and a paid commercial version. Current customers of Riak include Comcast Corporation, MIG-CAN, and Mochi Media.

Media Contacts
Earl Galleher
CEO, Basho Technologies, Inc.
910.520.5466
earl@basho.com

The Craft Brewers of NoSQL

Just a few days ago, we did something a bit new at Basho. We posted the beginning of a public discussion to explore and document some differences between various NoSQL systems. Some people have attempted such comparisons before, but generally from an external observer’s point of view. When something like this comes from a producer of one of the systems in question it necessarily changes the tone.

If you weren’t really paying attention you could choose to see this as aggressive competition on our part, but the people that have chosen to engage with us have hopefully seen that it’s the opposite: an overt attempt at collaboration. While the initial comparisons were definitely not complete (for instance, in some cases they reflected the current self-documented state of systems instead of the current state of their code) they nonetheless very much had the desired effect.

That effect was to create productive conversation, resulting both in improvement of the comparison documents and in richer ongoing communication between the various projects out there. Our comparisons have already improved a great deal as a result of this and will continue to do so. I attribute this to the constructive attention that they have received from people deeply in the trenches with the various systems being discussed. That attention has also, I hope, given us a concrete context in which to strengthen our relationships with those people and projects.

Some of the attention we received was from people that are generally unhelpful; there are plenty of trolls on the internet who are more interested in throwing stones than in useful conversation. There’s not much point in wading into that kind of a mess as everyone gets dirty and nothing improves as a result. Luckily, we also received attention from people who actually build things. Those sorts of people tend to be much more interested in productive collaboration, and that was certainly the case this time. Though they’re by no means the only ones we’ve been happy to talk to, we can explicitly thank Greg Arnette, Jonathan Ellis, Benjamin Black, Mike Dirolf, Joe Stump, and Peter Neubauer for being among those spending their valuable time talking with us recently.

It’s easy to claim that any attempt to describe others that isn’t immediately perfect is just FUD, but our goal here is to help remove the fear, uncertainty, and doubt that people outside this fledgling community already have about all of this weird NoSQL stuff. By engaging each other in direct, honest, open, civil conversations we can all improve our knowledge as well as the words we use to describe each others’ work.

The people behind the various NoSQL systems today have a lot in common with the American craft brewers of the 1980s and 1990s. (Yes, I’m talking about beer.) You might casually imagine that people trying to sell similar products to the same market would be cutthroat competitors, but you’d be wrong. When you are disrupting a much larger established industry, aggression amongst peers isn’t a good route to success.

The friend who convinced me to try a Sam Adams in 1993 wasn’t striking a blow against Sierra Nevada or any of the other craft brewers at the time. In fact, he was helping all of those supposed “competitors” by opening up one more pair of eyes to the richness of choices that are available to all. People who enjoy good beer will happily talk about the differences between their brews all day, but in the end what matters is that when they walk into a bar they will look to see what choices they have at the tap instead of just ordering the same old Bud without a second thought.

Understanding that “beer” doesn’t always mean exactly the same beverage is the key realization, just as with NoSQL the most important thing people outside the community can realize is that not all data problems are shaped like a typical RDBMS.

Of course, any brewer will talk about their own product more than anything else, but will also know that good conversations lead to improvements by all and to the potential greater success of the entire community they exist in. Sometimes the way to start a good conversation is to talk about what you know best, with people who you know will have a different point of view than your own. From there, everyone’s knowledge, perspective, and understanding can improve.

At Basho we’re not just going to keep doing what we’ve already done in terms of communication. We’re going to keep finding new and better ways of communicating, and do it more often.

In addition to continuing to work with others on finding the right ways to publicly discuss our differences and similarities on fundamentals, we will also do so in specific areas such as performance testing, reliability under duress, and more. This will remain tricky, because it is easy for people to get confused by superficial issues and distracted from the more interesting ones — and opinions will vary on which are which. In discussing those opinions, we will all become more capable practitioners and advocates of the craft that binds us together.

Ruffling a few feathers is a small price to pay if better conversations occur. This is especially true if the people truly creating value by building systems learn how to work better together in the process.

Justin

Riak in Production – A Distributed Event Registration System Written in Erlang

March 20, 2010

Riak, at its core, is an open source project. So, we love the opportunity to hear from our users and find out where and how they are using Riak in their applications. It is for that reason that we were excited to hear from Chris Villalobos. He recently put a Distributed Event Registration application into production at his church in Gainesville, Florida, and after hearing a bit about it, we asked him to write a short piece about it for the Basho Blog.

Use Case and Prototyping

As a way of going paperless at our church, I was tasked with creating an event registration system, accessible via touchscreen kiosk, SMS, and our website, for members to use to sign up for various events. As I wanted to learn a new language and had dabbled in Erlang (specifically Mochiweb) for another small application, I decided to try to do the whole thing in Erlang. But how to do it, on a two-month timeline, was quite the challenge.

The initial idea was to have each kiosk independently hold pieces of the database, so that in the event something happened to a server or a kiosk, the data would still be available. I also wanted to use the fault tolerance and distributed processing of Erlang to help make sure that the various frontends would be constantly running and online. And, as I wanted to stay as close to pure Erlang as possible, I decided early against a SQL database. I tried Mnesia, but I wasn’t happy with the results: using QLC as an interface, interesting issues arose when I took down a master node. (I was also facing a time constraint, so playing with it extensively wasn’t really an option.)

It just so happened that Basho released Riak 0.8 the morning I got fed up with it. So I thought about how I could use a key/value store. I liked how the Riak API made it simple to get data in and out of the database, how I could use map-reduce functionality to create any reports I needed and how the distribution of data worked out. Most importantly, no matter what nodes I knocked out while the cluster was running, everything just continued to click. I found my datastore.

During the initial prototyping stages for the kiosk, I envisioned a simple key/value store using a data model that looked something like this:

```erlang
[
 {key1, {Title, Icon, BackgroundImage, Description, [signup_options]}},
 {key2, {...}}
]
```

This design would enable me to present the user with a list of options when the kiosk was started up. I found that by using Riak, this was simple to implement. I also enjoyed that Riak was great at getting out of the way. I didn’t have to think about how it was going to work, I just knew that it would. (The primary issue I kept running into when I thought about future problems was sibling entries. If two users on two kiosks submit information at the same time for the same entry (potentially an issue as the number of kiosks grows), that would result in sibling entries because of the way user data is stored:

```erlang
<<...>>, <<...>>, [user data]
```

But, by checking for siblings when the reports are generated, this problem became a non-issue.)
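
That report-time sibling check might look roughly like this with the riak_object API of this era (merge_signups/1 is a made-up resolver):

```erlang
%% Rough sketch of report-time sibling handling, using riak_object's
%% get_values/1 to expose all conflicting values of an object.
report_values(Bucket, Key) ->
    {ok, Client} = riak:local_client(),
    {ok, Obj} = Client:get(Bucket, Key, 2),        % R = 2 read quorum
    case riak_object:get_values(Obj) of
        [Single] -> Single;                        % no conflict
        Siblings -> merge_signups(Siblings)        % reconcile conflicts
    end.

merge_signups(Siblings) ->
    lists:usort(lists:append(Siblings)).           % e.g. union of entries
```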

High Level Architecture

The kiosk is live and running now with very few kinks (mostly hardware) and everything is in pure Erlang. At a high level, the application architecture looks like this:

Each Touchscreen Kiosk:

  • wxErlang
  • Riak node

Web-Based Management/SMS Processing Layer:

  • Nitrogen Framework speaking to Riak for Kiosk Configuration/Reporting
  • Nitrogen/Mochiweb processing SMS messages from SMS aggregator

Periodic Email Sender:

  • Vagabond’s gen_smtp client on an eternal receive-after-24-hours send-email loop (sketched below)
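
That daily loop is a classic Erlang idiom, a receive with a timeout; a minimal sketch, with the gen_smtp call stubbed out:

```erlang
%% Minimal sketch of the daily email loop; send_digest/0 stands in
%% for the actual gen_smtp call.
-module(email_sender).
-export([start/0]).

start() ->
    spawn(fun loop/0).

loop() ->
    receive
        stop -> ok
    after 86400000 ->             % 24 hours in milliseconds
        send_digest(),
        loop()
    end.

send_digest() ->
    ok.                           % placeholder for the gen_smtp call
```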

In Production

Currently, we are running four Riak nodes (writing out to the filesystem backend) outside of the three kiosks themselves. I also have various Riak nodes on my random Linux servers, because I can use the CPU cycles on my other nodes to distribute MapReduce functions and store information in a redundant fashion.

By using Riak, I was able to keep the database lean and mean with creative uses of keys. Every asset for the kiosk is stored within Riak, including images. These are pulled only whenever a kiosk is started up or whenever an asset is created, updated, or removed (using message passing). If an image isn’t present on a local kiosk, it is pulled from the database and then stored locally. Also, all images and panels (such as the on-screen keyboard) are stored in memory to make things faster.

All SMS messages are stored within an SMS bucket. Every 24 hours, all the buckets are checked with a “mapred_bucket” to see if there are any new messages since the last time the function ran. These results are formatted within the MapReduce function and emailed out using the gen_smtp client. As assets are removed from the system, the current data is stored within a serialized text file and then removed from the database.

As I bring more kiosks into operation, the distributed MapReduce feature is becoming more valuable. Since I typically run reports during off hours, the kiosks aren’t overloaded by the extra processing. So far I have been able to roll out a new kiosk within two hours of receiving the hardware; most of this time is spent installing and configuring the touchscreen. Also, the system is becoming more and more vital to how we interface with people, giving members multiple ways of contacting us at their convenience. I am planning on expanding how I use the system, especially with code distribution. For example, with the Innostore interface, I might store the beam files inside Riak and send them to the kiosks using Erlang commands. (Version control inside Riak, anyone?)

What’s Next?

I have ambitious plans for the system, especially on the kiosk side. As this is a very beta version of the software, it is only currently in production in our little community. That said, I hope to open source it and put it on github/bitbucket/etc. as soon as I pretty up all the interfaces.

I’d say probably the best thing about this whole project is getting to know the people inside the Erlang community, especially the Basho people and the #erlang regulars on IRC. Anytime I had a problem, someone was there willing to work through it with me. Since I am essentially new to Erlang, it really helped to have a strong sense of community. Thank you to all the folks at Basho for giving me a platform to show what Erlang can do in everyday, out of the way places.

Chris Villalobos

The Release of the Riak Wiki and the Fourth Basho Podcast

March 12, 2010

We are moving at warp speed here at Basho and today we are releasing what we feel is a very important enhancement to Riak: a wiki.

You can find it here:

http://docs.basho.com

Documentation and resources are a main priority right now for Basho, and a well maintained and up-to-date wiki is something we see as critical. Our goal is to make Riak simple and intuitive to download, build, program against, and build apps on. So, you should expect a lot more from us in this regard. Also, we still have much to add to the Riak Wiki, so if you think we are missing a resource or some documentation that makes Riak easier to use and learn about, please tell us.

Secondly, we had the chance to record the fourth installment of the Basho Riak podcast, and it was a good one. We hooked up with Tim Anglade, CTO of GemKitty and a growing authority on the NoSQL space. On the heels of his presentation at NoSQL Live from Boston, we picked his brain a bit about where he thinks the industry is going and what needs to change for the current iteration of NoSQL to go from being a fad and curiosity to a full-fledged industry.

According to Tim, “We have an image problem right now with NoSQL as a brand,” and “NoSQL is over-hyped and the projects behind it are under-hyped.”

We also took a few minutes to talk about the Riak 0.9.1 release. Highlights include binary builds, as well as several new client libraries that expose all of Riak’s advanced features.

In short, if you are at all interested in the future of the NoSQL space, you’re not going to want to miss this.

Lastly, if you haven’t already done so, go download the latest version of Riak.

Enjoy!

Mark

Calling all Rubyists – Ripple has Arrived!

February 11, 2010

The Basho Dev. Team has been very excited about working with the Ruby community for some time. The only problem was we were heads down on so many other projects that it was hard to make any progress. But, even with all that work on our plate, we were committed to showing some love to Rubyists and their frameworks.

Enter Sean Cribbs. As Sean details in his latest blog post, Basho and the stellar team at Sonian made it possible for him to hack on Ripple, a one-of-a-kind client library and object mapper for Riak. The full feature set for Ripple can be found on Sean’s blog, but highlights include a DataMapper-like API, an easy builder-style interface to Map/Reduce, and near-term integration with Rails 3.

And, in case you need any convincing that you should consider Riak as the primary datastore for your next Rails app, check out Sean’s earlier post, “Why Riak should power your next Rails app.”

So, if you’ve read enough and want to get your hands on Ripple, go check it out on GitHub.

If you don’t have Riak downloaded and built yet, get on it.

Lastly, you are going to be seeing a lot more Riak in your Ruby. So stay tuned because we have some big plans.

Best,

Mark