Introducing the Riak Fast Track

May 4, 2010

Our Challenge

There is nothing easy about making software simple to learn and understand. Every potential user has different nuances to their learning styles, and this makes for a hard road to simple usage. This is especially true with Riak.

Internally at Basho, we are constantly addressing questions like, “How do we make a ‘distributed, Dynamo-inspired key/value store’ inviting and less daunting to first-time users?” and “How do we lower the barrier to adoption and usage?” Though resources like the Riak Mailing List, the Riak Wiki, and the Riak IRC channel are great, we kept asking ourselves, “What can we do to make it dead simple for those new to and interested in Riak to learn about it and how it works?”

Our answer (in part) is the Riak Fast Track.

What is the Riak Fast Track?

The Fast Track is an interactive module on the Riak Wiki that, through a combination of concise content and brief screencasts, will bring you up to speed on a) what Riak is, b) what its key features and benefits are, and c) how to use it.

As I stated above, the Fast Track is aimed at developers who may be new to Riak or those who may have heard about it in passing but haven’t spent too much time fiddling with it.

Is it exhaustive? No. Will you be a Riak expert after an hour? No. But, at the end of it, you should be able to tell your friends that you performed a JavaScript MapReduce query on historical stock data distributed over a three-node Riak cluster on your local machine. If that’s not cool then I don’t know what is!

Your Challenge

We put a lot of time into making this, but there are undoubtedly some kinks that need to be worked out. And, regardless of how long we try to tweak and refine it, there will always be some small aspects and details that we aren’t going to get right. It is for that reason that we are appealing to you, the rapidly-growing Riak Community, to help us.

So, here is the challenge: Take 45 minutes and go through the Riak Fast Track. Then, when you’re done, take five minutes to write us an email and tell us what you thought about it. That’s it.

We are looking for answers to questions like:

  • Was it effective?
  • Did you learn anything?
  • What did we get right?
  • What did we get wrong?
  • What should we add/remove?

And, to sweeten the pot, we are going to send a “Riak Swag Pack” (contents of which are top secret) to everyone who sends us their review and thoughts on the Fast Track by the close of business on Tuesday (5/11) of next week. It doesn’t have to be anything extensive (though we love details). A simple, “I liked x, y, and z, but you could have done this better” would suffice. You can send your emails to mark@basho.com. I am looking forward to hearing from you!

So, without further ado, go forth and test out the Riak Fast Track.

We hope you’ll find it useful and we’re looking forward to your thoughts on how to make it better.

Best,

Mark Phillips

Toward A Consistent, Fact-based Comparison of Emerging Database Technologies

A Muddle That Slows Adoption

Basho released Riak as an open source project seven months ago and began commercial service shortly thereafter. As new entrants into the loose collection of database projects, we observed two things:

  1. Widespread Confusion — the NoSQL rubric, and the decisions of most projects to self-identify under it, has created a false perception of overlap and similarity between projects differing not just technically but in approaches to licensing and distribution, leading to…
  2. Needless Competition — driving the confusion, many projects (us included, for sure) competed passionately (even acrimoniously) for attention as putative leaders of NoSQL, a fool’s errand as it turns out. One might as well claim leadership of all tools called wrenches.

The optimal use cases, architectures, and methods of software development differ so starkly even among superficially similar projects that to compete is to demonstrate a (likely pathological) lack of both self-knowledge and understanding of user needs.

This confusion and wasted energy — in other words, the market inefficiency — has been the fault of anyone who has laid claim to, or professed designs on, the NoSQL crown.

  1. Adoption suffers — Users either make decisions based on muddled information or, worse, do not make any decision whatsoever.
  2. Energy is wasted — At Basho we spent too much time from September to December answering the question posed without fail by investors and prospective users and clients: “Why will you ‘win’ NoSQL?”

With the vigor of fools, we answered this question, even though we rarely if ever encountered another project’s software in a “head-to-head” competition. (In fact, in the few cases where we have been pitted head-to-head against another project, we have won or lost so quickly that we cannot help but conclude the evaluation could have been avoided altogether.)

The investors and users merely behaved as rational (though often bemused) market actors. Having accepted the general claim that NoSQL was a monolithic category, both sought to make a bet.

Clearly what is needed is objective information presented in an environment of cooperation driven by mutual self-interest.

This information, shaped not by any one person’s necessarily imperfect understanding of the growing collection of data storage projects but rather by all the participants themselves, would go a long way to remedying the inefficiencies discussed above.

Demystification through data, not marketing claims

We have spoken to representatives of many of the emerging database projects. They have enthusiastically agreed to participate in a project to disclose data about each project. Disclosure will start with the following: a common benchmark framework and benchmarks/load tests modeling common use cases.

  1. A Common Benchmark Framework: For this collaboration to succeed, no single aspect will impact success or failure more than arriving at a common benchmark framework.

At Basho we have observed the proliferation of “microbenchmarks,” or benchmarks that do not reflect the conditions of a production environment. Benchmarks that use a small data set, that do not store to disk, or that run for short durations (less than 12 hours) do more to confuse the issue for end users than any other single factor. Participants will agree on benchmark methods, tools, and applicability to use cases, and will make all benchmarks reproducible.

Compounding the confusion, benchmarks are sometimes used to compare different use cases, or were run on different hardware, and yet are compared head-to-head as if the tests or systems were identical. We will seek to help participants run equivalent tests on the various databases, and we will not publish benchmark results that do not profile the hardware and configuration of the systems.

  2. Benchmarks That Support Use Cases: participants agree to benchmark their software under the conditions and with load tests reflective of use cases they commonly see in their user base or for which they think their software is best suited.
  3. Dissemination to third parties: providing easy-to-find data to any party interested in posting results.
  4. Honestly exposing disagreement: Where agreement cannot be reached on any of the elements of the common benchmarking efforts, participants will respectfully expose the rationales for their divergent positions, thus still providing users with information upon which to base decisions.

There is more work to be done but all participants should begin to see the benefits: faster, better decisions by users.

We invite others to join, once we are underway. We, and our counterparts at other projects, believe this approach will go a long way to removing the inefficiencies hindering adoption of our software.

Tony and Justin

Enormous opportunity as relational database model begins to fail under the strain of Big Data and the Internet

CAMBRIDGE, MA – March 30 – Erlang Solutions, Ltd., the leading provider of Erlang services, and Basho Technologies, the makers of Riak, a high-availability data store written in Erlang, today announced they have entered into a multi-faceted partnership to deliver scalable, fault tolerant applications built on Riak to the global market. Erlang OTP, an Ericsson open source technology, has found application in a new generation of fault tolerant use from companies like Facebook and Amazon.

“Erlang Solutions constantly seeks out new technologies and services to bring to its clients,” said Marcus Taylor, CEO of Erlang Solutions. “With Basho, we add not just the Riak software, but a partner to help us build an ecosystem of Erlang-based high-availability applications.”

The partnership has three major components: 1) both companies develop and support training and certification, 2) Erlang Solutions architects, load tests, and builds Riak-based applications for clients, and 3) Erlang Solutions provides Basho clients with training, systems design, prototyping, and development services.

“Erlang Solutions is globally recognized as the business thought leaders in the Erlang community,” said Earl Galleher, Basho Technologies Chairman and CEO. “Our current and future customers now have access to a new tier of professional services and we help Erlang Solutions push Erlang further into the mainstream market.”

With this partnership, Erlang Solutions now represents Basho Technologies in Europe for services and distribution. The teams will focus on high growth markets like mobile telephony, social media and e-commerce applications where current Relational Database Systems (RDBMS) solutions are struggling to keep up.

“Look at the explosive growth of SMS IM traffic,” said Francesco Cesarini, founder of Erlang Solutions, “and the cost to scale traditional infrastructure. Basho’s Riak helps clients contain these costs while increasing reliability. An ecosystem of high-availability solutions is forming and the relationship between Erlang Solutions and Basho Technologies will soon expand to include other partners and richer solutions.”

About Erlang Solutions Ltd

Founded in 1999, Erlang Solutions Ltd. (www.erlang-solutions.com) is an international company specialised in the open source language Erlang and its middleware OTP. Erlang Solutions solves all your Erlang needs – Training, Certification, Consulting, Contracting, System Development, Support and Conferences. Erlang Solutions’ expert and certified consultants are the most experienced anywhere, with many having used Erlang since its early days. With offices in the UK, Sweden and Poland and clients on six continents, Erlang Solutions is available for short and long term opportunities world-wide.

About Basho Technologies

Basho Technologies, Inc., founded in January 2008 by a core group of software architects, engineers, and executive leadership from Akamai Technologies, Inc. (Nasdaq:AKAM), is headquartered in Cambridge, Massachusetts. Basho produces Riak, a distributed data store that combines extreme fault tolerance, rapid scalability, and ease of use. Designed from the ground up to work with applications that run on the Internet and mobile networks, Riak is particularly well-suited for users of cloud infrastructure such as Amazon’s AWS and Joyent’s Smart platform and is available in both an open source and a paid commercial version. Current customers of Riak include Comcast Corporation, MIG-CAN, and Mochi Media.

Media Contacts
Earl Galleher
CEO, Basho Technologies, Inc.
910.520.5466
earl@basho.com

The Craft Brewers of NoSQL

Just a few days ago, we did something a bit new at Basho. We posted the beginning of a public discussion to explore and document some differences between various NoSQL systems. Some people have attempted such comparisons before, but generally from an external observer’s point of view. When something like this comes from a producer of one of the systems in question it necessarily changes the tone.

If you weren’t really paying attention you could choose to see this as aggressive competition on our part, but the people that have chosen to engage with us have hopefully seen that it’s the opposite: an overt attempt at collaboration. While the initial comparisons were definitely not complete (for instance, in some cases they reflected the current self-documented state of systems instead of the current state of their code) they nonetheless very much had the desired effect.

That effect was to create productive conversation, resulting both in improvement of the comparison documents and in richer ongoing communication between the various projects out there. Our comparisons have already improved a great deal as a result of this and will continue to do so. I attribute this to the constructive attention that they have received from people deeply in the trenches with the various systems being discussed. That attention has also, I hope, given us a concrete context in which to strengthen our relationships with those people and projects.

Some of the attention we received was from people that are generally unhelpful; there are plenty of trolls on the internet who are more interested in throwing stones than in useful conversation. There’s not much point in wading into that kind of a mess as everyone gets dirty and nothing improves as a result. Luckily, we also received attention from people who actually build things. Those sorts of people tend to be much more interested in productive collaboration, and that was certainly the case this time. Though they’re by no means the only ones we’ve been happy to talk to, we can explicitly thank Greg Arnette, Jonathan Ellis, Benjamin Black, Mike Dirolf, Joe Stump, and Peter Neubauer for being among those spending their valuable time talking with us recently.

It’s easy to claim that any attempt to describe others that isn’t immediately perfect is just FUD, but our goal here is to help remove the fear, uncertainty, and doubt that people outside this fledgling community already have about all of this weird NoSQL stuff. By engaging each other in direct, honest, open, civil conversations we can all improve our knowledge as well as the words we use to describe each other’s work.

The people behind the various NoSQL systems today have a lot in common with the American craft brewers of the 1980s and 1990s. (Yes, I’m talking about beer.) You might casually imagine that people trying to sell similar products to the same market would be cutthroat competitors, but you’d be wrong. When you are disrupting a much larger established industry, aggression amongst peers isn’t a good route to success.

The friend who convinced me to try a Sam Adams in 1993 wasn’t striking a blow against Sierra Nevada or any of the other craft brewers at the time. In fact, he was helping all of those supposed “competitors” by opening up one more pair of eyes to the richness of choices that are available to all. People who enjoy good beer will happily talk about the differences between their brews all day, but in the end what matters is that when they walk into a bar they will look to see what choices they have at the tap instead of just ordering the same old Bud without a second thought.

Understanding that “beer” doesn’t always mean exactly the same identical beverage is the key realization, just as with NoSQL the most important thing people outside the community can realize is that not all data problems are shaped like a typical RDBMS.

Of course, any brewer will talk about their own product more than anything else, but will also know that good conversations lead to improvements by all and the potential greater success of the entire community they exist in. Sometimes the way to start a good conversation is to talk about what you know best, with people that you know will have a different point of view than your own. From there, everyone’s knowledge, perspective, and understanding can improve.

At Basho we’re not just going to keep doing what we’ve already done in terms of communication. We’re going to keep finding new and better ways of communicating, and do it more often.

In addition to continuing to work with others on finding the right ways to publicly discuss our differences and similarities on fundamentals, we will also do so in specific areas such as performance testing, reliability under duress, and more. This will remain tricky, because it is easy for people to get confused by superficial issues and distracted from the more interesting ones — and opinions will vary on which are which. In discussing those opinions, we will all become more capable practitioners and advocates of the craft that binds us together.

Ruffling a few feathers is a low cost to pay, if better conversations occur. This is especially true if the people truly creating value by building systems learn how to work better together in the process.

Justin

Riak in Production – A Distributed Event Registration System Written in Erlang

March 20, 2010

Riak, at its core, is an open source project. So, we love the opportunity to hear from our users and find out where and how they are using Riak in their applications. It is for that reason that we were excited to hear from Chris Villalobos. He recently put a Distributed Event Registration application into production at his church in Gainesville, Florida, and after hearing a bit about it, we asked him to write a short piece about it for the Basho Blog.

Use Case and Prototyping

As a way of going paperless at our church, I was tasked with creating an event registration system that was accessible via touchscreen kiosk, SMS, and our website, to be used by members to sign up for various events. As I was wanting to learn a new language and had dabbled in Erlang (specifically Mochiweb) for another small application, I decided that I was going to try and do the whole thing in Erlang. But how to do it, and on a two-month timeline, was quite the challenge.

The initial idea was to have each kiosk independently hold pieces of the database, so that in the event something happened to a server or a kiosk, the data would still be available. Also, I wanted to use the fault-tolerance and distributed processing of Erlang to help make sure that the various frontends would be constantly running and online. And, as I wanted to stay as close to pure Erlang as possible, I decided early against a SQL database. I tried Mnesia but I wasn’t happy with the results. Using QLC as an interface, interesting issues arose when I took down a master node. (I was also facing a time issue so playing with it extensively wasn’t really an option.)

It just so happened that Basho released Riak 0.8 the morning I got fed up with it. So I thought about how I could use a key/value store. I liked how the Riak API made it simple to get data in and out of the database, how I could use map-reduce functionality to create any reports I needed and how the distribution of data worked out. Most importantly, no matter what nodes I knocked out while the cluster was running, everything just continued to click. I found my datastore.

During the initial prototyping stages for the kiosk, I envisioned a simple key/value store using a data model that looked something like this:

```erlang
%% Sketch of the data model: a list of {Key, Value} pairs,
%% one per event shown on the kiosk.
[
 {key1, {Title, Icon, BackgroundImage, Description, [signup_options]}},
 {key2, {…}}
]
```

This design would enable me to present the user with a list of options when the kiosk was started up. I found that by using Riak, this was simple to implement. I also enjoyed that Riak was great at getting out of the way. I didn’t have to think about how it was going to work, I just knew that it would. (The primary issue I kept running into when I thought about future problems was sibling entries. If two users on two kiosks submit information at the same time for the same entry (potentially an issue as the number of kiosks grows), then that would result in sibling entries because of the way user data is stored:

```erlang
<>, <>, [user data]
```

But, by checking for siblings when the reports are generated, this problem became a non-issue.)
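The report-time sibling check could be sketched roughly as follows. This is only an illustration under stated assumptions: each sibling value is assumed to be a list of `{UserId, Data}` sign-up tuples, the report is assumed to want their union, and `merge_siblings/1` is a hypothetical helper name rather than part of any Riak API. Reading the siblings from Riak is left out.

```erlang
-module(sibling_report).
-export([merge_siblings/1]).

%% Hypothetical helper: takes the list of sibling values read for one
%% key and returns one de-duplicated, sorted list of sign-ups.
%% Assumption: each sibling is a list of {UserId, Data} tuples.
-spec merge_siblings([[{term(), term()}]]) -> [{term(), term()}].
merge_siblings(Siblings) ->
    %% lists:append/1 flattens one level; lists:usort/1 sorts and
    %% drops exact duplicates that appear in more than one sibling.
    lists:usort(lists:append(Siblings)).
```

For example, `sibling_report:merge_siblings([[{u1, a}], [{u1, a}, {u2, b}]])` returns `[{u1,a},{u2,b}]`, so a sign-up that landed in both siblings is counted once.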

High Level Architecture

The kiosk is live and running now with very few kinks (mostly hardware) and everything is in pure Erlang. At a high level, the application architecture looks like this:

Each Touchscreen Kiosk:

  • wxErlang
  • Riak node

Web-Based Management/SMS Processing Layer:

  • Nitrogen Framework speaking to Riak for Kiosk Configuration/Reporting
  • Nitrogen/Mochiweb processing SMS messages from SMS aggregator

Periodic Email Sender:

  • Vagabond’s gen_smtp client in an eternal receive-after loop that sends email every 24 hours.
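A receive-after loop of the kind mentioned for the email sender might look like the sketch below. The module name, the `SendFun` callback (which would wrap the actual gen_smtp call), and the `stop` message are all assumptions added for illustration; the production loop may differ.

```erlang
-module(daily_mailer).
-export([start/2, loop/2]).

%% Hypothetical sketch: SendFun is a zero-arity fun that builds and
%% sends the report email (for example via gen_smtp); IntervalMs is
%% the wait between sends in milliseconds.
start(SendFun, IntervalMs) ->
    spawn(?MODULE, loop, [SendFun, IntervalMs]).

loop(SendFun, IntervalMs) ->
    receive
        stop ->
            %% Assumption: a stop message ends the loop cleanly.
            ok
    after IntervalMs ->
        %% No message arrived within the interval: send, then wait again.
        SendFun(),
        loop(SendFun, IntervalMs)
    end.
```

A 24-hour cycle would be started with something like `daily_mailer:start(SendFun, timer:hours(24))`. Because the timeout restarts after each send, the interval is measured from the end of one send to the start of the next.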

In Production

Currently, we are running four Riak nodes (writing out to the Filesystem backend) outside of the three Kiosks themselves. I also have various Riak nodes on my random Linux servers because I can use the CPU cycles on my other nodes to distribute MapReduce functions and store information in a redundant fashion.

By using Riak, I was able to keep the database lean and mean with creative uses of keys. Every asset for the kiosk is stored within Riak, including images. These are pulled only whenever a kiosk is started up or whenever an asset is created, updated, or removed (using message passing). If an image isn’t present on a local kiosk, it is pulled from the database and then stored locally. Also, all images and panels (such as the on-screen keyboard) are stored in memory to make things faster.
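The "pull from the database only when the asset is not local" behaviour can be sketched as a small cache-on-miss function. Everything here is hypothetical: the cache is modelled as a map held by the kiosk process, and `FetchFun` stands in for whatever call actually reads the asset from Riak.

```erlang
-module(asset_cache).
-export([get_asset/3]).

%% Hypothetical sketch. Cache is a map of Key => Binary held by the
%% kiosk; FetchFun is a one-arity fun that pulls the asset from the
%% database and returns {ok, Binary} or {error, Reason}.
get_asset(Key, Cache, FetchFun) ->
    case maps:find(Key, Cache) of
        {ok, Asset} ->
            %% Already local: no database round trip.
            {ok, Asset, Cache};
        error ->
            case FetchFun(Key) of
                {ok, Asset} ->
                    %% Store locally so the next lookup is a hit.
                    {ok, Asset, maps:put(Key, Asset, Cache)};
                {error, Reason} ->
                    {error, Reason, Cache}
            end
    end.
```

The function returns the updated cache alongside the asset, so the caller keeps the new map and the second request for the same key never reaches the database.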

All SMS messages are stored within an SMS bucket. Every 24 hours all the buckets are checked with a “mapred_bucket” to see if there are any new messages since the last time the function ran. These results are formatted within the MapReduce function and emailed out using the gen_smtp client. As assets are removed from the system, the current data is stored within a serialized text file and then removed from the database.
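The "new since the last run" selection can be illustrated on a plain list. In the system described, this filtering happens inside the MapReduce job; the sketch below only shows the selection logic, and the `{Timestamp, Body}` message shape and the `new_since/2` name are assumptions.

```erlang
-module(sms_report).
-export([new_since/2]).

%% Hypothetical sketch. Messages is a list of {Timestamp, Body}
%% tuples, with Timestamp in seconds; LastRun is the time of the
%% previous report in the same unit.
new_since(Messages, LastRun) ->
    %% Keep only messages stamped strictly after the previous run.
    [Msg || {Timestamp, _Body} = Msg <- Messages, Timestamp > LastRun].
```

For example, `sms_report:new_since([{10, <<"a">>}, {20, <<"b">>}], 15)` returns `[{20,<<"b">>}]`.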

As I bring more kiosks into operation, the distributed map-reduce feature is becoming more valuable. Since I typically run reports during off hours, the kiosks aren’t overloaded by the extra processing power. So far I have been able to roll out a new kiosk within 2 hours of receiving the hardware. Most of this time is spent doing the installation and configuration of the touchscreen. Also, the system is becoming more and more vital to how we are interfacing with people, giving members multiple ways of contacting us at their convenience. I am planning on expanding how I use the system, especially with code-distribution. For example, with the Innostore interface, I might store the beam files inside and send them to the kiosks using Erlang commands. (Version Control inside Riak, anyone?)

What’s Next?

I have ambitious plans for the system, especially on the kiosk side. As this is a very beta version of the software, it is only currently in production in our little community. That said, I hope to open source it and put it on github/bitbucket/etc. as soon as I pretty up all the interfaces.

I’d say probably the best thing about this whole project is getting to know the people inside the Erlang community, especially the Basho people and the #erlang regulars on IRC. Anytime I had a problem, someone was there willing to work through it with me. Since I am essentially new to Erlang, it really helped to have a strong sense of community. Thank you to all the folks at Basho for giving me a platform to show what Erlang can do in everyday, out of the way places.

Chris Villalobos

The Release of the Riak Wiki and the Fourth Basho Podcast

March 12, 2010

We are moving at warp speed here at Basho and today we are releasing what we feel is a very important enhancement to Riak: a wiki.

You can find it here:

http://docs.basho.com

Documentation and resources are a main priority right now for Basho, and a well maintained and up-to-date wiki is something we see as critical. Our goal is to make Riak simple and intuitive to download, build, program against, and build apps on. So, you should expect a lot more from us in this regard. Also, we still have much to add to the Riak Wiki, so if you think we are missing a resource or some documentation that makes Riak easier to use and learn about, please tell us.

Secondly, we had the chance to record the fourth installment of the Basho Riak podcast (below), and it was a good one. We hooked up with Tim Anglade, CTO of GemKitty and a growing authority on the NoSQL space. On the heels of his presentation at NoSQL Live from Boston, we picked his brain a bit about where he thinks the industry is going and what needs to change for the current iteration of NoSQL to go from being a fad and curiosity to a full fledged industry.

According to Tim, “We have an image problem right now with NoSQL as a brand,” and “NoSQL is over-hyped and the projects behind it are under-hyped.”

We also took a few minutes to talk about the Riak 0.9.1 release. Highlights include binary builds, as well as several new client libraries that expose all of Riak’s advanced features.

In short, if you are at all interested in the future of the NoSQL space, you’re not going to want to miss this.

Lastly, if you haven’t already done so, go download the latest version of Riak.

Enjoy!

Mark

Calling all Rubyists – Ripple has Arrived!

February 11, 2010

The Basho Dev. Team has been very excited about working with the Ruby community for some time. The only problem was we were heads down on so many other projects that it was hard to make any progress. But, even with all that work on our plate, we were committed to showing some love to Rubyists and their frameworks.

Enter Sean Cribbs. As Sean details in his latest blog post, Basho and the stellar team at Sonian made it possible for him to hack on Ripple, a one-of-a-kind client library and object mapper for Riak. The full feature set for Ripple can be found on Sean’s blog, but highlights include a DataMapper-like API, an easy builder-style interface to Map/Reduce, and near-term integration with Rails 3.

And, in case you need any convincing that you should consider Riak as the primary datastore for your next Rails app, check out Sean’s earlier post, “Why Riak should power your next Rails app.”

So, if you’ve read enough and want to get your hands on Ripple, go check it out on GitHub.

If you don’t have Riak downloaded and built yet, get on it.

Lastly, you are going to be seeing a lot more Riak in your Ruby. So stay tuned because we have some big plans.

Best,

Mark

Basho Podcast Three – An Introduction To Innostore

February 2, 2010

You may remember that Basho recently open-sourced Innostore, our standalone Erlang application that provides a simple interface to embedded InnoDB…

In this podcast, Dave “Dizzy” Smith and Justin Sheehy discuss the release of Innostore, why we built it, how we use it in Riak, and why it might be useful for other Erlang projects. The discussion focuses on the stability and predictability of InnoDB, especially under load and as compared with other storage backends like DETS.

And of course, go download Innostore when you are done with the podcast.

Enjoy!

Mark

UC Berkeley Professor Dr. Eric A. Brewer Joins Board of Directors of Basho Technologies, Inc.

Dr. Brewer takes active consulting role guiding Basho’s product roadmap and R&D projects.

CAMBRIDGE, MA – January 27, 2010 – Basho Technologies, Inc. today announced Dr. Eric Brewer, Professor of Computer Science at University of California, Berkeley and formerly founder and Chief Scientist of Inktomi Corporation, a leading provider of scalable search (sold to Yahoo! in 2003. NASDAQ:YHOO), is joining the Basho Board of Directors and he will serve as an active advisor on Basho’s product roadmap. Dr. Brewer begins in his dual roles effective immediately.

Dr. Brewer, globally recognized for his work on scalable, distributed systems, promoted the CAP Theorem, a seminal idea that fueled the emergence of a new class of distributed systems, including Basho’s own distributed data store, Riak.

“The impact of the cloud is still in its early stages – it will affect all aspects of life over the next ten years,” said Dr. Brewer. “New cloud-centric computing and database systems are at the core of this revolution and Basho’s team and their technology are emerging as one of the leaders of this space.”

Dr. Brewer will take an active role in shaping the Riak OS and Riak EnterpriseDS product roadmaps and he will advise the development team on long-term research and development projects.

“We are extremely fortunate and honored that Dr. Brewer has chosen Basho to invest his considerable knowledge and expertise,” said Earl Galleher, Chairman and CEO of Basho. “His involvement will make a true difference to our customers, our open source users and to our employees. To be guided by the person who has had the biggest impact on the NoSQL movement immediately places Basho in a leadership position among all NoSQL companies. His involvement with Basho goes a long way to validate the entire NoSQL space.”

As a researcher, Dr. Brewer has led projects on scalable servers, search engines, network infrastructure, sensor networks, and security. In 2000, he founded the Federal Search Foundation, an organization focused on improving consumer access to government information. Working with President Clinton, Dr. Brewer helped to create USA.gov, the official portal of the Federal government, which launched in September 2000.

Dr. Brewer has an MS and Ph.D. in EECS from the Massachusetts Institute of Technology, and a BS in EECS from UC Berkeley. He was named a “Global Leader for Tomorrow” by the World Economic Forum and the “most influential person on the architecture of the Internet” by the Industry Standard, and in 2007 he was elected to the National Academy of Engineering.

Innostore — connecting Erlang to InnoDB

January 26, 2010

Riak has pluggable storage engines, and so we’re always on the lookout for better ways that users can store their data locally. Recent experiences with some Basho customers managing some large datasets led us to believe that InnoDB might work out very well for them.

To answer that question and fill that need, Innostore was written. It is a standalone Erlang application that provides a simple interface to Embedded InnoDB. So far its performance has been quite good, though InnoDB (with or without the Innostore API) is highly dependent on tuning the local configuration to match the local hardware. Luckily, Dizzy, the author of Innostore, has some heavy-duty experience doing that kind of tuning and as a result we’ve been able to help people meet their performance goals using Innostore.

-Justin