November 27, 2013
Join Basho and 451 Research on Tuesday, December 10th at 10am PT for a live webinar, “Beyond NoSQL – Distributed Databases in Production.”
This webinar will feature Matt Aslett, Research Director at 451 Research, and Bobby Patrick, EVP and CMO at Basho Technologies. This webinar will set the stage with NoSQL trends and adoption across various industries. It will then discuss some of the key benefits of distributed NoSQL systems and explore how systems like Riak are evolving.
Wes Jossey, Systems Engineer at Tapjoy, will also be joining the webinar to discuss how Tapjoy uses distributed databases to provide reliable data locality to their customers through multi-datacenter replication.
Register here for the free “Beyond NoSQL – Distributed Databases in Production” webinar.
October 28, 2013
The technology community is extremely agile and fast-paced. It can turn on a dime to solve business problems as they arise. However, with this agility comes budding terminology that can often provide false categorizations. This can lead to confusion, especially when companies evaluate new technologies based on a surface understanding of these terms. The world of data is full of these terms, including the notorious “NoSQL” and “big data.”
As described in a previous post, NoSQL is a misleading term. This term represents a response to changing business priorities that require more flexible, resilient architectures (as opposed to the traditional, rigid systems that often happen to use SQL). However, within the NoSQL space, there are dozens of players that can be as different from one another as they are from any of the various SQL-speaking systems.
Big data is another term that, while fairly self-explanatory, has been overused to the point of dilution. One reason NoSQL databases have become necessary is their ability to scale easily to keep up with data growth. Simply storing a lot of data isn’t the solution, though. Some data is more critical than other data (and should be accessible no matter what), and some data needs to be analyzed to provide business insights. When digging into a business, big data is too vague a term to describe both of these use cases.
Used loosely, terms like these (to highlight just a few) can lead to industry confusion. One area of confusion that we have experienced relates to Basho’s own distributed database, Riak, and the distributed processing system, Hadoop.
While these two systems are actually complementary, we are often asked “How is Riak different from Hadoop?”
To help explain this, it’s important to start with a basic understanding of both systems. Riak is a distributed database that is built for high availability, fault tolerance, and scalability. It is best used to store large amounts of critical data that applications and users need to constantly be able to access. Riak is built by Basho Technologies and can be used as an alternative to or in conjunction with relational databases (such as MySQL) or to other “NoSQL” databases (such as MongoDB or Cassandra).
Hadoop is a framework that allows for the distributed parallel processing of large data sets across clusters of computers. It was originally based on the “MapReduce” system, which was invented by Google. Hadoop consists of two core parts: the underlying Hadoop Distributed File System (HDFS), which ensures stored data is always available to be analyzed, and MapReduce, which allows for scalable computation by dividing and running queries over multiple machines. Hadoop provides an inexpensive, scalable solution for bulk data processing and is mostly used as part of an overarching analytics strategy, not for primary “hot” data storage.
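To make the "dividing and running queries over multiple machines" idea concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the "chunks" simply stand in for blocks of data that a real cluster would process on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    pairs = []
    for chunk in chunks:
        # In a real cluster, each chunk would be mapped on a different machine.
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["riak stores data", "hadoop processes data"])
print(counts)
```

Because each map call only sees its own chunk, the map work can be spread across as many machines as there are chunks; only the grouped results need to be brought together for the reduce step.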
One easy way to distinguish between the two is to look at some of the common use cases.
Riak Use Cases
Riak can be used by any application that needs to always have access to large amounts of critical data. Riak uses a key/value data model and is data-type agnostic, so operators can store any type of content in Riak. Due to the key/value model, certain industry use cases fit easily into Riak. These include:
- Gaming – storing player data, session data, etc
- Retail – underpinning shopping carts, product inventories, etc
- Mobile – social authentication, text and multimedia storage, global data locality, etc
- Advertising – serving ad content, session storage, mobile experiences, etc
- Healthcare – prescription or patient records, patient IDs, health data that must always be available across a network of providers, etc
For a full list of use cases, check out our Users Page.
Hadoop Use Cases
Hadoop is designed for situations where you need to store unmodeled data and run computationally intensive analytics over that data. The original use cases of both MapReduce and Hadoop were to produce indexes for distributed search engines at Google and Yahoo respectively. Any industry that needs to do large scale analytics to better improve their business can use Hadoop. Some common examples include finance (build models to do accurate portfolio evaluations and risk analysis) and eCommerce (analyze shopping behavior to deliver product recommendations or better search results).
Riak and Hadoop are based on many of the same tenets, making their usage complementary for some companies. Many companies that utilize Riak today have created scripts, or processes, to pull data from Riak and push into other solutions (like Hadoop) for the purpose of historical archiving or future analysis. Recognizing this trend, Basho is exploring the creation of additional tools to simplify this process.
If you are interested in our thinking on these data export capabilities, please contact us.
Every tool has its value. Hadoop excels at being used by a relatively small subset of the business to answer big questions. Riak excels at being used by a very large number of users and powering critical data for businesses.
August 20, 2013
NoSQL is a misleading name. SQL was never the problem. However, this poorly named industry term does represent a response to changing business priorities and new challenges that require different kinds of database architectures.
Traditional database architectures were first developed in the late 60s and early 70s. They were the default option for many pre-Internet use cases and remain useful today for certain use cases requiring a relational data model. However, their limits are painfully apparent to many companies. Despite what traditional database vendors might have us believe, very little data generated today actually requires a SQL architecture. Businesses face many new challenges today that traditional databases simply are not designed to handle reliably or efficiently. These include:
- Global Users. It is no longer enough to provide a fast experience in one country. Users from all over the globe expect a low-latency experience, making geo-data locality more important than ever.
- Zero Downtime. Planned and unplanned. Both are bad for business. There is now an expectation for always-on availability. Operations teams must emphasize resiliency over recovery.
- Scale Matters. Businesses need to scale up quickly to meet peak loads during the holidays or product launches, and then they need to scale back down. They need an architecture that makes scaling the least of their worries.
- Flexible Data. From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Businesses need flexibility to handle all the data generated and flowing.
- Omnichannel. Whether users are on a tablet, laptop, or smartphone, they require a device-agnostic, low-latency experience.
- Amazon Economics. Every business wants Amazon Economics. With the nature of data growth today, businesses can’t afford expensive machines at every juncture. They need commodity machines to scale horizontally, not vertically.
Attempts to address these challenges with traditional databases result in an inflexible architecture with super high costs. “NoSQL” databases represent a fresh approach towards building flexible, resilient architectures. “NoSQL” goes where no database has ever gone before — into the wild space of the Internet and the massive scale requirements it represents.
Which brings us to NoSQL Now! Basho is sponsoring because the movement is more important than any single industry term. Andy Gross will also be on-hand to further discuss the larger trend of distributed systems:
Dealing with Systems in a New Distributed World
Chief Architect and Co-creator of Riak
Thursday, August 22, 2013
Please join us in San Jose for a look at the future of database technology.
At NoSQL NOW!, industry leaders and developers come together to share ideas at the largest vendor agnostic event of disruptive NoSQL technologies for infrastructure architecture.
Basho has a strong and salient presence at this conference. Both Basho’s open source distributed database, Riak, and its cloud storage software, Riak CS, are NoSQL technologies that offer high availability, fault-tolerance, and operational ease-of-use. Unlike traditional databases, Riak automatically distributes data in the cluster, eliminating the need for manual sharding, and its masterless design means that nodes can fail without bringing down the entire system. Due to this architecture, Riak has become foundational to many of the world’s fastest-growing Web-based, mobile and social applications at companies like Comcast, Voxer, and Best Buy. The Basho team will be available at the conference to answer questions about Riak and how to move from a relational system to a distributed one.
In addition to exhibiting, Basho Chief Architect and Co-Creator, Andy Gross, will be speaking on August 22 at 3:00pm. His talk, “Dealing with Systems in a New Distributed World” will discuss the resurgence in interest of both theoretical and applied distributed systems and its consequences for software developers. He will explore new areas of promising research, and provide practical advice for dealing with systems in our new distributed world. Finally, he will discuss how technologies are shifting to meet emerging business requirements, while simultaneously minimizing immediate operational burdens and enabling ease of scale.
Andy is a distributed systems nerd, co-creator of Riak and Webmachine, and Chief Architect at Basho Technologies. Before Basho, Andy hacked on various distributed systems at Apple, Akamai, and Mochi Media.
For more information about Riak, common use cases, and an in-depth analysis of the benefits of migrating to a distributed NoSQL database, download “From Relational to Riak.”
July 3, 2013
Basho CTO, Justin Sheehy, recently participated in a “Not Only SQL Summit,” alongside executives from some of the top NoSQL vendors. This summit was moderated by Ted Neward of Neward & Associates LLC and discussed the evolution of NoSQL systems as well as some associated best practices. It also included insights from customers currently using these NoSQL solutions.
In addition to Justin Sheehy, panelists included:
- Anthony Molinaro, Infrastructure Architect at OpenX, discussing how they use Riak
- Patrick McFadin, Principal Solution Architect at DataStax
- Michael Kjellman, Software Engineer at Barracuda Networks, discussing how they use Cassandra
- Justin Weiler, CTO at FatCloud
- Attinder Khalsa, Executive Software Architect at Wilshire Axon, discussing how they use FatDB
Throughout this summit, OpenX, Barracuda Networks, and Wilshire Axon discussed not only why they chose to move away from relational systems but also why they chose the NoSQL vendor that they did. They also talked about their experiences dealing with eventual consistency and schemaless data. You can view the full summit below:
April 17, 2013
This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.
What hardware should I use with Riak?
Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.
RAM is one of the most important factors – RAM availability directly affects what Riak backend you should use (see question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor. However, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
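As a back-of-the-envelope illustration of how n_val feeds into disk planning, the sketch below multiplies object count, average object size, and replication factor. The function names and the example figures are assumptions for illustration only; the estimate ignores backend overhead, compression, and uneven distribution, so treat it as a starting point rather than a sizing tool.

```python
def raw_disk_needed(num_objects, avg_object_bytes, n_val=3):
    """Bytes stored across the whole cluster, before any backend overhead.

    n_val defaults to 3, matching the default replication factor described above.
    """
    return num_objects * avg_object_bytes * n_val


def per_node_disk(num_objects, avg_object_bytes, num_nodes, n_val=3):
    """Rough per-node share, assuming data is spread evenly across nodes."""
    return raw_disk_needed(num_objects, avg_object_bytes, n_val) / num_nodes


# Hypothetical workload: 100 million objects averaging 2,000 bytes each.
total_bytes = raw_disk_needed(100_000_000, 2_000)
share_bytes = per_node_disk(100_000_000, 2_000, num_nodes=5)
print(total_bytes, share_bytes)
```

Joining more nodes lowers the per-node share without changing the total, which is the horizontal-scaling trade-off the paragraph describes.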
Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.
What backend is best for my application?
Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.
Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
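The design described above (an in-memory table of keys pointing at on-disk locations, plus append-only writes) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Bitcask's actual file format or API: the class name `ToyBitcask` is hypothetical, and real Bitcask also handles record framing, checksums, file rotation, and merging of stale entries, none of which appear here.

```python
import os
import tempfile


class ToyBitcask:
    """Append-only data file plus an in-memory key directory (keydir)."""

    def __init__(self, path):
        self.path = path
        self.keydir = {}  # key -> (offset, length) of the latest value on disk
        open(path, "ab").close()  # make sure the data file exists

    def put(self, key, value):
        # Writes only ever append; older values for the key stay in the file.
        offset = os.path.getsize(self.path)
        with open(self.path, "ab") as f:
            f.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key):
        # One in-memory lookup, then a single seek and read on disk.
        offset, length = self.keydir[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


path = os.path.join(tempfile.mkdtemp(), "toy.data")
store = ToyBitcask(path)
store.put("user:1", b"alice")
store.put("user:2", b"bob")
store.put("user:1", b"alice-v2")  # overwrite appends; keydir points at the new copy
print(store.get("user:1"))
```

The sketch also shows where the memory constraint comes from: every key lives in `keydir`, so the keyspace has to fit in RAM even though the values do not.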
LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend variable in the app.config file. You can find more details on LevelDB here.
Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.
How do I model data using Riak’s key/value design?
Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.
|Application Type|Key|Value|
|Session|User/Session ID|Session Data|
|Content|Title, Integer|Document, Image, Post, Video, Text, JSON/HTML, etc.|
|Advertising|Campaign ID|Ad Content|
|Sensor|Date, Date/Time|Sensor Updates|
|User Data|Login, Email, UUID|User Attributes|
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
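To show what these key choices might look like in application code, here is a small sketch that uses a plain Python dict as a stand-in for Riak. The `put`/`get` helpers, the bucket names, and the `sensor_key` naming scheme are all hypothetical; a real application would call a Riak client library instead.

```python
import json
from datetime import date

store = {}  # stand-in for Riak: (bucket, key) -> opaque value


def put(bucket, key, value):
    store[(bucket, key)] = value


def get(bucket, key):
    return store.get((bucket, key))


def sensor_key(sensor_id, day):
    """Build a date-based key so one day's readings can be fetched directly."""
    return f"{sensor_id}-{day.isoformat()}"


# Session data keyed by a user/session ID.
put("sessions", "user42-session9", json.dumps({"cart": ["sku-1"]}))

# Sensor updates keyed by date.
put("sensors", sensor_key("thermo7", date(2013, 4, 17)), json.dumps([21.5, 21.7]))

# User attributes keyed by email.
put("users", "alice@example.com", json.dumps({"name": "Alice"}))

readings = json.loads(get("sensors", sensor_key("thermo7", date(2013, 4, 17))))
print(readings)
```

The point of the sketch is that the application can always reconstruct the key from information it already has (a session ID, a date, an email), so every read is a direct lookup.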
What other options, besides strict key/value access, are there for querying Riak?
Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.
Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.
Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
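The tagging idea can be illustrated with an in-memory toy that supports exact-match and range queries. This is not Riak's secondary index implementation or client API: `ToyIndexedStore` and its methods are hypothetical, the `_bin`/`_int` suffixes on index names are used here only as a naming convention for string and integer tags, and the sketch does not clean up old tags when an object is rewritten.

```python
from collections import defaultdict


class ToyIndexedStore:
    """Key/value store where each object can carry queryable tags."""

    def __init__(self):
        self.objects = {}
        self.indexes = defaultdict(list)  # index name -> list of (tag value, key)

    def put(self, key, value, tags=None):
        self.objects[key] = value
        for name, tag_value in (tags or {}).items():
            self.indexes[name].append((tag_value, key))

    def query_exact(self, name, value):
        """Return keys whose tag equals the given value."""
        return sorted(k for v, k in self.indexes[name] if v == value)

    def query_range(self, name, low, high):
        """Return keys whose tag falls in the inclusive range [low, high]."""
        return sorted(k for v, k in self.indexes[name] if low <= v <= high)


users = ToyIndexedStore()
users.put("u1", {"name": "Ann"}, {"age_int": 31, "city_bin": "austin"})
users.put("u2", {"name": "Bo"}, {"age_int": 45, "city_bin": "boston"})
users.put("u3", {"name": "Cy"}, {"age_int": 28, "city_bin": "austin"})

print(users.query_exact("city_bin", "austin"))
print(users.query_range("age_int", 30, 50))
```

Note that queries return keys rather than values; the application then fetches the objects it needs with ordinary key lookups.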
How does Riak differ from other databases?
We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.
Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing means data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
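For readers unfamiliar with consistent hashing, the following is a generic textbook-style sketch in Python. It is an assumption-laden illustration, not Riak's ring: the `HashRing` class, the node names, and the choice of 64 points per node are hypothetical, and Riak's actual implementation differs in its details. What it demonstrates is the property mentioned above: when a machine is added, only a fraction of keys move, and they all move to the new machine.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string onto the ring using SHA-1."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), points_per_node=64):
        self.points_per_node = points_per_node
        self._points = []  # sorted list of (position on ring, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.points_per_node):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise from the key's position to the next claimed point.
        positions = [position for position, _ in self._points]
        index = bisect.bisect(positions, _hash(key)) % len(self._points)
        return self._points[index][1]


ring = HashRing(["node1", "node2", "node3"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node4")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Compare this with a naive `hash(key) % number_of_nodes` scheme, where changing the node count reassigns most keys and forces the kind of manual resharding the paragraph mentions.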
A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.
February 24, 2013
Recently, Basho engineer, Eric Redmond, published “A Little Riak Book.” This book is available free for download at littleriakbook.com and provides a great overview of Riak, including how to think about a distributed system compared to more traditional databases.
The book starts with a discussion on concepts. Since Riak is a distributed NoSQL database, it requires developers to approach problems differently than they would with a relational database. The concepts section describes the differences between various NoSQL systems, takes an in-depth look at Riak’s key/value data model, and describes how Riak is designed for high availability (as well as how it handles eventual consistency constraints). After laying the theoretical groundwork, the book walks developers through how to use Riak by explaining the different querying options and showing them how to tinker with settings to meet different use case needs. Finally, it covers the basic details that operators should know, such as how to set up a Riak cluster, configure values, use optional tools, and more.
After finishing the book, start playing around with Riak to see if it’s the right fit for your needs. You can download Riak on our Docs Page.
August 10, 2012
We have a poorly defined term in our industry: “NoSQL.” [Does your toaster run SQL? No? Then you own a NoSQL toaster.] Be that as it may, Riak falls under the umbrella of software that carries this label. In our attempt to own the label, we reinterpret it to mean that we now have more choices as developers. For too long, our only meaningful options for data storage were SQL relational databases and the file system.
In the past few years, that has changed. We now have many production-ready tools available for storing and retrieving data, and many of those fall within the sphere of NoSQL. With all of these new options, how do we as developers choose which database to use?
On the Professional Services Team, this is the first question we ask ourselves: What is the best storage option for this application? At Basho, Professional Services goes on-site to assist clients with training, application development, operational planning – anything to help get the most out of Riak. In order to know how to do that, we also have to know quite a bit about other NoSQL databases and storage options, and when it might be a better option to go with something other than Riak. Below we outline some of our reasoning when we evaluate Riak for our clients and their applications.
A Simple Key-Value Store
When our clients simply need a key-value store, our job as consultants couldn’t get any easier. Riak is a great key-value database with an excellent performance profile, fantastic high availability and scaling properties, and the best deployment/operations story that we know. We are very proud of our place in the industry when it comes to these features.
But when the business logic for the application requires an access pattern more sophisticated than a simple key lookup, we have to dig deeper to figure out whether Riak is the right tool for the job. We have evolved the following distinguishing criteria:
If there is a usage scenario requiring ad-hoc, dynamic querying, then we might consider alternative solutions.
- Ad-hoc: by this we mean that queries run at unpredictable times, possibly triggered by end-users of the application.
- Dynamic: by this we mean that queries are constructed at the time they are being run.
If the usage scenario requires neither ad-hoc nor dynamic queries, then we can usually construct the application in such a way that even complex analysis works well with Riak’s key-value nature. If the scenario requires ad-hoc but not dynamic queries, then we have to look at options to tune performance of the known access patterns. If the scenario requires dynamic queries run on a regular basis, then we might investigate running the dynamic queries on an ‘offline’ cluster replica so that we don’t interfere with the availability of the ‘online’ production clusters.
These criteria began to take form in our evaluations of Riak for data analytics. We often see Riak deployed as a Big Data solution because of its exceptional fault-tolerance and scaling properties, and running analytics on Big Data is a common use case. MapReduce gives us the ability to run sophisticated analytics on Riak, but other solutions exist that are optimized for analytics in ways that Riak is not. It is generally not a good idea to run MapReduce on a production Riak cluster for data analysis purposes. MapReduce exists in Riak primarily for data maintenance, data migrations, or offline analysis of a cluster replica. All three of these are good use cases for Riak’s MapReduce implementation.
Key-Value State of Mind
Does that mean that data analysis applications are off the table? Absolutely not! In our training sessions and workshops, we emphasize that key-value databases require a different mindset than relational databases when you are planning your application.
In traditional SQL applications, we as engineers start defining the data model, normalizing the data, and structuring models in such a way that relations can be fetched efficiently with appropriate indexing. If we do a good job modeling the data, then we can proceed with reasonable certainty that the application built on top of it will unfold naturally. The developers of the application layer will take advantage of well-known patterns and practices to construct their queries and get what they want out of the data model. It’s no surprise that SQL is pretty good for this kind of thing.
In a key-value store, we approach the software architecture from the opposite side and proceed in the other direction. Instead of asking what the data model should look like and working up to the application view, we begin by asking what the resulting view will look like and then work ‘backwards’ to define the data model. We start with the question: What do you want the data to look like when you fetch it from the database?
If we can answer the above question, and if we can define the structure of the result that we want in advance, then we probably have a good case for pre-processing the results. We pre-process the data in the application layer before it enters Riak, and then we just save the answer that we want as the value of a new key-value pair. In these cases, we can often get better performance when fetching the result than a relational approach because we don’t have to perform the computation of compiling and executing the SQL query.
A rolling average is a simple example: Imagine that we want to have the average of some value within data objects that get added to the system throughout the day. In a SQL database, we can just call average() on that column, and it will compute the answer at query time. In a key-value store, we can add logic in the application layer to catch the object before it enters Riak, fetch the average value and number of included elements from Riak, compute the new rolling average, and save that answer back in Riak. The logic in the application layer is now slightly more complicated, but we weigh this trade-off against the simplicity of administering the key-value database instead of a relational one. Now, when you go to fetch the average, it doesn’t have to compute it for you. It just returns the answer.
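The rolling-average steps above can be sketched as follows, with a Python dict standing in for Riak. The function names and the stored `{"average", "count"}` shape are assumptions made for illustration; a real application would read and write through a Riak client and would also need to decide how to handle concurrent updates to the same key.

```python
store = {}  # stand-in for a Riak bucket: key -> stored answer


def record_value(key, new_value):
    """Fold one new value into the stored average before saving it back."""
    # Fetch the current average and the number of elements it includes.
    state = store.get(key, {"average": 0.0, "count": 0})
    count = state["count"] + 1
    # Incremental mean: shift the old average toward the new value.
    average = state["average"] + (new_value - state["average"]) / count
    # Save the precomputed answer back as the value for this key.
    store[key] = {"average": average, "count": count}
    return average


def fetch_average(key):
    """Reading is a plain key lookup; nothing is computed at query time."""
    return store[key]["average"]


for value in [10, 20, 30]:
    record_value("daily-average", value)

print(fetch_average("daily-average"))
```

All of the computation happens on the write path, so the cost of a read stays the same no matter how many values have been folded in.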
With the right approach, we can build applications in such a way that they work well with a key-value database and preserve the highly available, horizontally scaling, fault-tolerant, easy-as-pie administration that we have worked so hard to provide in Riak. We look forward to continuing to help you get the most out of Riak, and choosing the best tool for the job.
See Sean’s excellent post on Schema Design in Riak
Note: In some situations, using MapReduce to facilitate a bulk fetch provides better performance than requesting each object individually because of the connection overhead. If you go that route, be sure to use the native Erlang MapReduce functions like ‘reduce_identity’ already available in Riak. As always, test your solution before putting it into production.
May 11, 2011
I’m on a plane to Goto Copenhagen from our electric post-Kill-Dash-Nine Board meeting in Washington, DC and, afterwards, an intense client meeting. I went to watch Pete Sheldon, our new Director of Sales, and Justin Sheehy at work. I finally had a chance to sit and study a proposal for the Basho product roadmap for the next year. This roadmap is both breathtakingly ambitious and oddly incremental, quotidian even.
In the next year we will solve problems endemic to distributed systems – groundbreaking work of the sort careers are surely made — and yet at the same time, these problems seem incremental and iterative; part of an ongoing process of small improvements. They seem both astounding and inevitable.
This led me to an interesting insight — doing this is not easy.
What we are doing is like digging a canal through bedrock. We are hacking away at hard problems: problems others encountered and either died trying or, mopping their brows with their handkerchiefs, threw down their shovels and went shopping. A lot of cool companies are hacking away, too, so it is not like we are alone, but the honorable and the diligent are not what this post is about.
This post is about the ugly truth I have to call out.
To put it bluntly, if you are claiming the architectural challenges presented by applications with high write loads spread across multiple data centers are easy, you are lying. You do not, as Theo Schlossnagle remarked recently to us, “respect the problem.” You must respect the problem or you disrespect the necessary tradeoffs. And if you disrespect the tradeoffs, you disrespect your user. And if you disrespect your user, you are, inevitably, a liar. You say _this_ is easy. You promise free lunches. You guarantee things that turn out to be impossible. You lie.
What our technology generation is attempting is really hard. There is no easy button. You can’t play fast and loose with the laws of physics or hand-wave around critical durability issues. You can sell this stuff to your venture capitalist, but we’re not buying it.
Immutable laws are not marketing. And therefore, marketing can’t release you from the bonds of immutable laws. You can’t solve the intractable problems of distributed systems so eloquently summarized with three letters – C-A-P – by Google’s cloud architect (and Basho Board member) Dr. Eric Brewer (a man both lauded and whose full impact on our world has not yet been reckoned), with specious claims about downloads and full consistency.
- Memory is not storage.
- Trading the RDBMS world for uptime is hard. There are no half-steps. No transitional phases.
- The geometry of a spinning disk matters for your app. You can’t escape this.
- Your hardware RAID controller is not perfect. It screws things up and needs to be debugged.
- Replication between two data centers is hard, let alone replication between three or 15 data centers.
- Easily adding nodes to a cluster under load impacts performance for a period determined by the amount of data stored on the existing nodes and the load on the system…and the kind of servers you are using…and a dozen other things. It looks easy in the beginning.
These are all sensible limitations. Like the speed of light or the poor quality of network television, these are universal constants. The point is: tradeoffs can’t be solved by marketing.
To be sure, there are faster databases than Riak. But do they ship with default settings optimized for speed or optimized for safety? We *ache* to be faster. We push ourselves to be faster. We tune and optimize and push. But we will never cross the line to lose data. While it is always tempting to set our defaults to *fast* instead of *safe*, we won’t do it. We will sacrifice speed to protect your data. In fact, if you prefer speed to preserving data, *don’t use Riak*. We tell the truth even if it means losing users. We will not lie.
Which is why others who do it make me ball my fists, score my palms, and look for a heavy bag to punch. Lying about what you can do – and spreading lies about other approaches – is a blatant attempt to replace the sacrifice of hard-core engineering and ops with fear, uncertainty, and doubt – FUD.
People who claim they are “winning NoSQL” with FUD are damaging our collective chance to effect a long-overdue change to the way data is stored and distributed. This opportunity is nothing short of a quantum shift in the quality of your life if you are in development, operations, or are a founder who lives and dies by the minute-to-minute performance of your brainchild/application.
The FUD-spreaders are destroying this opportunity with their lies. They are polluting the well by focusing on false marketing – on being the loud idiot drunk – instead of solving the problem. They can screw this up with their failure. It is time for us to demand they drop the FUD – drop the “F” bomb – and stop lying about what they can do. Just tell the truth, like Basho does — admit this is a hard problem and that hardcore engineering is the answer. In fact, they should do the honorable thing and quit the field if they are not ready to invest in the work needed to solve this problem.
If we, collectively, the developer and sysadmin community running the infrastructure of the world economy, allow people to replace engineering with marketing lies, to trade coffee mugs for countless hours of debugging, and in doing so, to destroy the reputation of a new class of data storage systems before they have had a chance to gain a foothold in the technology world, we all lose.
There are many reasons why the FUD spreaders persist.
There are the smart folks who throw their hands up and cynically say that liars are by their nature better marketers. But marketing need not be lies, cynically accepted.
Then there are some of us who are too busy keeping projects or businesses afloat to really dig into the facts. But we sense that we are being lied to, and so we detach, saying this is all FUD. This can’t help us. Tragically, we miss the opportunity to make a big change.
Most of us simply want to trust other developers and will believe claims that seem too good to be true. If we do this, we are in a small but serious way saying that our hard-won operational wisdom is meaningless, that anyone who has deployed a production application or contributed to an open-source project has no standing to challenge the loud-mouth making claims that up-time is easy.
Up-time is not easy. Sleeping through the night without something failing is a blessing. Do not – *do not* – let VCs and marketers mess up our opportunity to take weekends off and sleep through the night when we are on call. The database technologies of 1980 (and their modern apologists in NoSQL) should not shape the lives of technologists in 2011.
In the briefest terms, Basho won’t betray this revolution because we keep learning big lessons from our small mistakes. We are our harshest critics.
We will deliver a series of releases that allow you to tune for the entire spectrum of CAP tradeoffs – strong consistency to strong partition tolerance – while making clear the tradeoffs and costs. At the same time Riak will provide plugins for Memcache, secondary indices, and also a significant departure from existing concepts of MapReduce that allows for flexible, simple, yet massively distributed computation, and much more user-friendly error reporting and logging. (Everyone reading this understands why that last item merits inclusion on any list of great aspirations – usability can damn or drive a project.)
We will deliver these major innovations, improvements, and enhancements, and they will be hard for us and our community to build. And it will take time for us to explain it to people. And you will find bugs. And it will work better a year after it ships than it does on day one.
But we will never lie to you.
We call on others to please drop the FUD, to acknowledge the truth about the challenges we all face running today’s infrastructure, and to join us in changing the world for the better.
December 2, 2010
In the last two weeks, Basho has been fortunate to sign up some pretty cool clients. Considering that we are a young company, that a database is among the stickiest pieces of software (so decisions to deploy something new are undertaken with caution), and that we have spent approximately $7,000 on marketing (mostly on sponsorship of a single event), the fact that we are getting ten leads a week and converting leads to customers seems pretty amazing.
While this obviously puts the lie to the idea that the market for NoSQL is too early to build a business on, one thing is certain: what people want from NoSQL varies significantly from client to client.
Some want high availability (especially write-availability) and scalability. Some want distributed analytical capabilities and low latency on queries of big data sets. Some want both. All of the people we are talking to have specific applications in mind and all of them are interested in using NoSQL to do something they really could not do before.
This is the proverbial “greenfield” for NoSQL. Not verticals (and especially not social networking, which is over-represented in examples because two of the great early NoSQL data stores were developed by Facebook and LinkedIn), but pent-up demand is where we see growth and opportunity.
Some investors and product types worry this means there is no specific niche NoSQL fills, meaning the market is small and making it hard for small companies to thrive. While I happen to agree with the premise (there is no specific niche), I view that as an indicator of the potentially massive size of the opportunity. We are seeing pent-up demand from companies that want to build web applications that are more reliable, scale better, use distributed map/reduce and indexing features, and run in data centers across continents.
No niche there.