February 17, 2015
According to TechTarget, a common definition of “High Availability” is:
“In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. Availability can be measured relative to ‘100% operational’ or ‘never failing.’”
The reality is that this phrase has become semantically overloaded by its inclusion in marketing copy across a disparate set of technologies. Much like “Big Data”, perspectives on availability vary based on industry and customer expectation.
For many of today’s applications and platforms, high availability has a direct impact on revenue. A few examples include: cloud services, online retail, shopping carts, gaming and betting, and advertising. Further, lack of availability can damage user trust and result in a poor user experience for many social media and chat applications, websites, and mobile applications. Riak provides the high availability needed for your critical applications.
Availability – By the Numbers
As we highlighted in an infographic entitled Down with Downtime, more than 95% of businesses with 1,000+ employees estimate that they lose more than $100,000 for every hour of downtime. For more than 1 in 2 large businesses, the cost of downtime amounts to more than $300,000 per hour. At $300,000 per hour, that is $5,000 per minute, or roughly $83 per second. At the upper end of the spectrum (in financial services) it can amount to $1,800 per second of downtime.
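To make the hourly figures easier to compare, here is a minimal sketch that converts an hourly downtime cost into per-minute and per-second amounts. The function name `downtime_cost` is ours, chosen for illustration; the hourly inputs are the two thresholds quoted above.

```python
def downtime_cost(cost_per_hour: float) -> dict:
    """Break an hourly downtime cost into per-minute and per-second figures."""
    return {
        "per_hour": cost_per_hour,
        "per_minute": cost_per_hour / 60,
        "per_second": cost_per_hour / 3600,
    }


if __name__ == "__main__":
    # The two hourly thresholds quoted in the text.
    for hourly in (100_000, 300_000):
        cost = downtime_cost(hourly)
        print(
            f"${hourly:,}/hour = ${cost['per_minute']:,.0f}/minute "
            f"= ${cost['per_second']:,.2f}/second"
        )
```

At $300,000 per hour this works out to $5,000 per minute, or about $83 per second.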
This fiscal impact has resulted in availability being measured as a percentage calculation of uptime in a given year. This percentage is often referred to as the “number of 9s” of availability. For example, “one nine” of availability equates to 90% uptime in a year. Similarly, “five nines” (the standard that was set by consulting firms on enterprise projects) equates to 99.999% availability in a year. While that percentage is often referenced, the practical reality is that it means there can be no more than 6.05 seconds of unplanned downtime per week.
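The "number of 9s" arithmetic is easy to check for yourself. The sketch below assumes a 7-day week and a 365-day year, and the helper names (`availability_percent`, `allowed_downtime_seconds`) are ours rather than part of any product.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60      # 604,800
SECONDS_PER_YEAR = 365 * 24 * 60 * 60    # 31,536,000 (ignores leap years)


def availability_percent(nines: int) -> float:
    """'N nines' of availability as a percentage, e.g. 3 nines is 99.9."""
    return 100 * (1 - 10 ** -nines)


def allowed_downtime_seconds(nines: int, period_seconds: int) -> float:
    """Maximum downtime in a period that still meets 'N nines'."""
    return period_seconds * 10 ** -nines


if __name__ == "__main__":
    for nines in (1, 2, 3, 5, 9):
        per_week = allowed_downtime_seconds(nines, SECONDS_PER_WEEK)
        per_year = allowed_downtime_seconds(nines, SECONDS_PER_YEAR)
        print(
            f"{nines} nines ({availability_percent(nines):.7f}%): "
            f"{per_week:.7f} s/week, {per_year:.4f} s/year"
        )
```

For five nines the weekly budget is 604,800 × 0.00001 = 6.048 seconds, which matches the figure quoted above.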
Availability – A Feature or A Benefit?
Often, when describing Riak, I begin by explaining the benefits of Riak (availability, scalability, fault tolerance, operational simplicity) and then discuss, in detail, the properties that these benefits are derived from. Availability is not something that can be added to a system (be it a distributed database or otherwise), rather it is an outcome of the core architectural decisions that were made in the development of the product.
Consider, for example, the AXD 301 ATM switch. It reportedly delivers at or better than “nine nines” (99.9999999%) of availability to customers. This is a staggering number that allows no more than about 0.6 milliseconds (0.6048 ms) of downtime per week. Interestingly, it shares a common architectural foundation with Riak: both are developed in Erlang.
“How does Riak achieve high availability?” Or, perhaps better stated as, “What are the architectural decisions made in Riak that enable high availability?”
Availability – An Architectural Decision
Riak is a masterless system designed for high availability, even in the event of hardware failures or network partitions. Any server (termed a “node” in Riak) can serve any incoming request and all data is replicated across multiple nodes. If a node experiences an outage, other nodes will continue to service read and write requests. Further, if a node becomes unavailable to the rest of the cluster, a neighboring node will take over the responsibilities of the missing node. The neighboring node will pass new or updated data (termed “objects”) back to the original node once it rejoins the cluster. This process is called “hinted handoff” and it ensures that read and write availability is maintained automatically to minimize your operational burden when nodes fail or come back online.
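The hinted handoff flow described above can be illustrated with a toy model. This is a simplified sketch, not Riak's implementation: the `Node` and `Cluster` classes, the modulo-based owner selection, and the in-memory dictionaries are all assumptions made for illustration.

```python
import hashlib


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # objects this node is responsible for
        self.hints = {}  # objects held for unavailable nodes: {owner_name: {key: value}}


class Cluster:
    """Toy masterless cluster: every key has n_val owner nodes (hypothetical model)."""

    def __init__(self, names, n_val=3):
        self.nodes = [Node(name) for name in names]
        self.n_val = n_val

    def owners(self, key):
        # Pick n_val consecutive nodes starting at a hash-derived position.
        start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.n_val)]

    def _neighbor_for(self, owner, exclude):
        # Walk the node list from the missing owner to the next available non-owner.
        index = self.nodes.index(owner)
        for step in range(1, len(self.nodes)):
            candidate = self.nodes[(index + step) % len(self.nodes)]
            if candidate.up and candidate not in exclude:
                return candidate
        return None

    def put(self, key, value):
        """Write to every owner; a neighbor accepts the write for any owner that is down."""
        owners = self.owners(key)
        acks = 0
        for owner in owners:
            if owner.up:
                owner.data[key] = value
                acks += 1
            else:
                neighbor = self._neighbor_for(owner, exclude=owners)
                if neighbor is not None:
                    neighbor.hints.setdefault(owner.name, {})[key] = value
                    acks += 1
        return acks

    def get(self, key):
        """Read from any available owner (this toy model does not read hinted copies)."""
        for owner in self.owners(key):
            if owner.up and key in owner.data:
                return owner.data[key]
        return None

    def rejoin(self, name):
        """Bring a node back and hand its hinted objects over to it."""
        node = next(n for n in self.nodes if n.name == name)
        node.up = True
        for other in self.nodes:
            node.data.update(other.hints.pop(name, {}))


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2", "node3", "node4", "node5"])
    failed = cluster.owners("cart:42")[0]
    failed.up = False                               # simulate an outage
    print(cluster.put("cart:42", {"items": 2}))     # all replicas still acknowledged
    print(cluster.get("cart:42"))                   # reads keep working
    cluster.rejoin(failed.name)                     # hinted data is handed back
    print("cart:42" in failed.data)
```

The point of the sketch is that the write path never blocks on the missing node: a neighbor accepts the write, and the handoff happens later without operator involvement.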
More information about the architectural decisions involved in Riak’s design is available in our documentation. In particular, the Concepts – Clusters section is deeply illustrative.
Availability – A Use Case
Consider, for example, the implementation of Riak at Temetra. Temetra has thousands of users and millions of meters that create billions of data points. The massive influx of data that was being generated quickly became difficult to manage with the company’s legacy SQL database. When considering how this structured database could be overhauled, Temetra conducted evaluations with Cassandra and Hadoop but ultimately chose Riak due to its high availability and its relatively self-maintaining, easy-to-deploy infrastructure. It is essential that the data collected from the meters is always available as it is relied on to determine correct billing for Temetra’s customers.
Availability – A Summary
The reality is that a database, even a distributed, masterless, multi-model platform like Riak, is only one component of the application stack. Understanding your availability requirements requires deep knowledge of the entirety of the deployment environment. “High Availability” cannot be retrofitted into a system. Rather, it requires conscious effort in the early stages to ensure that customer requirements are met and that downtime does not result in lost customers and lost revenue.
By: Jeremy Hill
Business Intelligence makes it possible for organizations to make sense of the vast amount of customer, manufacturing and competitive information they have available in order to make smarter and better informed decisions. In turn, this enables organizations to become more responsive to customer needs, increase efficiencies in manufacturing processes, and respond to significant events quickly.
Historically, the data that drives business intelligence has been stored in structured formats in a data warehouse, such as customer information on how much is spent. However, this approach misses out on the value of semi-structured and unstructured data, like the details from a customer call or a customer tweet.
With such information missing, a complete view of the customer or business can be limited. The consequence is that an inability to gain knowledge and measure customer information means businesses can fall behind, especially in a competitive market.
Business Intelligence needs NoSQL
Having access to all types of relevant customer information – structured, semi-structured and unstructured – is an essential requirement for business intelligence (BI) to help enterprises get ahead of the competition. Unlike structured, relational data warehouses, NoSQL databases make this possible with improved availability, scalability and fast response times. NoSQL databases are ideal for BI and data warehousing not only because of the diverse types of information they can deal with, but also because they are able to deliver data at the very time it is needed.
Enabling real-time analytics
NoSQL keeps up with transactions as they happen, enabling real-time analytics. E-commerce transactions, for example, benefit from a NoSQL database because it can make a decision about what to do next when a buyer doesn’t complete a purchase. Instead of waiting 24 hours or longer for the data to move through a traditional data warehouse system, with a NoSQL system a feed goes straight from a transaction through a connector to a NoSQL database. A sales analytics process can make a decision with the intelligence at that very minute, to consult the customer and understand the behavior in real-time, helping secure the purchase and preventing the loss of a customer transaction.
A recently announced Basho partner, Caserta Concepts, a technology consulting firm specializing in big data analytics, data warehousing and business intelligence, works with CIOs to deliver analytics solutions that support business goals. It uses Riak and Riak CS to accommodate unique client requirements across a broad range of data types – structured, semi-structured and unstructured – and provides continuous availability to keep critical line-of-business applications going around the clock. Caserta’s practice illustrates the viability of NoSQL in the database revolution to take on the volume, variety and velocity of data in today’s web-scale applications.
Intelligence for IoT transactions
With the vast amounts of information from Internet of Things (IoT) technologies, more business intelligence needs and use cases are at the cusp. Consider oil and gas organizations providing annual service contracts for boilers – analytics tells the business that anything beyond the second call out (or truck roll) wipes out the profit on the contract. In the connected world, NoSQL enables the next level of intelligence, which allows organizations to collect information so that, in the event of failure, they are able to determine which parts are needed in advance, eliminating the need for multiple visits. Gathering intelligence from this data also allows organizations to perform preemptive maintenance during the annual inspections to lower the frequency of unplanned, costly site visits.
With NoSQL, BI and data warehousing can become quicker and much more efficient. It allows organizations to react to events more quickly, increase customer attention, streamline the supply chain, predict customer behavior at the point it matters and predict future service calls. With the rise of big, unstructured data, NoSQL presents enormous opportunity for the future of business intelligence.
Basho is pleased to announce the release of Riak CS 1.5, which provides additional performance enhancements and simplifies administration and development with additional admin tools, enhanced S3 compatibility and a technical preview of an architecture to support clusters with very large amounts of storage. Highlights include:
- riak-cs-admin: Consolidates admin operations into a command line tool.
- riak-cs-stanchion: Changes the Stanchion IP and port.
- riak-cs-debug: Packages log, configuration and operating system command files along with Riak command results.
- syslog: Support for standardized syslog output for log aggregation using third-party tools.
S3 API Features
- multi-object delete: Reduces request overhead by supporting multiple deletes in a single request (up to 1,000 keys per request).
- cache control headers: Method for providing caching instructions in a request header.
- PUT object – copy: Creates a copy of an object that already exists in Riak CS.
A full list of S3 API compatibility can be found on the Basho docs site here.
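As a rough sketch of how a client might use multi-object delete, the helper below splits a list of keys into batches of at most 1,000 and builds one request payload per batch. The payload shape and the `delete_objects` call follow the boto3 S3 client; pointing such a client at a Riak CS endpoint is an assumption here and has not been verified against Riak CS 1.5. The names `delete_batches` and `delete_all` are hypothetical.

```python
from itertools import islice

MAX_KEYS_PER_REQUEST = 1000  # limit for a single multi-object delete request


def delete_batches(keys, batch_size=MAX_KEYS_PER_REQUEST):
    """Yield one delete payload per batch of at most batch_size keys."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield {"Objects": [{"Key": key} for key in batch], "Quiet": True}


def delete_all(client, bucket, keys):
    """Issue one request per batch; `client` is assumed to be an S3-compatible
    boto3 client configured elsewhere. Returns the number of requests sent."""
    requests_sent = 0
    for payload in delete_batches(keys):
        client.delete_objects(Bucket=bucket, Delete=payload)
        requests_sent += 1
    return requests_sent


if __name__ == "__main__":
    keys = (f"logs/2014/{i}.gz" for i in range(2500))
    for payload in delete_batches(keys):
        print(len(payload["Objects"]))
```

Deleting 2,500 keys this way takes three requests instead of 2,500 individual DELETE calls.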
Increased Scalability (Enterprise Feature)
Partly due to limitations with distributed Erlang, Riak CS scalability prior to 1.5 was limited to several petabytes. CS 1.5 introduces a technical preview of an architecture that allows multiple Riak clusters to reside under a single CS namespace, thereby significantly increasing the amount of storage possible in a cluster. A production-ready version is planned for later this year, with multi-data center support to follow.
Garbage Collection Improvements
In Riak CS, deleted and updated objects are not removed immediately. Instead, a reference is written to a special bucket and later removed by the garbage collection process at regular intervals. CS 1.5 includes several garbage collection enhancements that will benefit customers with a high rate of object deletion or updates.
- concurrent garbage collection worker processes: Speed up the rate of garbage collection with the addition of multiple workers.
- flexible enforcement of leeway interval: In previous versions, updated and deleted objects are reaped only after they reach a predefined time-based leeway interval, which was set when an object was marked for deletion. In CS 1.5 the leeway interval is managed by the garbage collection daemon and can be changed to remove objects sooner, for example, in emergency situations where maximum storage capacity is reached.
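To illustrate the leeway change, here is a toy model in which the collector records only when an object was marked and applies the leeway interval at collection time. This is a sketch of the idea, not Riak CS code: the `GarbageCollector` class, its method names, and the 24-hour value used in the example are assumptions for illustration.

```python
class GarbageCollector:
    """Toy garbage collector that evaluates the leeway when it runs."""

    def __init__(self, leeway_seconds):
        self.leeway_seconds = leeway_seconds
        self.pending = {}  # key -> time (in seconds) the object was marked

    def mark(self, key, now):
        """Record a deleted or overwritten object; nothing is removed yet."""
        self.pending[key] = now

    def collect(self, now, leeway_seconds=None):
        """Reap objects older than the leeway; an override applies to this run only."""
        leeway = self.leeway_seconds if leeway_seconds is None else leeway_seconds
        reaped = [key for key, marked_at in self.pending.items()
                  if now - marked_at >= leeway]
        for key in reaped:
            del self.pending[key]
        return reaped


if __name__ == "__main__":
    gc = GarbageCollector(leeway_seconds=24 * 60 * 60)  # hypothetical 24-hour leeway
    gc.mark("photos/old.jpg", now=0)
    print(gc.collect(now=3600))                    # too recent under the normal leeway
    print(gc.collect(now=3600, leeway_seconds=0))  # emergency run reaps immediately
```

Because the leeway is not stamped onto the object when it is marked, shortening the interval takes effect for objects that are already waiting to be collected.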
Other Notable Enhancements
- faster bucket listings: Optimizations in the OTP xmerl library enable faster bucket listings, in particular for large buckets.
- setting ACLs upon PUT object: Ability to set ACLs via header at PUT object creation is now fully functional.
Riak CS 1.5 is available at: http://docs.basho.com/riakcs/latest/riakcs-downloads/. A full list of changes is available in the release notes. Watch the blog for a detailed discussion of the multi-cluster work.
Distributed cloud storage software adds additional Amazon S3 compatibility, performance improvements, simplified admin and increased scalability
CAMBRIDGE, Mass. – August 5, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, today introduced Riak CS 1.5 and Riak CS 1.5 Enterprise, Basho’s distributed object storage software. Riak CS (Cloud Storage) is open source software built on top of Riak, used to build public or private clouds, or as reliable storage to power applications and services. Riak CS 1.5 delivers new features that improve operation, performance and scalability. Basho continues to offer enterprise-class features in Riak CS Enterprise, which includes multi-datacenter replication, world class 24 by 7 support and a flexible pricing model.
Companies dealing with large amounts of unstructured data like videos, images and documents are adopting cloud object storage so that data is highly available through a seamlessly scalable architecture. Businesses in industries such as broadcasting and telecommunications are relying on stability, integration functionality and performance of Riak CS to efficiently store, organize and access data while making it simple to manage.
“We offer our customers affordable and scalable cloud storage solutions built on Basho’s Riak CS,” said Makoto Oya, vice director of IDC Frontier. “The enhanced Amazon S3 compatibility and ability to scale well into the multi-petabyte level in Riak CS 1.5 will help us better support the rapid growth we are seeing in our storage business.”
I-NET Corp, a data processing service headquartered in Japan, uses Riak CS for its cloud service called Dream Cloud® and is looking to achieve further cost efficiency thanks to the increased scalability capabilities in Riak CS 1.5.
“Cloud-based object storage is ideal for storing our customer’s growing business-critical data, and we have relied on the excellent performance, cost efficiency and high reliability of Riak CS for the I-NET Dream Cloud®,” said Tsutomu Taguchi, senior managing director, business group of I-NET Corp. “Riak CS already provides us with high availability and now that Riak CS is further optimized to scale, we believe that Riak CS 1.5 delivered by Basho will drive even higher adoption of Dream Cloud®.”
New features enhance performance for object storage to store increasing amounts of data worldwide
Basho delivers new functions in Riak CS that include:
- Additional Amazon S3 compatibility: Expanded storage API compatibility with S3 includes features such as multi-object delete, put object copy, and cache control headers for more flexible integration with content delivery networks (CDNs).
- Performance improvement in garbage collection process: Delivered especially for customers with high rate of object updates and deletes, Riak CS now more quickly reaps objects flagged for garbage collection.
- New, simplified administrative features: New and consolidated admin features make organizational tasks easier for activities such as cluster management, monitoring and troubleshooting.
- Multi-cluster support: Technology preview for increased scalability of Riak CS Enterprise by allowing multiple Riak clusters to reside under a single CS namespace, thereby expanding the maximum capacity of a single cluster.
“Providing the strongest key value solution and object store means responding to customer needs and demands attentively,” said Dave McCrory, CTO of Basho. “With Riak CS 1.5 Enterprise, new features are delivered as requested by our customers. We are committed to make it easier to consume cutting edge versions of Riak and will continue to do this by executing a more iterative approach in how we release Riak.”
Availability and Pricing
Riak CS 1.5 is available immediately for Debian, Ubuntu, FreeBSD, OS X, Red Hat Enterprise Linux, Fedora, SmartOS and Solaris. To view the latest technical documentation or to download Riak CS, visit docs.basho.com/riakcs/latest/.
Basho delivers customized packages for its commercial software, Riak Enterprise and Riak Enterprise Plus, with health checks, as well as options for project-based Professional Services engagements. Full pricing details of Basho commercial software are at http://basho.com/riak-enterprise/#pricing. To request a trial license of Riak CS Enterprise, prospective customers can sign up for a Riak CS Tech Talk at http://info.basho.com/SignUpRiakTechTalk.html.
- Basho Website (http://basho.com)
- Basho Blog (http://basho.com/blog/)
- Riak (http://basho.com/riak/)
- Riak CS (http://basho.com/riak-cloud-storage/)
- Riak CS doc (docs.basho.com/riakcs/latest/)
- Additional Resources (http://basho.com/resources/)
- Twitter: @Basho (https://twitter.com/basho)
- LinkedIn (https://www.linkedin.com/company/basho-technologies-inc)
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts.
March 31, 2014
AiMED Stat is a startup working to facilitate better medical information capture, analysis, and reporting through web and mobile technologies. They provide clinicians with easy-to-use tools and provide researchers with direct access to real-time information capture from the front lines of medicine. They recently worked with the audiology clinic at University of Western Ontario (UWO) and used a Riak system to help the University collect and search data related to the research.
In general, innovation in health research databases has been very stagnant – with many companies simply opting for a legacy relational system like MySQL or PostgreSQL. However, AiMED Stat realized the limitations of these systems. With these relational systems, researchers would need to decide their schemas at the start of studies. However, once researchers were a few months into a study, they would need to update data or collect data in a different way. This meant researchers needed to update the entire table, which involved very costly data migration. As AiMED Stat set out to manage and present research data in a better way, it simply wasn’t feasible for their two-person team to manage a costly data migration every time there was a data update. So they began to look at more flexible, NoSQL databases as a replacement.
They first looked at MongoDB, but soon learned that MongoDB wouldn’t be able to handle their high write volumes without losing data. In clinical research, data loss is never acceptable as it can skew results. They then looked at Cassandra; however, for a small team, they found Cassandra to be too complex to operate efficiently. Finally, they evaluated Riak. They were immediately drawn to Riak’s flexible data model, schemaless design, and ability to scale out quickly. In 2011, they brought Riak into production as the backend of their research data application.
“We set out to create an application that stores and queries data in a way researchers understand,” said Kartik Thakore, Co-Founder at AiMED Stat. “By using Riak to power our application, it gives us a sizable competitive advantage (relative to other electronic audiograms). Its flexibility allows us to store data exactly as needed, its ease-of-scale eliminates the chunk of our budget previously dedicated to data migration, and its high availability ensures we never have to worry about losing data. Riak is a breath of fresh air – it does exactly what we need it to do.”
Their Riak application enables rich HTML5 forms for data collection, using a method that increases compliance and data integrity at the point of capture. From data collection, demographic identifiers are used as the key in Riak and values are stored as JSON. Riak post- and pre-commit hooks are used to further validate the data. Additionally, Riak Search, Secondary Indexes, and MapReduce are all used to allow researchers to store and search data (via a D3.js enabled application) using an Audiogram shown below:
(Audiogram shows Frequency vs. Decibel and uses the ANSI Symbol Legend)
This Audiogram allows researchers to easily search within the graph to find and compare patients that match certain audiological profiles. The quicker researchers can find patients for their study, the quicker they can get funding, making this queryability imperative.
AiMED Stat is currently running five nodes in production and looking to scale out as they grow. “For us, the importance is not on big data but on never losing data,” continued Kartik. “With Riak, we can rest assured that all our data is archived and accessible, regardless of scale or write volume.”
November 14, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series was Relational to Riak – High Availability.
Riak is designed for scalability, which truly separates it from relational systems. As described in the previous post, relational databases run best on a single server. If the dataset grows beyond the capacity of this single machine, it can become prohibitively expensive (or even impossible) to simply upgrade to a bigger machine. In such a scenario, the only option may be to add more machines and divide the dataset across them using a technique called sharding.
Sharding divides data into logical parts (such as alphabetical, by customer, or by geographic region) that can be distributed across multiple machines – often manually. If data continues to grow, this process may need to be repeated at great expense.
Sharding is not only difficult, it also will typically lead to hot spots – meaning certain machines are responsible for storing and serving a disproportionately high amount of both data and requests. Hot spots can cause unpredictable latency and degraded performance.
(And remember all the ways in which availability is a challenge? Combine sharding with a master/slave architecture for maximal expense and general unpleasantness.)
Instead of sharding, Riak evenly distributes data across a cluster using consistent hashing. In a Riak cluster, the data space is divided into partitions which are claimed by the servers. When new data is written to the database, these objects are evenly placed around the ring and replicated 3 times (by default). This ensures that your data will always be available, even when nodes fail.
When nodes are added or removed, data is rebalanced automatically. New machines assume ownership of some of the partitions and existing machines hand off relevant partitions and associated data until data ownership is equal amongst nodes.
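The placement scheme described above can be sketched in a few lines. This is a simplified model under stated assumptions: a ring of 64 partitions, SHA-1 as the hash function, and a naive round-robin claim. Riak's real claim algorithm is more sophisticated (among other things, it tries to limit how much data moves when membership changes), and the function names here are ours.

```python
import hashlib
from collections import Counter

RING_SIZE = 64        # number of partitions (assumed for illustration)
N_VAL = 3             # replicas per object
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def claim(nodes, ring_size=RING_SIZE):
    """Naive round-robin claim: partition i is owned by nodes[i % len(nodes)]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash bucket/key onto the ring and return the partition index it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return position * ring_size // KEYSPACE


def preference_list(bucket, key, owners, n_val=N_VAL):
    """Nodes holding the replicas: owners of n_val consecutive partitions."""
    first = partition_for(bucket, key, len(owners))
    return [owners[(first + i) % len(owners)] for i in range(n_val)]


if __name__ == "__main__":
    four = claim(["node1", "node2", "node3", "node4"])
    print(Counter(four))                       # partitions per node, 4 nodes
    print(preference_list("users", "alice", four))

    five = claim(["node1", "node2", "node3", "node4", "node5"])
    print(Counter(five))                       # ownership after adding a node
```

Because keys are spread by hash rather than by a hand-picked attribute, no node ends up owning a disproportionate share, and adding a node simply changes which partitions each node claims.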
By eliminating the manual requirements of sharding and making hot spots highly unlikely, Riak makes it significantly easier for companies to scale, whether it’s just for a few months to handle peak loads or to support long-term growth strategies.
January 29, 2013
This is the first in a series of blog posts covering the benefits Riak offers to developers and operators of retail and eCommerce platforms. To learn more, join our “Retail on Riak” webcast on Friday, February 8th.
As retailers grow and have to store more and more data, traditional relational databases aren’t always the best option. Retailers want to scale easily, without the operational burden of manual sharding. Meanwhile, business requirements demand their data is always available for reads and writes. Riak is a highly available, low latency distributed database that is ideal for retailers who need to serve product data quickly and maintain “always on” shopping experiences. Riak is based on architectural principles from Amazon. Riak is designed for high availability and scale so retailers can always serve customers, even under failure conditions, and rapidly grow to meet peak loads.
Retailers of all sizes have chosen Riak to power parts of their business, including:
- Best Buy: Best Buy is North America’s top specialty retailer of consumer electronics, personal computers, entertainment software, and appliances. Riak has been an integral part in the transformation push to re-platform Best Buy’s eCommerce platform. For more info, check out Best Buy’s talk from our 2012 developer conference, RICON.
- ideel: ideel is one of the fastest growing retailers with over 5 million members and more than 1,000 brand partners. They use Riak to serve HTML documents and user-specific products. ideel chose Riak to power their event-based shopping experience due to Riak’s ability to serve users information at low latency and provide ease of use and scale to ideel’s operations team. Check out the complete case study for more details.
- Copious: Copious is a social commerce marketplace that makes it easy for people to buy and sell the things they love. They currently store all registered accounts in Riak as well as the tokens that make it possible for users to authenticate with Copious via their Facebook or Twitter accounts. They chose to use Riak for their social login functionality because of its operational simplicity, which allows them to easily scale up without sharding and provides the high availability required for a smooth user experience. For more details, check out the complete Copious story on our blog.
For more information about the benefits of Riak for retailers and the retailers already using it, register for our “Retail on Riak” webcast on February 8th!
January 22, 2013
Traditionally, most retailers have used relational databases to manage their platforms and eCommerce sites. However, with the rapid growth of data and business requirements for high availability and scale, more retailers are looking at non-relational solutions like Riak.
Riak is a masterless, distributed database that provides retailers with high read and write availability, fault-tolerance and the ability to grow with low operational cost. Architectural, operational and development benefits for retailers include:
- “Always On” Shopping Experience: Based on architectural principles from Amazon, Riak is designed to favor data availability, even in the event of hardware failure or network partition. For retailers, failure to accept additions to a shopping cart, or serve product information quickly, has a direct and negative impact on revenue. Riak is architected to ensure the system can always accept writes and serve reads at low-latency.
- Resilient Infrastructure: At scale, hardware malfunction, network partition, and other failure modes are inevitable. Riak provides a number of mechanisms to ensure that retail infrastructure is resilient to failure. Data is replicated automatically within the cluster so nodes can go down but the system still responds to requests. This ensures read and write availability, even in serious failure conditions.
- Low-Latency Data Storage: Many retailers now operate online and mobile experiences with an API or data services platform. In order to provide a fast and available experience to end users, Riak is designed to serve predictable, low-latency requests as part of a service-oriented infrastructure and is accessible via HTTP API, protocol buffers, or Riak’s many client libraries.
- Scale to Peak Loads with Low Operational Cost: During major holidays and other periods of peak load, retailers may have to significantly increase their database capacity quickly. When new nodes are added, Riak automatically distributes data evenly to naturally prevent hot spots in the database, and yields a near-linear increase in performance and throughput when capacity is added.
- Global Data Locality and Redundancy: Riak Enterprise’s multi-site replication allows replication of data to multiple data centers, providing both a global data footprint and the ability to survive datacenter failure.
Top retailers using Riak include Best Buy and ideel. Best Buy selected Riak as an integral part in the transformation push to re-platform its eCommerce platform. For more information about how Best Buy is using Riak, check out this video.
ideel uses Riak to serve HTML documents and user-specific products. ideel chose Riak to provide its highly available, event-based shopping experience – Riak gives them the ability to serve user information at low latency and provides ease of use and scale to ideel’s operations team. For more information on ideel’s use of Riak check out the complete case study.
Common use cases for Riak in the retail/eCommerce space include shopping carts (due to Riak’s “always-on” capabilities), product catalogs (Riak is well suited for the storage of rapidly growing content that needs to be served at low-latency), API platforms (Riak’s flexible, schemaless design allows for rapid application development), and mobile applications (Riak is ideal for powering mobile experiences across platforms due to its low-latency, always-available small object storage capabilities).
To help retailers evaluate and adopt Riak, we’ve published a technical overview: “Retail on Riak: A Technical Introduction.” We discuss more in-depth information on modeling applications for common use cases, switching from a relational architecture, querying, multi-site replication and more.
January 02, 2013
New to Riak? Thinking about using Riak instead of a relational database? Join Basho chief architect Andy Gross and director of product management Shanley Kane for an intro this Thursday (11am PT/2pm ET). In about 30 minutes, we’ll cover the basics of:
* Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
* A look at the operational aspects of Riak and where they differ from relational approaches
* Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
* Migration considerations, including where to start when migrating existing apps
* Riak’s eventually consistent design
* Multi-site replication options in Riak
Register for the webcast [here](http://info.basho.com/RelationalToRiakJan3.html).
November 11, 2010
Things are moving incredibly fast in the NoSQL space. I am used to internet-fast — helping bring on 300 customers in a year at Akamai; going from adult bulletin boards and leased lines to hosting sites for twenty percent of the Fortune 500 at Digex (Verizon Business) in eighteen months. I have never seen a space explode like the NoSQL space.
Two weeks ago, Justin Sheehy stood on stage delivering a rousing and thoughtful presentation to the NoSQL East Conference that was less about Riak and more about a definition of first principles that underpinned Riak: what it REALLY means when you claim such terms as scalability (it doesn’t mean buying a bigger machine for your master DB) and fault-tolerance (it has to apply to writes and reads and is binary; you either always accept writes and serve reads or you don’t). The conference was a bit of a coming out party for Basho, which co-sponsored the event with Rackspace, Georgia Tech, and a host of other companies. We had been working on Riak for 18 months or so in relative quiet and it was nice to finally see what people thought, first hand.
There were equally interesting presentations about Pig and MongoDB and a host of other NoSQL entrants, all of which will make for engrossing viewing when they finally get posted. We were told this wasn’t quite as exciting as the NoSQL conference out West but none of us seemed to mind. Home Depot, Turner Broadcasting, Weather.com, and Comcast had all sent folks down to evaluate the technology for real, live problems and the enthusiasm in the auditorium spilled out into the Atlanta bars. Business cards were exchanged, calls set up, even a little business discussed. Clearly, NoSQL databases were maturing fast.
No sooner had we returned to Cambridge than news of Flybridge’s investment in 10Gen came out. Hooray! Someone was willing to bet $3.4 million on a company in the space. Chip Hazard, ever affable, wrote a nice blog post explaining the investment. According to him, every developer they talked to had downloaded some NoSQL database to test. Brilliant news. He said Flybridge invested in 10Gen because they liked the space and knew the team from their investment in DoubleClick, from whose loins the management team at 10Gen issued. No more felicitous reason exists for a group of persons to invest $3.4 million than that previous investments in the same team were handsomely rewarded. I would wish Chip and 10Gen the best if I had time.
Because contemporaneous with the news of Flybridge’s investment, and almost as if the world had decided NoSQL’s time had come, we began to field emails and calls from interested parties. Trials, quotes, lengthy discussions about features and uses of Riak — the week was a blur. Everyone was conducting a bakeoff: “I have a 4TB database and customers in three continents. I am evaluating Riak and two other document datastores. Tell me about your OLAP features.”
Heady times and, frankly, of somewhat dubious promise, if you ask me. Potential clients that materialize so quickly always seem to disappear just as fast. Really embracing a new technology requires trials, tests, new features, and time. Time most of all. These “bluebirds” would fly away in no time, if my experience held true.
Except, this time it didn’t happen. Contracts were exchanged. Pen nibs were sharpened. It is as if the entire world decided not to wait for everyone else to jump on the bandwagon and instead decided to go NoSQL. Even using this last week as the sole example, I think the reason is plain: people have real pain and suddenly the word is out that they no longer have to suffer.
Devs are constrained by what they can build, rich features notwithstanding. Ask the company that had to choose between Riak and a $100K in-memory appliance to scale. And Ops is getting slaughtered — the cost of scaling poorly (and by poorly I mean pagers going off during dinner, bulk updates taking hours and failing all the time, fragmented and unmanageable indices consuming dozens of machines) is beginning to look like the cost of antiquated technology. Good Ops people are not fools. They look for ways to make life easier. Make no mistake — all the Devs and Ops folks came with a set of tough questions and a list of new features. They also came with an understanding that companies that release open source software still have a business to run. They are willing to spend on a real company. In fact, having a business behind Riak ended up mattering as much as any features.
So, I suspect, we are at the proverbial “end of the beginning.” Smart people in the NoSQL movement have succeeded in building convincingly good software and then explaining its virtues just as well (all but one of the presentations at NoSQL East demonstrated the virtues of the respective approaches). Now these people are connecting to smart people responsible for building and running web apps, people who are decidedly unwilling to sit around hoping for Oracle or IBM to solve their problems.
In the new phase — which we will cleverly call the “beginning of the middle” — great tech will matter even more than it does now. It won’t be about selling or marketing or any of that. If our numbers are any indication of a larger trend, more people will download and install NoSQL databases in the next month than the combined total of the three months previous. More people in a buying frame of mind will evaluate NoSQL technology not in terms of its coolness but in terms of its ability to solve their real, often expensive problems. The next phase will be rigorous in a way this phase was not. People have created several entirely new ways to store and distribute data. That was the easy part.
Just as much as great tech, the people behind it will matter. That means more calls between us and Dev teams. That means more feature requests considered and, possibly, judiciously, agreed to.
That also means lots of questions answered. People care about support. They care about whether you answer their emails in a timely fashion and are polite. People want to do business with NoSQL. They want to spend money to solve problems. They need to know they are spending it with responsible, responsive, dedicated people.
Earl tweets about it all the time and I happen to agree: any NoSQL success helps all NoSQL players. I also happen to feel that any failure hurts all NoSQL players. As NoSQL rapidly ages into its adolescence, it will either be awkward and painful or exciting and characterized by incredible growth.
When I was a kid on the Navy base in Alameda, my babysitter watched soaps all afternoon, leaving me mostly to my own devices. If I stopped in, I always got roped in to hearing her explain her favorite stories. Most of all she loved how ridiculous they were, though she would never admit this exactly. Instead, adopting an attitude of gleeful incredulity, she would point out this or that attractive young actor and tell me how just a year ago, she was a little baby. “Soap people have to grow up quick, I guess,” was her single (and to her, completely satisfactory) explanation. “If they don’t, they get written out of the story.”