New, enhanced database and growing number of customers highlight strong year for the company
LONDON, UK. – November 20, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, has seen a surge in deployment and a growing customer-base in EMEA as a result of the launch of Riak 2.0, the significantly enhanced version of its flagship platform.
2014 has seen significant successes for Basho, from the release of Riak 2.0 to news that Basho technology is powering Spine 2, the electronic backbone of the NHS. Basho has also seen strong growth in its EMEA customer-base, with the company working with businesses such as bet365, one of the world’s leading online gambling groups, StatPro, the cloud-based portfolio analysis service, and EE, the largest mobile operator in the United Kingdom to address their critical unstructured data needs. Basho has increased its number of customers in EMEA by 38 percent year on year, and these customer wins have contributed to revenue growth from Q2 to Q3 in 2014, which was up 90 percent.
“Our decision to implement Riak was purely strategic. After a stringent evaluation process we decided that Basho’s flexible, scalable database was best-suited to our needs,” said Martin Davies, Chief Executive Officer, Technology at bet365. “Given the huge amount of data we process on a daily basis – from customer details to betting odds – it was imperative that we had a platform to support this. We selected Riak, and have not been disappointed with the results.”
The gaming industry is becoming increasingly complex, with customers no longer satisfied with betting on a limited selection of outcomes. Now, gaming companies must offer more than your traditional betting options. For example, during football matches, it is no longer enough to offer odds on scorer or full-time result. Instead consumers are eager to bet on everything from the number of yellow cards, to corners and amount of injury time. To offer and process these options requires a huge amount of data-crunching, and in addition to the vast number of metrics and numbers processed when taking into account everything from betting odds, bets placed and the final action on each account, such businesses require a lightning-fast database to support the deluge and prevent system crashes.
Basho’s growing stature in the gaming sector has been matched by its recent success in the telecommunications space. An increasing number of telco companies like EE are using Riak to replace existing systems and provide fault-tolerance and scalability for the future. Riak’s strength in the industry is further highlighted by the market trend towards reducing the burden of managing complex hardware environments by providing a consolidated virtualized orchestration platform to replace much of the traditional hardware.
These recent deals highlight a strong year for Basho, while the reseller partnership with Nordicmind and its upcoming Riak Nordic Roadshow demonstrate its growing success in EMEA. Success in the region is further reflected in the appointment of Emmanuel Marchal as Managing Director EMEA, who will be leading enterprise focus in EMEA, as well as the continued work with companies such as Deutsche Vermögensberatung (DVAG), Germany’s largest stand-alone financial services distributor. The financial advisors of DVAG support over 6 million customers in all questions concerning financial planning, insurance and finances.
“We knew that with the release of Riak 2.0, 2014 would be a massive year for the company,” said Adam Wray, President and Chief Executive Officer at Basho. “However, the growth in deployment and the continued success of Riak was more significant than we expected – with customers responding in kind. This year alone we have made strides in several sectors, including telco, financial, gaming and healthcare, where we have helped complete a project with the NHS that could potentially save lives. Couple this with our growing number of partners, and we can happily say that Basho is going from strength-to-strength.”
November 10, 2014
Many data needs are better served by data stores that are optimized for maximum availability and scalability — rather than optimized for consistency. For certain use cases, there are elements to the data that require strong consistency. With Riak 2.0, in addition to eventual consistency, there is now a way to enforce strong consistency when needed.
NOTE: Riak’s strong consistency feature is currently an open-source-only feature and is not yet commercially supported.
Behavioral Changes with Strong Consistency
Strongly consistent operations in Riak function much like eventually consistent operations at the application level. The core difference lies in the types of errors Riak will report to the client.
Each request to update an object (except for the initial creation) must include a context value reflecting the last time the application read it. This is the same behavior that Riak clients have always followed with version vectors and strong consistency also mandates its use. Similarly, reading data from a strongly consistent Riak bucket functions exactly like eventually consistent reads.
If that value is not provided for an update operation to an existing object, Riak will reject it. This is because the database assumes that you have not seen the current value and may not know what you’re doing.
Similarly, if that context value is out of date, Riak will also reject update operations. The client must re-read the latest value and supply an update based on that new value, with the new context.
If Riak cannot contact a majority of the servers responsible for the key, the request will fail. Ordinarily, Riak is happy to accept all operations in the interest of high availability and never dropping a write – even in the extreme case of only one server surviving data center outages.
Strong consistency also eliminates object siblings, as it is effectively impossible for the cluster to disagree on the value of an object.
When considering consistency models in an application, it is easy for the logic to quickly become daunting. This is especially true when designing a workflow that leverages both eventually and strongly consistent models. It is, therefore, easiest to begin with a simple use case.
Consider the workflow involved in storing and updating username and password data. In the case of a password update, it is necessary that — at any given time — there be exactly ONE result for a user’s password. Relatedly, it is important to ensure that an update of this value is fully atomic or user experience is substantially degraded. It would be possible to leverage Riak for all the eventually consistent elements of the application and leverage strong consistency for the username and password.
To see how eventual and strong consistency can be combined to solve business problems, let’s take a not-so-hypothetical example from the energy industry.
Imagine you’re collecting massive amounts of geological data for analysis. Each batch of data must be processed by a single instance of your application. Since this processing can take hours, days, or even weeks to complete, it’s expensive if two applications handle the same batch.
Let’s walk through the sequence of events.
- Batch of data arrives for processing.
- The batch is stored in a large object store (like, Riak CS) under a batch ID.
- The batch ID is added to a pending job list in Riak and stored as a set (one of the new Riak Data Types).
This is a classic example of eventual consistency and an illustration of the value of the new Riak Data Types introduced with Riak 2.0. Storing a new batch ID in your database should never fail, even if servers are offline. If multiple applications are adding batch IDs to the pending list at the same time, it’s perfectly reasonable for those lists to temporarily diverge, as long as they can be trivially merged later.
Let’s continue to see where strong consistency comes into play.
- A compute node becomes available to process the data.
- The compute node retrieves the pending job list and picks a batch ID.
- The compute node attempts to create a lock for that batch ID.
This is where strong consistency is required. This lock object should be created in a bucket that is managed by the new strong consistency subsystem in Riak 2.0. If someone else also grabs that batch ID and tries to create another lock object, Riak’s strong consistency logic will reject this second attempt. This compute node will just start over by grabbing a new ID.
To detect crashed jobs, the lock object should be created with basic job data, such as which compute node owns the processing job and what time it was started.
- The compute node asks Riak to add the batch ID to a different set, a running job list.
- The compute node asks Riak to remove the batch ID from the pending list.
- The job runs.
- When completed, the compute node asks Riak to add the batch ID to a completed job list.
- Riak is asked to remove the batch ID from the running list.
- The compute node deletes the lock object (or updates it to reflect the completion of the processing job).
Tradeoffs When Using Strong Consistency
- Blind updates will be rejected, so the client must read the existing value before supplying a new one (except in the case of entirely new keys).
- Write requests may be slightly slower due to coordination overhead.
- If a majority of the servers responsible for a piece of data are unavailable, write requests will fail. Read operations may fail depending on the freshness of the data that is still accessible.
- Secondary indexes (2i) are not yet supported.
- Multi-datacenter replication in Riak Enterprise is not yet supported.
- Using Strong Consistency (for developers)
- Managing Strong Consistency (for operators)
- Strong Consistency (theory & concepts)
Strong Consistency is now available with Riak 2.0. Download Riak 2.0 on our Docs Page.
In a previous post we briefly introduced Riak 2.0 data types. The addition of these distributed Data Types simplifies application development by automatically handling sibling resolution. This means developers can spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, let Data Types support their applications’ data access patterns.
Understanding these data types requires a brief trip through history…
Riak 1.4 Counters
Riak 1.4 introduced counters as the first data types. Prior to 1.4 we’ve always said: “Your data is opaque to Riak,” — and it still can be — but with the addition of counters that is not longer the case. Riak knows what is stored in a counter key, and how to increment and decrement it through the counter API. It isn’t necessary to fetch, mutate, or put a counter. Instead you just incremented by 5 or decremented by 100. Vector Clocks, as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems, as Riak knew how to merge concurrent writes there was never a sibling created.
Counters are very valuable, but you can not build many applications on just counters. Now, in Riak 2.0, we’ve added more data types. We believe that, with the addition of these data types you can model many applications’ data storage needs with greater simplicity, and never have to write sibling merge functions again.
What are CRDTs?
You may have heard a Basho presentation, or blog post, reference “CRDTs”. CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key, repeated, phrase is “Replicated Data Types”.
Replication is inherent in Riak. It is what the n-value defines. It is part of what lends to the availability and fault tolerance characteristics that Riak provides. Data Types are a common construct in computing. Sets, Bags, Lists, Registers, Maps, Counters…etc.
That leaves us to consider the “C”.
Conflict Free, or “Opaque No More”
Riak is an eventually consistent system. It leans, very much, towards the AP end of the CAP spectrum. (For more reading on the topic, the Practical Tradeoffs section of A Little Riak Book is particularly illuminating). This availability is achieved with mechanisms like sloppy quorum writes to fallback nodes. However, even without partitions and many nodes, interleaved or concurrent writes can lead to conflicts. Traditionally, Riak keeps all values and presents them to the user to resolve. The client application must have a deterministic way to resolve conflicts. It might be to pick the highest timestamp, or union all the values in a list, or something more complex. Whatever approach is chosen, it is ad-hoc, and created specifically for the data model and application at hand.
With Riak data types, there is still “conflict”. However, the resolution for that conflict is inherent and part of the data type’s design. The data types for Riak 2.0 converge automatically, at write and read time, on the server. If a client application can model its data using the data types provided, no sibling values will be seen and there is no longer a need to write ad-hoc, custom merge functions.
When modeling an applications data domain in a programming language, developers are familiar with composing state from a few primitive data types. Riak Data Types give the developer that power back and expressivity, and relieve them of the burden of design and testing deterministic merge functions. The key is that the data is no longer opaque to Riak. When the Data Types API is leveraged, Riak “knows” what type of thing is being stored and is able to perform the merge automatically.
When reading a Data Type from Riak, you will only ever see a single value. That value is still eventually consistent, but it will be as correct as it can be given the amount of entropy in the database. When the system is stable, all values will converge on a single, deterministic, correct value.
What Data Types Are Available?
Riak 2.0 includes the following Data Types:
- Counters: as in Riak 1.4
- Flags: enabled/disabled
- Sets: collections of binary values
- Registers: named Binary values with values also binary
- Maps: a collection of fields that supports the nesting of multiple Data Types
The conflict resolution, as discussed above, is intrinsic to the Data Type itself. This table provides greater detail.
|Data Type||Use Cases||Conflict Resolution Rule|
||Each actor keeps and independent count for increments and decrements. Upon merge, the pairwise maximum of any two actors will win (e.g. if one actor holds 172 and other holds 173, 173 will win upon merge)|
||Enable wins over disable|
||If an element is concurrent added and removed the add will win|
||The most chronologically recent value wins, based on timestamps|
||If a field is concurrently added, or updated and removed, the addd / update will win|
A new version of Riak, with new Data Types, allowing you to model your application in more expansive ways. Take these Data Types for a spin and be sure to let us know how you use them in your applications.
September features developer conferences, Chicago Erlang, and even an “unconference.” Take a look at where Basho will be around the U.S. this month.
Strangeloop (September 17-19 in St. Louis, MO): Strangeloop is a great opportunity to learn about emerging languages, concurrent and distributed systems, and new database technologies. Basho is attending, so tweet us @basho if you’re interested in meeting.
Analytics and Big Data Summit (September 18 in San Jose, CA at 3:05 p.m. PT): Produced by the Storage Networking Industry Association (SNIA), the Analytics and Big Data Summit brings together IT professionals to discuss how to leverage analytics, and big data applications and systems. Seema Jethani from Basho will be presenting on Optimizing Cloud Storage to Manage Big Data, which will explore different data types and storage solutions. Attendees will gain an understanding of the needs of big data storage and the current cloud storage options available to organizations.
2014 High Performance Computing for Wall Street (September 22 in New York, NY at 2:30 p.m. ET): The 11th annual HPC networking opportunity is focused on high put-through, low latency networks, data centers and lowering the costs of operations. Our director of technical marketing, Tyler Hannan, will be presenting a Code Writing Session – Architecting for Global Scale.
Chicago Erlang (September 22 in Chicago, IL): Chicago Erlang is a one-day event focused on real world applications of Erlang. At 10:40 a.m. CT, Basho’s Reid Draper will present on Building Fault Tolerant Teams at Basho during which he will explain how Basho coordinates the activities of more than 25 Erlang programmers to build Riak. Then, at 3:20 p.m. CT, Steve Vinoski from Basho will discuss Optimizing Native Code for Erlang.
REST Fest 2014 (September 25-27 in Greenville, SC): REST Fest is an “unconference” with the objective of bringing together people interested in REST, hypermedia APIs, web service APIs and related topics to share ideas, trade stories and show examples of current work. Sean Cribbs from Basho will be the opening keynote! His keynote, HTTP: The Good Parts, will explore interesting and powerful ways to enhance interaction and efficiency when developing applications. Sean will leverage his 10 years of experience as a developer to provide insight into HTTP features and how you can tap into them more declaratively.
Surge 2014 (September 24-26 in National Harbor, MD): We will be attending and sponsoring OmniTI’s scalability and performance conference, Surge. We’d love to meet and chat, so tweet us @basho if you’re attending.
Lastly, RICON 2014 is just one month away, October 28-29. Early bird prices are good through September 22. Register here.
February 5, 2014
At the recent meetup for the New York Linux Users Group (NYLUG), Basho Technical Evangelist, Tom Santero, presented “An Introduction to Basho’s Riak.” In this talk, Tom explains how Riak addresses the challenges of concurrent data storage at scale. He discusses the various design decisions, tradeoffs made, and theories at work within Riak. He also provides guidance as to how you might deploy Riak in production and why.
In addition to introducing the basics of Riak and its key/value data model, Tom presents some of the exciting features being introduced with Riak 2.0. Riak Data Types adds counters, sets, and maps to Riak – allowing for better conflict resolution. They enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focusing on using familiar, distributed data types to support their applications’ data access patterns.
You can watch Tom’s full talk below:
For more information about Riak and how it differs from traditional databases, check out the whitepaper, “From Relational to Riak.”
To see where Basho will be presenting next, visit the Events Page.
January 29, 2014
On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Fatcual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).
If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies Youtube Channel. You can also watch each below.
Data Types and Search in Riak 2.0
Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)
Bucket Types and Configuration
Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)
Riak 2.0: Security and Conflict Resolution
Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)
Fun with Java and C Clients
Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)
Property Based Testing
Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)
Datomic and Riak
Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)
Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)
A Look Back
Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)
January 6, 2014
With the launch of the Technical Preview of Riak 2.0, we also announced the addition of strong consistency to Riak. This addition fundamentally changes how Riak can be used, since all previous versions classified Riak as an eventually consistent system.
With Riak 2.0, developers now have the flexibility to choose whether buckets should be highly available or strongly consistent, based on data requirements. Consistency preferences are defined on a per bucket type basis, in the same cluster.
At RICON West 2013, Basho senior engineer, Joseph Blomstedt, gave an updated version of his “Bringing Consistency to Riak” talk. The original talk (presented at RICON West 2012) discussed the challenges, motivations, and high-level plans of bringing consistency to Riak. This updated version presents the actual implementation that has since been built and how it will function in Riak 2.0. Both talks are available below.
To start testing the strong consistency feature, you can download the Technical Preview of Riak 2.0 here.
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.
December 17, 2013
In addition to Riak Data Types, there were a number of other presentations about Riak 2.0 features at RICON West. With the Technical Preview of Riak 2.0, we also announced a completely redesigned Riak Search.
Riak is a straight key/value data store and all objects are stored on disk as binaries. It is content agnostic – meaning you can store any type of data as the value in Riak. To improve the usability and functionality of Riak, we offer multiple querying options including Riak Search, Secondary Indexing, and MapReduce. Riak Search is a full-text search that allows Riak developers to index the contents of stored values. While Riak Search offers much needed functionality, it had its flaws.
In Riak 2.0, Riak Search received a complete overhaul. Riak Search 2.0 leverages the Apache Solr full-text document indexing engine directly. Riak users now get the power of Solr, with the availability and scalability of Riak. This upgrade also supports the Solr client-queries API, which enables integration with existing software solutions.
Eric Redmond is one of the Basho engineers who works on Riak Search. At RICON, he presented “Riak Search 2.0,” which walks through what’s new with Riak Search and why you’d want to use it. He also provides some impressive demos that show off the power of Solr and Riak. His full talk is below.
For more information on Riak Search 2.0, check out these resources on Github.
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.
December 12, 2013
At RICON West this year, we announced the Technical Preview of Riak 2.0. Before the full release (which will be available early next year), we are encouraging users to download the preview and start testing some of the exciting new features.
At RICON, we had many of the engineers who worked on these new features present their work. One feature that we’re particularly excited about is the addition of Riak Data Types. Riak 2.0 builds on eventually consistent counters (added with Riak 1.4) with the addition of maps and sets. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
In “CRDTs: An Update (or Maybe Just a PUT),” Basho engineer, Sam Elliott, presents on the work being done with Riak Data Types. Sam and a few other engineers at Basho have been integrating cutting-edge research on data types (known as CRDTs), pioneered by INRIA, to create Riak Data Types. Sam talks about the latest developments on CRDTs and walks developers through how to use them in their own applications.
In addition to Sam’s talk, we also had a talk from Jeremy Ong on “CRDTs in Production.” His talk provides real world solutions to leveraging CRDT concepts for an industrial application via case study. He also offers some suggestions on how to tackle data operations that can’t always commute. You can watch his full talk below.
For more information about Riak Data Types, check out this overview on Github.
To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.
November 18, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series discussed High Availability and Cost of Scale.
In order to provide high availability, which is a cornerstone of Riak’s value proposition, the database stores several copies of each key/value pair.
This availability requirement leads to a fundamental tradeoff: in order to continue to serve requests in the presence of failure, we do not force all data in the cluster to stay in sync. Riak will allow writes and reads no matter how many servers (and their stored replicas) are offline or otherwise unreachable.
(Incidentally, this lack of strong coordination has another consequence beyond high availability: Riak is a very, very fast database.)
Riak does provide both active and passive self-healing mechanisms to minimize the window of time during which two servers may have different versions of data.
The concept of eventual consistency may seem unfamiliar, but if you’ve ever implemented a cache or used DNS, those are common examples of the idea. In a large enough system, it’s effectively the default state of all data.
However, with the forthcoming release of Riak 2.0, operators will be able to designate selected pieces of data to require coordination and maintain strong consistency over high availability. Writing such data will be slower and subject to failure if too many servers are unreachable, but the overall robust architecture of Riak will still provide a fast, highly available solution.
Riak stores data using a simple key/value model, which offers developers tremendous flexibility to define access models that suit their applications. It is also content-agnostic, so developers can store arbitrary data in any convenient format.
Instead of forcing application-specific data structures to be mapped into (and out of) a relational database, they can simply be serialized and dropped directly into Riak. For records that will be frequently updated, if some of the fields are immutable and some aren’t, we recommend keeping the immutable data in one key/value pair and the rest organized into a single or multiple objects based on update patterns.
Relational databases are ingrained habits for many of us, but moving beyond them can be liberating. Further information about data modeling, including sample configurations, are available on Use Cases section of the documentation.
One tradeoff with this simpler data model is that there is no SQL or SQL-like language with which to query the data.
To achieve optimal performance, it is advisable to take advantage of the flexibility of the key/value model to define simple retrieval patterns. In other words, determine the most useful queries and write the results of those queries as the data is being processed.
Because it is not always possible to know in advance what questions will need to be asked of your data, Riak offers added functionality on top of the key/value model. Tools such as Riak Search (a distributed, full-text search engine), Secondary Indexing (ability to tag objects with queryable metadata), and MapReduce (leveraged for aggregation tasks) are available to perform ad hoc queries as needed.
For many users, the tradeoffs of moving to Riak are worthwhile due to the overall benefits; however, it can be a bit of an adjustment. To see why others have chosen to switch to Riak from both relational systems and other NoSQL databases, check out our Users Page.