February 12, 2014
Datomic is a distributed database system that supports queries, joins, and ACID transactions. Through its pluggable persistence layer, you can wire Datomic up to a horizontally scalable key/value store that strives for operational simplicity, like Riak.
Below, we’ll explore the specifics around getting Riak enabled as a storage service for Datomic. We will also provide you with a Vagrant project that automates many of these steps, so you can have a local development environment with a Riak-backed Datomic running within minutes.
Datomic stores indexes and a log of known transactions in its storage backend. You can think of the indexes as sorted sets of datoms, and the data log as a recording of all transaction data in historic order.
Both of these pieces of data are stored as trees with blocks that are roughly 64K in size. The blocks themselves are immutable and cater very well to the strengths of eventual consistency. Other bits of data, like the root pointers (for the trees) for indexes and the data log, require the ability to compare-and-swap (CAS). They need to be stored in a strongly consistent backend.
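The split between immutable blocks and a CAS-protected root pointer can be sketched as follows. This is a minimal in-memory illustration, not Datomic's implementation: `BlockStore` and `RootPointer` are hypothetical names, and content-addressed keys are an assumption made here so that a block can never be overwritten with different bytes.

```python
import hashlib
import threading


class BlockStore:
    """Stand-in for an eventually consistent key/value store of immutable blocks."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # Assumption: keys are derived from content, so the same key
        # always maps to the same bytes and replicas can never disagree.
        key = hashlib.sha1(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]


class RootPointer:
    """Stand-in for a strongly consistent cell that supports compare-and-swap."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new) -> bool:
        # Replace the value only if it is still what the caller last read.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


blocks = BlockStore()
root = RootPointer()

seen = root.read()
new_root = blocks.put(b"index tree v1")
first = root.compare_and_swap(seen, new_root)                      # root was unchanged
stale = root.compare_and_swap(seen, blocks.put(b"index tree v2"))  # root has moved on
```

Writing a block needs no coordination, because nothing ever competes to change it. Only the pointer swap must be atomic, which is why that small piece of state lives in a strongly consistent store.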
With Riak as the storage service, ZooKeeper fills that strongly consistent role. We won’t go through the details of standing up a ZooKeeper ensemble here, but once you have one, make sure you have a list of IP:PORT pairs for each instance (at least three recommended for production usage).
Note: Strong consistency is coming in Riak 2.0 and will make ZooKeeper unnecessary for this use case.
Riak is a distributed key/value store with an emphasis on high availability. To learn more, download the free eBook, A Little Riak Book.
To get started with Riak, head over to the Quick Start Guide and walk through the setup of a five-node cluster.
In Datomic, the Transactor component is responsible for coordinating write requests and is a critical single point of failure. Think of the Transactor the same way you think about a relational database. You need one, but you may also want another ready to go if the primary fails.
The Transactor needs to know a few things about Riak:
riak-interface(valid options are
riak-bucket(can just set this to
Note: The Transactor passes the Riak host and port to the riak-java-client. You’ll want to round-robin requests against all of the nodes in your cluster evenly (usually accomplished with a load balancer). If you set up a load balancer to front your Riak cluster, provide its host and port to the Transactor via
The Forbidden Dance
At this point it’s assumed that you have a ZooKeeper ensemble, Transactor instance, and Riak cluster ready to go. Now, fetch your list of ZooKeeper nodes and supply it (comma-delimited) as the payload of an HTTP PUT request to Riak like so:
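As a hedged sketch, such a request could be built in Python as shown here. The host, the bucket name `datomic`, the key `config`, and the ZooKeeper addresses are all placeholder assumptions, and `build_zookeeper_config_request` is a hypothetical helper; check the Datomic storage documentation for the exact bucket and key the Transactor expects. Port 8098 is Riak's default HTTP port and `/riak/<bucket>/<key>` is its legacy HTTP path.

```python
import urllib.request


def build_zookeeper_config_request(riak_host, riak_port, bucket, key, zookeeper_nodes):
    """Build (but do not send) a PUT that stores the ZooKeeper node list in Riak."""
    url = f"http://{riak_host}:{riak_port}/riak/{bucket}/{key}"
    # The payload is the comma-delimited list of IP:PORT pairs.
    payload = ",".join(zookeeper_nodes).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


# All values below are placeholders for illustration only.
request = build_zookeeper_config_request(
    "127.0.0.1", 8098, "datomic", "config",
    ["10.0.0.1:2181", "10.0.0.2:2181", "10.0.0.3:2181"],
)
# To actually send it: urllib.request.urlopen(request)
```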
Now all of the components can talk to each other!
For those who aren’t familiar, Vagrant simplifies the process of creating and configuring virtual development environments. By combining it with a few Chef cookbooks for Datomic, ZooKeeper, and Riak, we can automate all of the steps described above (for a local development environment).
Simply clone the vagrant-datomic-riak repository and execute the following:
January 29, 2014
On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Factual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).
If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies YouTube Channel. You can also watch each below.
Data Types and Search in Riak 2.0
Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)
Bucket Types and Configuration
Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)
Riak 2.0: Security and Conflict Resolution
Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)
Fun with Java and C Clients
Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)
Property Based Testing
Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)
Datomic and Riak
Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)
Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)
A Look Back
Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)
May 22, 2013
Basho recently held its second distributed systems conference, RICON East, in New York City. Months of preparation led to two days of concentrated learning, with community members from academia and industry sharing where we’ve been and where we’re going.
By design, many of the presentations had little direct relationship to Riak: RICON is a marketplace for ideas, not for product. However, two of the speakers tackled topics I discussed recently in my blog series on the subtleties of Riak configuration.
This is a follow-up to that series to examine those talks. I won’t repeat earlier content in any significant detail.
Rich Hickey, Using Datomic with Riak
Datomic is a very different take on databases, more akin to a version control system than a traditional RDBMS. In Datomic, records (“facts”) are never changed, but rather can be replaced as needed.
The notion of immutable facts leads to a conceptually simple distributed model that allows for transactions: a view into the database is simply a checkpoint of the facts. It’s always possible that a client may be reading an old checkpoint, but the facts at that checkpoint will be consistent regardless of what further updates have been applied.
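The checkpoint idea can be illustrated with a toy append-only fact log. This is only a sketch of the concept: `Fact`, `FactLog`, and `as_of` are names invented here and do not reflect Datomic's API or storage layout.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    entity: str
    attribute: str
    value: object
    tx: int  # transaction number at which the fact was asserted


class FactLog:
    """Append-only log: facts are added, never modified or deleted."""

    def __init__(self):
        self._facts = []
        self._next_tx = 1

    def transact(self, entity, attribute, value) -> int:
        tx = self._next_tx
        self._facts.append(Fact(entity, attribute, value, tx))
        self._next_tx += 1
        return tx

    def as_of(self, tx) -> dict:
        # A view is the latest value per (entity, attribute) among the
        # facts asserted at or before the given checkpoint.
        view = {}
        for fact in self._facts:
            if fact.tx <= tx:
                view[(fact.entity, fact.attribute)] = fact.value
        return view


log = FactLog()
t1 = log.transact("user-1", "email", "old@example.com")
t2 = log.transact("user-1", "email", "new@example.com")

old_view = log.as_of(t1)  # still sees the old address
new_view = log.as_of(t2)  # sees the replacement
```

A reader holding the `t1` checkpoint sees stale data, but never inconsistent data: later transactions only append, so the view as of `t1` cannot change underneath it.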
Riak is one of several backends that can be used with Datomic.
How Datomic queries Riak
Because Datomic keeps a record of all keys in the system, and because the values for those keys never change, reads can be expedited by setting R=1.
However, as you’ll recall, R=1 has an important complication: if the first vnode to respond does not have a copy of that key (perhaps there’s a sloppy quorum in play due to a node failure) the request will “successfully” complete with a notfound response.
This default behavior can be changed by setting notfound_ok=false so that the coordinating node will await an actual value before reporting it back to the client, and in fact this is how Datomic operates.
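The effect of notfound_ok on an R=1 read can be sketched with a deliberately simplified model of the coordinating node. `coordinate_read` is a hypothetical function, not Riak code; it ignores timeouts, basic_quorum, read repair, and vector clocks, and simply walks vnode replies in arrival order.

```python
NOTFOUND = object()  # sentinel for a vnode that has no copy of the key


def coordinate_read(replies, r=1, notfound_ok=True):
    """Return the coordinator's answer given vnode replies in arrival order."""
    counted = 0
    best = NOTFOUND
    for reply in replies:
        if reply is NOTFOUND:
            # A notfound only counts toward R when notfound_ok is true.
            if notfound_ok:
                counted += 1
        else:
            best = reply
            counted += 1
        if counted >= r:
            return best
    # Every vnode has replied; report whatever was found, if anything.
    return best


# A fallback vnode without the key happens to answer first.
replies = [NOTFOUND, "block-bytes", "block-bytes"]

default_answer = coordinate_read(replies, r=1, notfound_ok=True)
datomic_style = coordinate_read(replies, r=1, notfound_ok=False)
```

With the default, the first reply satisfies R=1 and the client is told the key does not exist. With notfound_ok=false the coordinator keeps waiting and returns the stored value from the second vnode.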
Kyle Kingsbury, Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions
Kyle conducted extensive testing of various distributed databases in the face of network partition. Specifically, he wanted to see whether writes were successful (and properly retained) during and after the partition.
His testing of Riak with allow_mult=false (the default) revealed 91% of writes were lost after the partition healed.
Riak is, however, the only database that retained 100% of writes during a partition, but only when allow_mult was set to true in order to allow sibling resolution on the client side after the partition.
Without allow_mult=true, there is no way (currently) for Riak to resolve conflicting writes other than to accept the last value written.
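The difference between the two settings can be sketched with a toy model. It assumes every write in the list was made to the same key concurrently on different sides of a partition (real Riak uses vector clocks to decide that), and `heal_partition` and `merge_siblings` are hypothetical names. The set-union merge is just one possible client-side resolution strategy.

```python
def heal_partition(concurrent_writes, allow_mult):
    """Resolve (timestamp, value) writes to one key once the partition heals."""
    if allow_mult:
        # Keep every conflicting value as a sibling for the client to resolve.
        return [value for _, value in concurrent_writes]
    # Last write wins: only the value with the newest timestamp survives.
    _, newest = max(concurrent_writes, key=lambda write: write[0])
    return [newest]


def merge_siblings(siblings):
    """Example client-side resolution: union of set-valued siblings."""
    merged = set()
    for sibling in siblings:
        merged |= sibling
    return merged


# Three clients each added one element while the cluster was partitioned.
writes = [(1, {"a"}), (2, {"b"}), (3, {"c"})]

lww = heal_partition(writes, allow_mult=False)
siblings = heal_partition(writes, allow_mult=True)
resolved = merge_siblings(siblings)
```

Under last-write-wins, two of the three writes disappear silently. With siblings, all three reach the client, which can merge them without losing data.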
Important: Riak would also do a perfectly good job of preserving all writes under the Datomic model of creating immutable key/value pairs. It may seem like all databases should handle that scenario properly, but in fact some will throw away all writes on one side of the partition.
Kyle emphasizes what I mentioned in part 1 of this series: if you can’t create immutable objects, and don’t want to handle conflict resolution via the client, CRDTs will allow for automatic resolution in the future, so long as you can make your data fit that model.
Kyle has expanded his talk into a blog series.
Basho will be hosting two more RICON conferences this year, in San Francisco and London. As was true in New York City in May and San Francisco last fall, the talks will be streamed live over the Internet and will be well worth your time.
However, speaking from personal experience, the talks are just a portion of the overall value offered by RICON. It is difficult to convey the atmosphere during and between sessions, but even the afterparty was replete with technical discussions.
If you’ve not experienced it, you can browse the #riconeast tag on Twitter to get a feel for how those present (and those not) reacted to the RICON experience, and please consider joining us next time.
RICON East videos should be available soon; the album of RICON 2012 videos is recommended.