Tag Archives: Datomic

Using Riak as a Storage Backend for Datomic

February 12, 2014

Datomic is a distributed database system that supports queries, joins, and ACID transactions. Through its pluggable persistence layer, you can wire Datomic up to a horizontally scalable key/value store that strives for operational simplicity, like Riak.

(Want to hear more about Datomic? Check out the Riak Community Hangout with Stuart Halloway of Cognitect.)

Below, we’ll explore the specifics around getting Riak enabled as a storage service for Datomic. We will also provide you with a Vagrant project that automates many of these steps, so you can have a local development environment with a Riak-backed Datomic running within minutes.

ZooKeeper

Datomic stores indexes and a log of known transactions in its storage backend. You can think of the indexes as sorted sets of datoms, and the data log as a recording of all transaction data in historic order.

Both of these pieces of data are stored as trees with blocks that are roughly 64K in size. The blocks themselves are immutable and cater very well to the strengths of eventual consistency. Other bits of data, like the root pointers (for the trees) for indexes and the data log, require the ability to compare-and-swap (CAS). They need to be stored in a strongly consistent backend.

Enter ZooKeeper.

We won’t go through the details of standing up a ZooKeeper ensemble here, but once you have, make sure you have a list of IP:PORT pairs for each instance (at least three recommended for production usage).

Note: Strong consistency is coming in Riak 2.0 and will make ZooKeeper unnecessary for this use case.

Riak

Riak is a distributed key/value store with an emphasis on high availability. To learn more, download the free eBook, A Little Riak Book.

To get started with Riak, head over to the Quick Start Guide and walk through the setup of a five-node cluster.

Transactor

In Datomic, the Transactor component is responsible for coordinating write requests and is a critical single point of failure. Think of the Transactor the same way you think about a relational database. You need one, but you may also want another ready to go if the primary fails.

The Transactor needs to know a few things about Riak:

  • riak-host
  • riak-port
  • riak-interface (valid options are protobuf or http – use protobuf)
  • riak-bucket (can just set this to datomic)

Note: The Transactor passes the Riak host and port to the riak-java-client. You’ll want to round-robin requests against all of the nodes in your cluster evenly (usually accomplished with a load balancer). If you setup a load balancer to front your Riak cluster, provide its host and port to the Transactor via riak-host and riak-port.

The Forbidden Dance

At this point it’s assumed that you have a ZooKeeper ensemble, Transactor instance, and Riak cluster ready to go. Now, fetch your list of ZooKeeper nodes and supply it (comma delimited) as the payload of an HTTP PUT request to Riak like so:

Now all of the components can talk to each other!

DatomicDiagram

Vagrant

For those who aren’t familiar, Vagrant simplifies the process of creating and configuring virtual development environments. By combining it with a few Chef cookbooks for Datomic, ZooKeeper, and Riak, we can automate all of the steps described above (for a local development environment).

Simply clone the vagrant-datomic-riak repository and execute the following:

Within a few minutes, you should have a functioning local development environment. To test, install the Datomic Peer Library and walk through the tutorial.

Hector Castro

Hangouts with Basho

January 29, 2014

On Fridays, Basho hosts a Hangout to discuss various topics related to Riak and distributed systems. While Basho evangelists and engineers lead these live Hangouts, they also bring in experts from various other companies, including Kyle Kingsbury (Fatcual), Jeremiah Peschka (Brent Ozar Unlimited), and Stuart Halloway (Datomic).

If you haven’t attended a Hangout, we have recorded them all and they are available on the Basho Technologies Youtube Channel. You can also watch each below.

Data Types and Search in Riak 2.0

Featuring Mark Phillips (Director of Community, Basho), Sean Cribbs (Engineer, Basho), Brett Hazen (Engineer, Basho), and Luke Bakken (Client Services Engineer, Basho)

Bucket Types and Configuration

Featuring Tom Santero (Technical Evangelist, Basho), Joe DeVivo (Engineer, Basho), and Jordan West (Engineer, Basho)

Riak 2.0: Security and Conflict Resolution

Featuring John Daily (Technical Evangelist, Basho), Andrew Thompson (Engineer, Basho), Justin Sheehy (CTO, Basho), and Kyle Kingsbury (Factual)

Fun with Java and C Clients

Featuring Seth Thomas (Technical Evangelist, Basho), Brett Hazen (Engineer, Basho), and Brian Roach (Engineer, Basho)

Property Based Testing

Featuring Tom Santero (Technical Evangelist, Basho) and Reid Draper (Engineer, Basho)

Datomic and Riak

Featuring Hector Castro (Technical Evangelist, Basho), Dmitri Zagidulin (Professional Services, Basho), and Stuart Halloway (Datomic)

CorrugatedIron

Featuring John Daily (Technical Evangelist, Basho), David Rusek (Engineer, Basho), and Jeremiah Peschka (Brent Ozar Unlimited)

A Look Back

Featuring John Daily (Technical Evangelist, Basho), Hector Castro (Technical Evangelist, Basho), Andy Gross (Chief Architect, Basho), and Mark Phillips (Director of Community, Basho)

Hangouts take place on Fridays at 11am PT/2pm ET. If you have any topics you’d like to see featured, let us know on the Riak Mailing List.

Basho

Understanding Riak's Configurable Behaviors Epilogue

May 22, 2013

Basho recently held its second distributed systems conference, RICON East in New York City. Months of preparation led to two days of concentrated learning, with community members from academia and industry sharing where we’ve been and where we’re going.

By design, many of the presentations had little direct relationship to Riak: RICON is a marketplace for ideas, not for product. However, two of the speakers tackled topics I discussed recently in my blog series on the subtleties of Riak configuration.

This is a follow-up to that series to examine those talks. I won’t repeat earlier content in any significant detail.

Rich Hickey, Using Datomic with Riak

Datomic is a very different take on databases, more akin to a version control system than a traditional RDBMS. In Datomic, records (“facts”) are never changed, but rather can be replaced as needed.

The notion of immutable facts leads to a conceptually simple distributed model that allows for transactions: a view into the database is simply a checkpoint of the facts. It’s always possible that a client may be reading an old checkpoint, but the facts at that checkpoint will be consistent regardless of what further updates have been applied.

Riak is one of several backends that can be used with Datomic.

How Datomic queries Riak

Because Datomic keeps a record of all keys in the system, and because the values for those keys never change, reads can be expedited by setting R=1.

However, as you’ll recall, R=1 has an important complication: if the first vnode to respond does not have a copy of that key (perhaps there’s a sloppy quorum in play due to a node failure) the request will “successfully” complete with a notfound message.

This default behavior can be changed by setting notfound_ok=false so that the coordinating node will await an actual value before reporting it back to the client, and in fact this is how Datomic operates.

Kyle Kingsbury, Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions

Kyle conducted extensive testing of various distributed databases in the face of network partition. Specifically, he wanted to see whether writes were successful (and properly retained) during and after the partition.

His testing of Riak with allow_mult=false (the default) revealed 91% of writes were lost after the partition healed.

Riak is, however, the only database that retained 100% of writes during a partition, but only when allow_mult was set to true in order to allow sibling resolution on the client side after the partition.

Without allow_mult=true, there is no way (currently) for Riak to resolve conflicting writes other than to accept the last value written.

Important: Riak would also do a perfectly good job of preserving all writes under the Datomic model of creating immutable key/value pairs. It may seem like all databases should handle that scenario properly, but in fact some will throw away all writes on one side of the partition.

Kyle emphasizes what I mentioned in part 1 of this series: if you can’t create immutable objects, and don’t want to handle conflict resolution via the client, CRDTs will allow for automatic resolution in the future, so long as you can make your data fit that model.

Kyle has expanded his talk into a blog series.

RICON

Basho will be hosting two more RICON conferences this year, in San Francisco and London. As was true in New York City in May and San Francisco last fall, the talks will be streamed live over the Internet and would be well worth your time.

However, speaking from personal experience, the talks are just a portion of the overall value offered by RICON. It is difficult to convey the atmosphere during and between sessions, but even the afterparty was replete with technical discussions.

If you’ve not experienced it, you can browse the #riconeast tag at Twitter for a feel for the reactions of those present (and those not) to the RICON experience, and please consider joining us next time.

RICON East videos should be available soon; the album of RICON 2012 videos is recommended.

John R. Daily