Riak is an open source, distributed database designed for:
- Availability: Riak replicates and retrieves data intelligently so it is available for read and write operations, even in failure conditions;
- Fault-Tolerance: You can lose access to many nodes due to network partition or hardware failure without losing data;
- Operational Simplicity: Add new machines to your Riak cluster easily without incurring a larger operational burden – the same ops tasks apply to small clusters as large clusters;
- Scalability: Riak automatically distributes data around the cluster and yields a near-linear performance increase as you add capacity.
DEVELOPING ON RIAK
Data Model
Riak uses a simple key/value model for object storage. Objects in Riak consist of a unique key and a value, stored in a flat namespace called a bucket. You can store anything you want in Riak: text, images, JSON/XML/HTML documents, user and session data, backups, log files, and more.
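The data model described above can be pictured with a small in-memory sketch. This is not Riak code: the `FlatStore` and `StoredObject` names are hypothetical, and the only point is that a value is an opaque blob addressed by a bucket and a key.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    """Hypothetical stand-in for a stored object: opaque bytes plus a content type."""
    value: bytes
    content_type: str


class FlatStore:
    """Toy key/value store: every object is addressed by (bucket, key)."""

    def __init__(self):
        self._data = {}  # (bucket, key) -> StoredObject

    def put(self, bucket, key, value, content_type="application/octet-stream"):
        self._data[(bucket, key)] = StoredObject(value, content_type)

    def get(self, bucket, key):
        # Returns None when nothing is stored under this bucket/key pair.
        return self._data.get((bucket, key))


store = FlatStore()
# The same key can exist in two buckets without conflict.
store.put("sessions", "user-42", b'{"cart": [1, 2]}', "application/json")
store.put("images", "user-42", b"\x89PNG...", "image/png")
```

The store never inspects the value, which is why text, images, and JSON documents can all be kept the same way.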
APIs and Client Libraries
Riak provides a straightforward, RESTful API as well as a Protocol Buffers interface. Client libraries are available for Java, Python, Perl, Erlang, Ruby, PHP, .NET, and other languages.
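As a rough illustration of the RESTful interface, the sketch below builds a store request and a fetch request with Python's standard library. It assumes a node listening on 127.0.0.1:8098 and the `/buckets/<bucket>/keys/<key>` URL layout; check both against the API documentation for your Riak version. The bucket, key, and document are made up, and nothing is sent over the network.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8098"  # assumed address of a local node


def object_url(bucket, key):
    # Percent-encode both parts so unusual characters cannot break the path.
    return f"{BASE_URL}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def build_put(bucket, key, document):
    """Build (but do not send) a request that stores a JSON document."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        object_url(bucket, key),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_get(bucket, key):
    """Build (but do not send) a request that fetches an object."""
    return Request(object_url(bucket, key), method="GET")


put_req = build_put("users", "alice", {"email": "alice@example.com"})
get_req = build_get("users", "alice")
# To send either request to a running node: urllib.request.urlopen(put_req)
```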
Search and Accessing Data
Riak has several additional features for querying data, including:
- MapReduce: Perform query and aggregation tasks like filtering documents by tags, counting words in documents, and extracting links to related data.
- Riak Search: Use Riak’s distributed, full-text search engine with a robust query language.
- Secondary Indexes: Tag objects stored in Riak with additional values and query by exact match or range.
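Two of the query styles listed above can be illustrated in plain Python. This sketch does not use Riak's MapReduce or secondary index APIs; the documents, the `year` field, and the function names are all hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical documents, keyed as they might be within one bucket.
docs = {
    "post-1": {"text": "riak stores data", "year": 2011},
    "post-2": {"text": "riak replicates data", "year": 2012},
    "post-3": {"text": "erlang runs riak", "year": 2013},
}


# MapReduce-style word count: map each document to partial counts,
# then reduce the partial counts into one total.
def map_words(doc):
    return Counter(doc["text"].split())


def reduce_counts(left, right):
    return left + right


word_counts = reduce(reduce_counts, map(map_words, docs.values()), Counter())

# Secondary-index-style lookup: each object is tagged with an extra value
# (here, a year) that can be matched exactly or by range.
year_index = sorted((doc["year"], key) for key, doc in docs.items())


def index_exact(value):
    return [key for tagged, key in year_index if tagged == value]


def index_range(low, high):
    return [key for tagged, key in year_index if low <= tagged <= high]
```

In a real cluster the map phase would run close to the data on each node; the shape of the computation is the same.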
ARCHITECTURE
What is a Riak Node?
Every node in a Riak cluster is the same: each contains a complete, independent copy of the Riak package. There is no “master.” This uniformity provides the basis for Riak’s fault-tolerance and scalability. Riak is written in Erlang, a language designed for massively scalable systems.
Data Distribution
Data is distributed across nodes using consistent hashing. Consistent hashing ensures that data is evenly distributed around the cluster and that new nodes can be added with minimal reshuffling of existing data.
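The "minimal reshuffling" property is easiest to see in a small sketch. What follows is the generic textbook form of consistent hashing, not Riak's internal ring implementation; the `HashRing` class, the node names, and the choice of 64 ring points per node are assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(label):
    # SHA-1 yields a 160-bit integer, used here as a position on the ring.
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hash ring: each node owns several points on the ring."""

    def __init__(self, nodes, points_per_node=64):
        self.points_per_node = points_per_node
        self._positions = []  # sorted ring positions
        self._owner_at = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.points_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner_at[pos] = node

    def owner(self, bucket, key):
        # A key belongs to the first node point clockwise from its hash.
        pos = ring_position(f"{bucket}/{key}")
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner_at[self._positions[idx]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.owner("demo", k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner("demo", k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` ends up on the new node, and all other keys stay where they were. A naive `hash(key) % number_of_nodes` scheme would instead relocate most keys whenever the node count changes.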
Replication
Riak automatically replicates data in the cluster (default three replicas per object). You can lose access to many nodes in the cluster due to failure conditions and still maintain read and write availability.
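To see why several node failures need not cost availability, consider this sketch of replica placement. It is a simplification under stated assumptions: a ring of 64 partitions assigned round-robin to five hypothetical nodes, a key mapped to a partition by hash modulo, and replicas placed on the next distinct nodes clockwise. Riak's actual placement logic is more involved.

```python
import hashlib

NUM_PARTITIONS = 64  # assumed ring size for this sketch
N_VAL = 3            # replicas per object, matching the default stated above
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical

# Assign partitions to nodes round-robin so neighbors differ.
partition_owner = [NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)]


def partition_for(bucket, key):
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def preference_list(bucket, key, n=N_VAL):
    """Walk clockwise from the key's partition, collecting n distinct nodes."""
    start = partition_for(bucket, key)
    replicas = []
    for step in range(NUM_PARTITIONS):
        node = partition_owner[(start + step) % NUM_PARTITIONS]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == n:
            break
    return replicas


def live_replicas(bucket, key, down):
    """Replica holders for this key that are still reachable."""
    return [node for node in preference_list(bucket, key) if node not in down]


# With three replicas on distinct nodes, losing any two nodes
# still leaves at least one copy of every object reachable.
survivors = live_replicas("users", "alice", down={"node-a", "node-c"})
```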
When Nodes Fail
If a node fails or is partitioned from the rest of the cluster, a neighboring node will take over its storage operations. When the failed node returns, the updates received by the neighboring node are handed back to it. This ensures availability for writes or updates, and happens automatically.
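The takeover-and-return behavior described above (often called hinted handoff) can be simulated in a few lines. The `Node` class and the `write` and `hand_back` functions are hypothetical names for this sketch; it models one primary and one neighbor and ignores replication, timing, and conflict resolution.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value for data this node owns
        self.hints = {}  # intended owner name -> {key: value} held on its behalf


def write(key, value, primary, neighbor):
    """Store on the primary if it is up; otherwise the neighbor accepts the write."""
    if primary.up:
        primary.data[key] = value
        return primary.name
    neighbor.hints.setdefault(primary.name, {})[key] = value
    return neighbor.name


def hand_back(neighbor, recovered):
    """Return writes the neighbor accepted while the recovered node was away."""
    pending = neighbor.hints.pop(recovered.name, {})
    recovered.data.update(pending)
    return len(pending)


a, b = Node("node-a"), Node("node-b")
write("k1", "v1", primary=a, neighbor=b)               # normal write
a.up = False
accepted_by = write("k2", "v2", primary=a, neighbor=b)  # neighbor takes over
a.up = True
returned = hand_back(b, a)                              # updates handed back
```

The write issued during the outage still succeeds, and after `hand_back` the recovered node holds both keys.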
OPERATIONS
Scaling Out
When you add nodes, data is rebalanced automatically with no downtime. Developers don’t need to deal with the underlying complexity of what data is where. Any node can accept and route requests.
Stats and DTrace Support
Riak uses Folsom, an Erlang-based system that collects and reports real-time metrics, to provide stats via HTTP request. Additionally, Riak supports DTrace for analysis of running systems.
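A monitoring script might read the HTTP stats like this. The sketch assumes the stats are served as JSON at `/stats` on a node at 127.0.0.1:8098, and that `node_gets` and `node_puts` are among the reported fields; verify both for your Riak version. The sample payload is made up purely to exercise the parsing code, and no request is actually sent.

```python
import json
from urllib.request import Request

STATS_URL = "http://127.0.0.1:8098/stats"  # assumed node address and path


def build_stats_request():
    """Build (but do not send) a request for the node's stats."""
    return Request(STATS_URL, headers={"Accept": "application/json"})


def summarize(stats_json, wanted=("node_gets", "node_puts")):
    """Pick a few counters out of the stats document, skipping absent ones."""
    stats = json.loads(stats_json)
    return {name: stats[name] for name in wanted if name in stats}


# Made-up payload for illustration only; real output has many more fields.
sample = '{"node_gets": 120, "node_puts": 45, "other": 1}'
summary = summarize(sample)
stats_req = build_stats_request()
# To query a running node: urllib.request.urlopen(stats_req).read()
```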
COMMUNITY
Basho takes pride in developing, releasing and supporting open source projects, and nurturing and building communities around them. Get connected with us on Twitter, LinkedIn, IRC or Facebook. Make sure to sign up for the Riak users mailing list. You can also join one of our user groups, with locations in San Francisco, New York, London, Boston, Portland, Amsterdam, Brazil, Munich, and more. Riak is one of our most popular pieces of code, but it’s by no means the only one. If you want to browse all of Basho’s code, visit our GitHub account.

