In a previous post we briefly introduced Riak 2.0 data types. The addition of these distributed Data Types simplifies application development by automatically handling sibling resolution. This means developers can spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, let Data Types support their applications’ data access patterns.
Understanding these data types requires a brief trip through history…
Riak 1.4 Counters
Riak 1.4 introduced counters as the first data type. Prior to 1.4 we always said, “Your data is opaque to Riak,” and it still can be, but with the addition of counters that is no longer always the case. Riak knows what is stored in a counter key, and how to increment and decrement it through the counter API. It isn’t necessary to fetch, mutate, and put a counter. Instead, you just increment by 5 or decrement by 100. And because Riak knows how to merge concurrent writes to a counter, no sibling is ever created, so there is no need to reason about vector clocks (as discussed in the post entitled Clocks Are Bad, or, Welcome to the Wonderful World of Distributed Systems).
Counters are very valuable, but you cannot build many applications on just counters. Now, in Riak 2.0, we’ve added more data types. We believe that, with the addition of these data types, you can model many applications’ data storage needs with greater simplicity, and never have to write sibling merge functions again.
What are CRDTs?
You may have heard a Basho presentation, or blog post, reference “CRDTs”. CRDT stands for (variously) Conflict-free Replicated Data Type, Convergent Replicated Data Type, Commutative Replicated Data Type, and others. The key, repeated, phrase is “Replicated Data Types”.
Replication is inherent in Riak. It is what the n-value defines. It is part of what contributes to the availability and fault tolerance characteristics that Riak provides. Data Types are a common construct in computing: Sets, Bags, Lists, Registers, Maps, Counters, and so on.
That leaves us to consider the “C”.
Conflict Free, or “Opaque No More”
Riak is an eventually consistent system. It leans, very much, towards the AP end of the CAP spectrum. (For more reading on the topic, the Practical Tradeoffs section of A Little Riak Book is particularly illuminating). This availability is achieved with mechanisms like sloppy quorum writes to fallback nodes. However, even without partitions and many nodes, interleaved or concurrent writes can lead to conflicts. Traditionally, Riak keeps all values and presents them to the user to resolve. The client application must have a deterministic way to resolve conflicts. It might be to pick the highest timestamp, or union all the values in a list, or something more complex. Whatever approach is chosen, it is ad-hoc, and created specifically for the data model and application at hand.
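To make the two ad-hoc strategies mentioned above concrete, here is a minimal sketch in Python. It does not use a Riak client; the `Sibling` class and both resolver functions are hypothetical names invented for illustration, and the timestamps are assumed to be supplied by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sibling:
    """One conflicting value returned by a read (hypothetical shape)."""
    value: List[str]
    timestamp: float  # write time supplied by the application, in seconds


def resolve_latest(siblings: List[Sibling]) -> List[str]:
    """Keep the value with the highest timestamp; other writes are discarded."""
    return max(siblings, key=lambda s: s.timestamp).value


def resolve_union(siblings: List[Sibling]) -> List[str]:
    """Union every sibling's list; sorting keeps the result deterministic."""
    merged = set()
    for sibling in siblings:
        merged.update(sibling.value)
    return sorted(merged)


siblings = [
    Sibling(value=["apple", "pear"], timestamp=100.0),
    Sibling(value=["apple", "plum"], timestamp=105.0),
]
latest = resolve_latest(siblings)
union = resolve_union(siblings)
```

Note the trade-off: picking the latest timestamp silently drops "pear", while the union keeps everything but can never express a removal. Each application has to pick, and test, its own rule.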
With Riak data types, there is still “conflict”. However, the resolution for that conflict is inherent and part of the data type’s design. The data types for Riak 2.0 converge automatically, at write and read time, on the server. If a client application can model its data using the data types provided, no sibling values will be seen and there is no longer a need to write ad-hoc, custom merge functions.
When modeling an application’s data domain in a programming language, developers are familiar with composing state from a few primitive data types. Riak Data Types give the developer that power and expressivity back, and relieve them of the burden of designing and testing deterministic merge functions. The key is that the data is no longer opaque to Riak. When the Data Types API is leveraged, Riak “knows” what type of thing is being stored and is able to perform the merge automatically.
When reading a Data Type from Riak, you will only ever see a single value. That value is still eventually consistent, but it will be as correct as it can be given the amount of entropy in the database. When the system is stable, all values will converge on a single, deterministic, correct value.
What Data Types Are Available?
Riak 2.0 includes the following Data Types:
- Counters: as in Riak 1.4
- Flags: enabled/disabled
- Sets: collections of binary values
- Registers: named binary values (both the name and the value are binary)
- Maps: a collection of fields that supports the nesting of multiple Data Types
The conflict resolution, as discussed above, is intrinsic to the Data Type itself. This table provides greater detail.
|Data Type|Use Cases|Conflict Resolution Rule|
|Counters||Each actor keeps an independent count for increments and decrements. Upon merge, the pairwise maximum of any two actors will win (e.g. if one actor holds 172 and the other holds 173, 173 will win upon merge)|
|Flags||Enable wins over disable|
|Sets||If an element is concurrently added and removed, the add will win|
|Registers||The most chronologically recent value wins, based on timestamps|
|Maps||If a field is concurrently added or updated and removed, the add / update will win|
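The rules in the table can be sketched as small merge functions. The Python below is a simplified model written for illustration, not Riak's implementation: all function names are hypothetical, the flag is modeled only for the concurrent case, and the set uses unique tags per add (an assumption) so that a remove only affects the adds it has observed.

```python
from typing import Dict, Set, Tuple

# Counter state: actor -> (increments, decrements).
CounterState = Dict[str, Tuple[int, int]]


def merge_counter(a: CounterState, b: CounterState) -> CounterState:
    """Pairwise maximum per actor, for increments and decrements alike."""
    merged = {}
    for actor in a.keys() | b.keys():
        a_inc, a_dec = a.get(actor, (0, 0))
        b_inc, b_dec = b.get(actor, (0, 0))
        merged[actor] = (max(a_inc, b_inc), max(a_dec, b_dec))
    return merged


def counter_value(state: CounterState) -> int:
    return sum(inc - dec for inc, dec in state.values())


def merge_flag(a: bool, b: bool) -> bool:
    """Enable wins over disable when two replicas disagree."""
    return a or b


def merge_register(a: Tuple[float, bytes], b: Tuple[float, bytes]) -> Tuple[float, bytes]:
    """Most recent (timestamp, value) wins; ties fall back to comparing values."""
    return max(a, b)


# Set state: (entries, tombstones); entries holds (element, unique_tag) pairs.
SetState = Tuple[Set[Tuple[bytes, str]], Set[str]]


def set_add(state: SetState, element: bytes, tag: str) -> SetState:
    entries, tombstones = state
    return (entries | {(element, tag)}, tombstones)


def set_remove(state: SetState, element: bytes) -> SetState:
    """Tombstone only the tags this replica has seen for the element."""
    entries, tombstones = state
    observed = {tag for (e, tag) in entries if e == element}
    return (entries, tombstones | observed)


def merge_set(a: SetState, b: SetState) -> SetState:
    return (a[0] | b[0], a[1] | b[1])


def set_value(state: SetState) -> Set[bytes]:
    entries, tombstones = state
    return {e for (e, tag) in entries if tag not in tombstones}


# Counter: actor "a" holds 172 on one replica and 173 on the other.
merged_counter = merge_counter({"a": (172, 0)}, {"a": (173, 0), "b": (5, 2)})

# Set: one replica removes b"x" while another concurrently re-adds it.
base = set_add((set(), set()), b"x", "t1")
merged_set = merge_set(set_remove(base, b"x"), set_add(base, b"x", "t2"))
```

In the counter example, actor "a" resolves to 173 and the total is 176. In the set example, the concurrent add carries a tag the remove never saw, so `b"x"` survives the merge. A map would apply the same idea field by field, merging each field with the rule for its type.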
Riak 2.0 brings new Data Types that allow you to model your application in more expansive ways. Take these Data Types for a spin and be sure to let us know how you use them in your applications.
September features developer conferences, Chicago Erlang, and even an “unconference.” Take a look at where Basho will be around the U.S. this month.
Strangeloop (September 17-19 in St. Louis, MO): Strangeloop is a great opportunity to learn about emerging languages, concurrent and distributed systems, and new database technologies. Basho is attending, so tweet us @basho if you’re interested in meeting.
Analytics and Big Data Summit (September 18 in San Jose, CA at 3:05 p.m. PT): Produced by the Storage Networking Industry Association (SNIA), the Analytics and Big Data Summit brings together IT professionals to discuss how to leverage analytics, and big data applications and systems. Seema Jethani from Basho will be presenting on Optimizing Cloud Storage to Manage Big Data, which will explore different data types and storage solutions. Attendees will gain an understanding of the needs of big data storage and the current cloud storage options available to organizations.
2014 High Performance Computing for Wall Street (September 22 in New York, NY at 2:30 p.m. ET): The 11th annual HPC networking opportunity is focused on high-throughput, low-latency networks, data centers and lowering the costs of operations. Our director of technical marketing, Tyler Hannan, will be presenting a Code Writing Session – Architecting for Global Scale.
Chicago Erlang (September 22 in Chicago, IL): Chicago Erlang is a one-day event focused on real world applications of Erlang. At 10:40 a.m. CT, Basho’s Reid Draper will present on Building Fault Tolerant Teams at Basho during which he will explain how Basho coordinates the activities of more than 25 Erlang programmers to build Riak. Then, at 3:20 p.m. CT, Steve Vinoski from Basho will discuss Optimizing Native Code for Erlang.
REST Fest 2014 (September 25-27 in Greenville, SC): REST Fest is an “unconference” with the objective of bringing together people interested in REST, hypermedia APIs, web service APIs and related topics to share ideas, trade stories and show examples of current work. Sean Cribbs from Basho will be the opening keynote! His keynote, HTTP: The Good Parts, will explore interesting and powerful ways to enhance interaction and efficiency when developing applications. Sean will leverage his 10 years of experience as a developer to provide insight into HTTP features and how you can tap into them more declaratively.
Surge 2014 (September 24-26 in National Harbor, MD): We will be attending and sponsoring OmniTI’s scalability and performance conference, Surge. We’d love to meet and chat, so tweet us @basho if you’re attending.
Lastly, RICON 2014 is just one month away, October 28-29. Early bird prices are good through September 22. Register here.
Basho is pleased to announce the release of Riak CS 1.5, which provides additional performance enhancements and simplifies administration and development with additional admin tools, enhanced S3 compatibility and a technical preview of an architecture to support clusters with very large amounts of storage. Highlights include:
- riak-cs-admin: Consolidates admin operations into a command line tool.
- riak-cs-stanchion: Changes the Stanchion IP and port.
- riak-cs-debug: Packages log, configuration and operating system command files along with Riak command results.
- syslog: Support for standardized syslog output for log aggregation using third-party tools.
S3 API Features
- multi-object delete: Reduces request overhead by supporting multiple deletes in a single request (up to 1,000 keys per request).
- cache control headers: Method for providing caching instructions in a request header.
- PUT object – copy: Creates a copy of an object that already exists in Riak CS.
A full list of S3 API compatibility can be found on the Basho docs site here.
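Because multi-object delete accepts at most 1,000 keys per request, a client deleting a larger set has to split it into batches. The sketch below only builds the request payloads and sends nothing. The function name is hypothetical, and the payload shape is an assumption modeled on the `Delete` argument that S3 clients such as boto3's `delete_objects` accept.

```python
from typing import Dict, Iterator, List

MAX_KEYS_PER_REQUEST = 1000  # limit stated for multi-object delete


def delete_batches(keys: List[str],
                   batch_size: int = MAX_KEYS_PER_REQUEST) -> Iterator[Dict]:
    """Yield one S3-style Delete payload per batch of at most batch_size keys."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


# 2,500 illustrative keys need three requests instead of 2,500.
keys = [f"logs/{n:05d}.gz" for n in range(2500)]
batches = list(delete_batches(keys))
batch_sizes = [len(batch["Objects"]) for batch in batches]
```

Each yielded dictionary could then be passed to whichever S3 client the application already uses against its Riak CS endpoint.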
Increased Scalability (Enterprise Feature)
Partly due to limitations with distributed Erlang, prior to 1.5, Riak CS scalability was limited to several petabytes. CS 1.5 introduces a technical preview of an architecture that allows multiple Riak clusters to reside under a single CS namespace, thereby significantly increasing the amount of storage possible in a cluster. A production-ready version is planned for later this year, with multi-data center support to follow.
Garbage Collection Improvements
In Riak CS, deleted and updated objects are not removed immediately. Instead, a reference is written to a special bucket and later removed by the garbage collection process at regular intervals. CS 1.5 includes several garbage collection enhancements that will benefit customers with a high rate of object deletion or updates.
- concurrent garbage collection worker processes: Speed up the rate of garbage collection with the addition of multiple workers.
- flexible enforcement of leeway interval: In previous versions, updated and deleted objects were reaped only after they reached a predefined time-based leeway interval, which was set when an object was marked for deletion. In CS 1.5 the leeway interval is managed by the garbage collection daemon and can be changed to remove objects sooner, for example, in emergency situations where maximum storage capacity is reached.
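The leeway change can be pictured with a small model: if eligibility is evaluated by the collector at collection time, lowering the leeway immediately makes more objects reapable. This Python sketch is not Riak CS code; the class, the function, and the interval values are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PendingDelete:
    """An object awaiting garbage collection (hypothetical shape)."""
    key: str
    marked_at: float  # epoch seconds when the object was deleted or updated


def eligible_for_gc(pending: List[PendingDelete], now: float,
                    leeway_seconds: float) -> List[str]:
    """Return keys whose age has reached the leeway in force right now."""
    return [p.key for p in pending if now - p.marked_at >= leeway_seconds]


now = 100_000.0
pending = [
    PendingDelete("a", marked_at=10_000.0),  # 90,000 seconds old
    PendingDelete("b", marked_at=95_000.0),  # 5,000 seconds old
    PendingDelete("c", marked_at=99_000.0),  # 1,000 seconds old
]

normal = eligible_for_gc(pending, now, leeway_seconds=86_400)    # one day
emergency = eligible_for_gc(pending, now, leeway_seconds=3_600)  # one hour
```

With the one-day leeway only "a" can be reaped; shortening the leeway to one hour also frees "b" without touching the records themselves.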
Other Notable Enhancements
- faster bucket listings: Optimizations in the OTP xmerl library enable faster bucket listings, in particular for large buckets.
- setting ACLs upon PUT object: Ability to set ACLs via header at PUT object creation is now fully functional.
Riak CS 1.5 is available at: http://docs.basho.com/riakcs/latest/riakcs-downloads/. A full list of changes is available in the release notes. Watch the blog for a detailed discussion of the multi-cluster work.
Distributed cloud storage software adds additional Amazon S3 compatibility, performance improvements, simplified admin and increased scalability
CAMBRIDGE, Mass. – August 5, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, today introduced Riak CS 1.5 and Riak CS 1.5 Enterprise, Basho’s distributed object storage software. Riak CS (Cloud Storage) is open source software built on top of Riak, used to build public or private clouds, or as reliable storage to power applications and services. Riak CS 1.5 delivers new features that improve operation, performance and scalability. Basho continues to offer enterprise-class features in Riak CS Enterprise, which includes multi-datacenter replication, world-class 24 by 7 support and a flexible pricing model.
Companies dealing with large amounts of unstructured data like videos, images and documents are adopting cloud object storage so that data is highly available through a seamlessly scalable architecture. Businesses in industries such as broadcasting and telecommunications are relying on the stability, integration functionality and performance of Riak CS to efficiently store, organize and access data while making it simple to manage.
“We offer our customers affordable and scalable cloud storage solutions built on Basho’s Riak CS,” said Makoto Oya, vice director of IDC Frontier. “The enhanced Amazon S3 compatibility and ability to scale well into the multi-petabyte level in Riak CS 1.5 will help us better support the rapid growth we are seeing in our storage business.”
I-NET Corp, a data processing service headquartered in Japan, uses Riak CS for its cloud service called Dream Cloud® and is looking to achieve further cost efficiency thanks to the increased scalability capabilities in Riak CS 1.5.
“Cloud-based object storage is ideal for storing our customer’s growing business-critical data, and we have relied on the excellent performance, cost efficiency and high reliability of Riak CS for the I-NET Dream Cloud®,” said Tsutomu Taguchi, senior managing director, business group of I-NET Corp. “Riak CS already provides us with high availability and now that Riak CS is further optimized to scale, we believe that Riak CS 1.5 delivered by Basho will drive even higher adoption of Dream Cloud®.”
New features enhance performance for object storage to store increasing amounts of data worldwide
Basho delivers new functions in Riak CS that include:
- Additional Amazon S3 compatibility: Expanded storage API compatibility with S3 includes features such as multi-object delete, put object copy, and cache control headers for more flexible integration with content delivery networks (CDNs).
- Performance improvement in garbage collection process: Delivered especially for customers with a high rate of object updates and deletes, Riak CS now more quickly reaps objects flagged for garbage collection.
- New, simplified administrative features: New and consolidated admin features make organizational tasks easier for activities such as cluster management, monitoring and troubleshooting.
- Multi-cluster support: Technology preview for increased scalability of Riak CS Enterprise by allowing multiple Riak clusters to reside under a single CS namespace, thereby expanding the maximum capacity of a single cluster.
“Providing the strongest key value solution and object store means responding to customer needs and demands attentively,” said Dave McCrory, CTO of Basho. “With Riak CS 1.5 Enterprise, new features are delivered as requested by our customers. We are committed to make it easier to consume cutting edge versions of Riak and will continue to do this by executing a more iterative approach in how we release Riak.”
Availability and Pricing
Riak CS 1.5 is available immediately for Debian, Ubuntu, FreeBSD, OS X, Red Hat Enterprise Linux, Fedora, SmartOS and Solaris. To view the latest technical documentation or to download Riak CS, visit docs.basho.com/riakcs/latest/.
Basho delivers customized packages for its commercial software, Riak Enterprise and Riak Enterprise Plus, with health checks, as well as options for project-based Professional Services engagements. Full pricing details of Basho commercial software are at http://basho.com/riak-enterprise/#pricing. To request a trial license of Riak CS Enterprise, prospective customers can request a Riak CS Tech Talk at http://info.basho.com/SignUpRiakTechTalk.html.
- Basho Website (http://basho.com)
- Basho Blog (http://basho.com/blog/)
- Riak (http://basho.com/riak/)
- Riak CS (http://basho.com/riak-cloud-storage/)
- Riak CS doc (docs.basho.com/riakcs/latest/)
- Additional Resources (http://basho.com/resources/)
- Twitter: @Basho (https://twitter.com/basho)
- LinkedIn (https://www.linkedin.com/company/basho-technologies-inc)
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts.
September 30, 2013
While the biggest event of October is Basho’s distributed systems conference, RICON West, we will still be traveling the world to attend many other events this month. Here’s a look at where you can find us during the weeks leading up to RICON.
Monktoberfest: Basho’s Director of Marketing, Tyler Hannan, will be speaking at Monktoberfest on “Medieval Art, Collective Intelligence, and Language Abuse – The Ethos of Distributed Systems.” Monktoberfest will take place in Portland, ME from Oct. 3-4.
Erlang Factory Lite: Basho will have speakers at both the Chicago event (Oct. 4th) and the Berlin event (Oct. 16th). Check out talks from Chris Meiklejohn and Steve Vinoski to learn more about Riak, Erlang, and distributed systems.
CloudConnect Chicago: Basho is a sponsor and exhibitor of CloudConnect Chicago, taking place Oct. 21-23. Basho engineer, John Burwell, will also be speaking about building private clouds with Apache CloudStack and Riak CS.
O’Reilly Strata: Basho will be exhibiting and speaking at the upcoming O’Reilly Strata conference in New York from Oct. 28-30. Stop by our booth and find out why we will all be using distributed systems in the future.
June 17, 2013
RICON East, Basho’s distributed systems conference, took place last month in New York. Hundreds of developers and academics gathered for two days to learn how distributed systems are being used in production and where they’ll be in the future.
Over the next few weeks, we will be posting the videos of the talks on the RICON East Archive. These videos are open to anyone and feature speakers from various distributed systems backgrounds. Slides for all of the talks are also available in the Archive.
The first six videos are already available on the site. These videos are:
- “Automatically Scalable Computation” by Dr. Margo Seltzer, Herchel Smith Professor of CS at Harvard SEAS
- “Why is my Cache so Dumb? Smarter Caching with Pequod” by Neha Narula, PhD Candidate at MIT
- “Bloom: Big Systems from Small Programs” by Neil Conway, PhD Candidate at UC Berkeley
- “Large Scale Data Service as a Service” by Brian Akins, Senior Principal Architect at Turner Broadcasting System
- “Optimizing LevelDB for Performance and Scale” by Matthew Von-Maszewski, Software Engineer at Basho Technologies
- “Just Open a Socket – Connecting Applications to Distributed Systems” by Sean Cribbs, Software Engineer at Basho Technologies
Basho is also hosting another distributed systems conference, RICON West, in San Francisco on October 29-30th. We already have some great speakers lined up, including Jeff Dean (Google Fellow), Kate Matsudaira (Founder and CTO of Pop Forms), Peter Bailis (PhD Candidate at UC Berkeley), Justin Sheehy (CTO at Basho Technologies), Jeff Hodges (Distributed Systems Engineer at Twitter), and Diego Ongaro (PhD Candidate at Stanford University). Early bird tickets are on sale now.
Be on the lookout for more videos coming soon and we’ll see you at RICON West!
June 3, 2013
This summer, Basho will be traveling all over the world to sponsor and speak at various events. Keep an eye on our Events Page to see where we’ll be next. If you’re going to be at any of these events, we’d love to meet with you. Simply contact us and we can schedule a time for you to meet with a Basho team member.
Below are some highlights of where we will be:
Erlang User Conference: This conference brings together companies and developers using Erlang from all over the world. Join us in Stockholm, Sweden from June 10-14 to hear Bryan Fink, Basho’s Principal Software Engineer, speak on Riak Pipe and load distribution.
QCON: Basho will be at QCon in New York, NY from June 12-14, where we will be discussing how to select the right database technology and some existing and emerging data storage challenges. Basho’s Technical Evangelist, Tom Santero, will also be presenting “Riak, Latency, and Distributed Systems” on the first day.
GigaOm Structure SF: Basho is a proud sponsor of GigaOm Structure SF. This event takes place June 19-20 in San Francisco, CA. Basho’s Chief Architect, Andy Gross, will discuss the resurgence in interest in both theoretical and applied distributed systems and provide practical advice for dealing with systems in a newly distributed world.
For a full list of where we will be in June, visit our Events Page.
May 30, 2013
Basho’s distributed systems conference, RICON East, was chock full of amazing talks from academics and professionals from all industries. Luckily, for everyone who couldn’t attend, all of the talks were recorded and will be available on the RICON site soon.
In the meantime, we have posted the slides from the closing keynote, presented by Basho Chief Architect, Andy Gross. His talk, entitled, “Lessons Learned and Questions Raised (from building distributed systems),” goes over his experience building distributed systems and how the space is changing. Check out his slides below and be on the lookout for the complete video, which will be posted soon.
Tickets are now on sale for RICON West, which will take place October 29-30 in San Francisco. You can get early bird pricing now through August 29th.
May 1, 2013
On May 15 & 16, immediately following Basho’s Distributed Systems conference RICON East, Basho will be hosting Riak Training in New York. The cost to attend is $400. In addition, a 50% discount is extended to those holding a conference pass to RICON East.
If you’re interested in attending, tickets can be purchased at the Riak Training in New York page. Seats are limited. If you have any questions, you can reach out to email@example.com.
Riak training is a two day, hands-on, in-depth look at Riak. It is designed for engineers, developers, and operations staff to learn how to run, operate, and build apps with Riak. During this training, participants will learn how to:
- Set up a small Riak cluster
- Query the cluster using basic Key/Value, Links, 2i, and Map/Reduce
- Understand deployment and performance considerations
- Evaluate application Access Patterns
- Consider data modeling implications in a distributed system
This training will also go over a number of topics, including:
- Introduction to Riak
- Basic Querying
- Riak Under-the-Hood
- Deployment Considerations
- Performance Tuning
- Application Development
- Data Modeling
- Distributed Systems Engineering
In addition we are offering Riak training in San Francisco from May 20-21. Tickets for the San Francisco training can be purchased on the Riak Training in San Francisco page.
May 1, 2013
This post looks at five commonly asked questions about Riak CS – simple, available, open source storage built on top of Riak. For more information, please review our full documentation, or sign up for an intro to Riak CS webcast on Friday, May 10.
What is the relationship between Riak and Riak CS?
Riak CS is built on top of Riak, exposing higher-level storage functions including large object support, an S3-compatible API, multi-tenancy, and per-user storage and access statistics. Riak itself provides the replication, availability, fault-tolerance, and underlying storage functions for the Riak CS implementation. Riak and Riak CS should both be installed on every node in your cluster. While Riak and Riak CS could be run on separate virtual or physical nodes, running them on the same machine minimizes intra-cluster bandwidth usage and is the recommended approach. As with Riak, we advise a minimum 5-node cluster.
When an object is uploaded to Riak CS, it is broken up into smaller chunks (blocks), which are then streamed, stored, and replicated in the underlying cluster. A manifest is maintained for each object; it records which blocks comprise the object and is used to retrieve all blocks and present them to the client on read. In addition to running Riak and Riak CS on each node, Stanchion, a request serializer, must be installed on at least one node in the cluster. This ensures that global entities, such as users and buckets, are unique in the system.
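The chunk-and-manifest idea can be illustrated with a toy model. The Python below uses an in-memory dictionary in place of the Riak cluster; the function names, the manifest fields, the block naming scheme, and the tiny block size are assumptions chosen for readability, not Riak CS internals.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example stays readable


def store_object(key: str, data: bytes, block_store: Dict[str, bytes],
                 block_size: int = BLOCK_SIZE) -> Dict:
    """Split data into blocks, store each one, and return a manifest."""
    block_ids: List[str] = []
    for seq, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{key}:{seq}"
        block_store[block_id] = data[offset:offset + block_size]
        block_ids.append(block_id)
    return {
        "key": key,
        "content_length": len(data),
        "checksum": hashlib.sha256(data).hexdigest(),
        "blocks": block_ids,
    }


def read_object(manifest: Dict, block_store: Dict[str, bytes]) -> bytes:
    """Fetch every block named in the manifest and reassemble the object."""
    data = b"".join(block_store[block_id] for block_id in manifest["blocks"])
    if hashlib.sha256(data).hexdigest() != manifest["checksum"]:
        raise ValueError("reassembled object does not match manifest checksum")
    return data


block_store: Dict[str, bytes] = {}
manifest = store_object("photos/cat.jpg", b"hello riak cs!", block_store)
restored = read_object(manifest, block_store)
```

The client only ever sees `restored`; the blocks and the manifest are bookkeeping on the storage side.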
What use cases does Riak CS support that Riak doesn’t?
Riak CS has several features that are not provided in the standalone Riak database. One of the most obvious differences is in the size of objects supported. Riak CS exposes large object support, and includes multi-part upload so you can upload objects as a series of parts. This allows you to upload single objects to the system into the terabyte range. In Riak, the data model is simply key/value; in Riak CS, the key/value model provides the underlying structure for higher-level storage semantics – users, buckets and objects. The Riak CS interface is an S3-compatible HTTP API, allowing you to use existing S3 libraries and tools. In contrast, Riak exposes an HTTP and protobufs API and offers many language-specific clients. Unlike Riak, Riak CS is multi-tenant, with the concept of “users” and per-user reporting on storage and access. This makes it a fit both for private cloud scenarios, with multiple internal users, and as a foundation for a public cloud storage offering.
How do multi-tenancy, authentication and reporting work?
Riak CS exposes an interface for user creation, disablement and credential management. Riak CS can be set so that only administrators can create new users. Administrators also have special privileges including being able to retrieve a list of all users in the system and query the user account information of any user. Once issued credentials, users are able to authenticate, create buckets, upload and download files, retrieve account information, obtain new credentials, or disable their account through the API. Riak CS supports the standard S3 authentication scheme, with support for header and query string authorization.
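For header authorization, the sketch below assumes that "the standard S3 authentication scheme" refers to S3 signature version 2 (HMAC-SHA1 over a canonical string), which was the common scheme at the time. The credentials, bucket, and function names are hypothetical, and canonicalization of `x-amz-*` headers is left to the caller.

```python
import base64
import hashlib
import hmac


def sign_v2(secret_key: str, method: str, resource: str, date: str,
            content_md5: str = "", content_type: str = "",
            amz_headers: str = "") -> str:
    """Return the base64 HMAC-SHA1 signature of the S3 v2 string to sign."""
    string_to_sign = (
        f"{method}\n{content_md5}\n{content_type}\n{date}\n"
        f"{amz_headers}{resource}"
    )
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, secret_key: str, method: str,
                         resource: str, date: str) -> str:
    """Build the value of the Authorization header for a simple request."""
    signature = sign_v2(secret_key, method, resource, date)
    return f"AWS {access_key}:{signature}"


# Hypothetical credentials and request, for illustration only.
header = authorization_header(
    access_key="AKIDEXAMPLE",
    secret_key="not-a-real-secret",
    method="GET",
    resource="/my-bucket/report.pdf",
    date="Wed, 01 May 2013 12:00:00 +0000",
)
```

Existing S3 libraries already perform this signing, so application code rarely needs it directly; the sketch only shows what travels in the header.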
Riak CS exposes storage, usage and network statistics that support use cases like accounting, subscription, billing or multi-group utilization for public or private clouds. Riak CS will report information on how much storage a user is consuming and the network operations related to access. This data is exposed via an HTTP interface and can be queried on the default timespan “now” or as a range from start time through end time. Access statistics are reported as bytes in and bytes out for both object and bucket operations. Reporting of this information can be scheduled for a set interval or manually triggered.
What’s the difference between Riak CS and Riak CS Enterprise?
Riak CS Enterprise provides multi-datacenter replication on top of Riak CS. For multi-datacenter replication in Riak CS, global information for users, bucket information and manifests are streamed in real-time from a primary implementation to a secondary site so global state is maintained across locations. Objects can then be replicated in either full sync or real-time sync mode. The secondary site will replicate the object as in normal operations. Additional datacenters can be added in order to create availability zones or provide additional data redundancy and locality. Riak CS Enterprise can also be configured for bi-directional replication. Riak CS Enterprise also comes with 24/7, enterprise-level support. More information and pricing can be found here, and full technical information is available on our docs portal. Ready to get started? Sign up for a developer trial of Riak CS Enterprise.
What are your plans for integration of Riak CS with open source compute solutions?
Riak CS provides highly available, distributed storage, making it a natural fit for usage alongside compute solutions. We have partnered with Citrix to collaborate on the integration of Apache CloudStack and Riak CS to create a complete cloud software offering that combines compute and storage in an integrated platform. For more information on our partnership with CloudStack, check out this blog post with the latest update. API and authentication support for OpenStack is also in progress.