November 10, 2014
Many data needs are better served by data stores that are optimized for maximum availability and scalability — rather than optimized for consistency. For certain use cases, there are elements to the data that require strong consistency. With Riak 2.0, in addition to eventual consistency, there is now a way to enforce strong consistency when needed.
NOTE: Riak’s strong consistency feature is currently an open-source-only feature and is not yet commercially supported.
Behavioral Changes with Strong Consistency
Strongly consistent operations in Riak function much like eventually consistent operations at the application level. The core difference lies in the types of errors Riak will report to the client.
Each request to update an object (except for the initial creation) must include a context value reflecting the last time the application read it. This is the same behavior that Riak clients have always followed with version vectors and strong consistency also mandates its use. Similarly, reading data from a strongly consistent Riak bucket functions exactly like eventually consistent reads.
If that value is not provided for an update operation to an existing object, Riak will reject it. This is because the database assumes that you have not seen the current value and may not know what you’re doing.
Similarly, if that context value is out of date, Riak will also reject update operations. The client must re-read the latest value and supply an update based on that new value, with the new context.
If Riak cannot contact a majority of the servers responsible for the key, the request will fail. Ordinarily, Riak is happy to accept all operations in the interest of high availability and never dropping a write – even in the extreme case of only one server surviving data center outages.
Strong consistency also eliminates object siblings, as it is effectively impossible for the cluster to disagree on the value of an object.
When considering consistency models in an application, it is easy for the logic to quickly become daunting. This is especially true when designing a workflow that leverages both eventually and strongly consistent models. It is, therefore, easiest to begin with a simple use case.
Consider the workflow involved in storing and updating username and password data. In the case of a password update, it is necessary that — at any given time — there be exactly ONE result for a user’s password. Relatedly, it is important to ensure that an update of this value is fully atomic or user experience is substantially degraded. It would be possible to leverage Riak for all the eventually consistent elements of the application and leverage strong consistency for the username and password.
To see how eventual and strong consistency can be combined to solve business problems, let’s take a not-so-hypothetical example from the energy industry.
Imagine you’re collecting massive amounts of geological data for analysis. Each batch of data must be processed by a single instance of your application. Since this processing can take hours, days, or even weeks to complete, it’s expensive if two applications handle the same batch.
Let’s walk through the sequence of events.
- Batch of data arrives for processing.
- The batch is stored in a large object store (like, Riak CS) under a batch ID.
- The batch ID is added to a pending job list in Riak and stored as a set (one of the new Riak Data Types).
This is a classic example of eventual consistency and an illustration of the value of the new Riak Data Types introduced with Riak 2.0. Storing a new batch ID in your database should never fail, even if servers are offline. If multiple applications are adding batch IDs to the pending list at the same time, it’s perfectly reasonable for those lists to temporarily diverge, as long as they can be trivially merged later.
Let’s continue to see where strong consistency comes into play.
- A compute node becomes available to process the data.
- The compute node retrieves the pending job list and picks a batch ID.
- The compute node attempts to create a lock for that batch ID.
This is where strong consistency is required. This lock object should be created in a bucket that is managed by the new strong consistency subsystem in Riak 2.0. If someone else also grabs that batch ID and tries to create another lock object, Riak’s strong consistency logic will reject this second attempt. This compute node will just start over by grabbing a new ID.
To detect crashed jobs, the lock object should be created with basic job data, such as which compute node owns the processing job and what time it was started.
- The compute node asks Riak to add the batch ID to a different set, a running job list.
- The compute node asks Riak to remove the batch ID from the pending list.
- The job runs.
- When completed, the compute node asks Riak to add the batch ID to a completed job list.
- Riak is asked to remove the batch ID from the running list.
- The compute node deletes the lock object (or updates it to reflect the completion of the processing job).
Tradeoffs When Using Strong Consistency
- Blind updates will be rejected, so the client must read the existing value before supplying a new one (except in the case of entirely new keys).
- Write requests may be slightly slower due to coordination overhead.
- If a majority of the servers responsible for a piece of data are unavailable, write requests will fail. Read operations may fail depending on the freshness of the data that is still accessible.
- Secondary indexes (2i) are not yet supported.
- Multi-datacenter replication in Riak Enterprise is not yet supported.
- Using Strong Consistency (for developers)
- Managing Strong Consistency (for operators)
- Strong Consistency (theory & concepts)
Strong Consistency is now available with Riak 2.0. Download Riak 2.0 on our Docs Page.
By: Peter Coppola
We had the opportunity to stop by DATAVERSITY’S NoSQL Now! conference in San Jose last week. I was very impressed with the level of energy and the wide-ranging selection of sessions offered. According to Tony Shaw, the CEO of DATAVERSITY, the organizer of NoSQL Now, registrations were up 15 percent from 2013.
The exhibition hall was packed and lively as attendees jostled between booths. DATAVERSITY did an outstanding job keeping the show floor tightly packed with exhibitors. The industry was well represented by Cloudera (saw “Data is the new bacon” t-shirts), MarkLogic, MongoDB, Oracle and EnterpriseDB – all present as major sponsors. Between conversations, I was able to nab a nifty versatile screw-driver disguised as a pen from DataStax.
NoSQL Now sessions do rely heavily on sponsors, but with such a wide selection of tracks there’s bound to be a topic of interest at any given time slot. I had a choice of the following concurrent sessions at 4:15 p.m. on Wednesday:
- Internet of Things with MongoDB – MongoDB
- Out with MapReduce, In with Spark – DataStax
- Case Studies in Search and Semantics – MarkLogic
- Just the Right Weather for our Company: How We Chose Our Data Stores – The Weather Company
- NoSQL on ACID – EnterpriseDB
I attended The Weather Company’s session – not only was it the only non-vendor presenter, but the company is also a customer and big fan of Riak. The Weather Company manages five data centers that in production handle 25,000 requests per second and distribute 60 GB of data to each data center every 10 minutes. Surya Kangeyan Sivakumar took us through the journey of how The Weather Company selected its data store solutions and how it overcame the mindset of having to use its existing relational database solution just because the company had invested so much in it. Riak was selected, along with other NoSQL solutions, due to the speed and ease at which it could be stood up.
In 2015 Basho looks forward to being a more active participant in NoSQL Now.
Distributed cloud storage software adds additional Amazon S3 compatibility, performance improvements, simplified admin and increased scalability
CAMBRIDGE, Mass. – August 5, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, today introduced Riak CS 1.5 and Riak CS 1.5 Enterprise, Basho’s distributed object storage software. Riak CS (Cloud Storage) is open source software built on top of Riak, used to build public or private clouds, or, as reliable storage to power applications and services. Riak CS 1.5 delivers new features that improve operation, performance and scalability. Basho continues to offer enterprise-class features in Riak CS Enterprise, which includes multi-datacenter replication, world class 24 by 7 support and flexible pricing model.
Companies dealing with large amounts of unstructured data like videos, images and documents are adopting cloud object storage so that data is highly available through a seamlessly scalable architecture. Businesses in industries such as broadcasting and telecommunications are relying on stability, integration functionality and performance of Riak CS to efficiently store, organize and access data while making it simple to manage.
“We offer our customers affordable and scalable cloud storage solutions built on Basho’s Riak CS,” said Makoto Oya, vice director of IDC Frontier. “The enhanced Amazon S3 compatibility and ability to scale well into the multi-petabyte level in Riak CS 1.5 will help us better support the rapid growth we are seeing in our storage business.”
I-NET Corp, a data processing service headquartered in Japan, uses Riak CS for its cloud service called Dream Cloud® and is looking to achieve further cost efficiency thanks to the increased scalability capabilities in Riak CS 1.5.
“Cloud-based object storage is ideal for storing our customer’s growing business-critical data, and we have relied on the excellent performance, cost efficiency and high reliability of Riak CS for the I-NET Dream Cloud®,” said Tsutomu Taguchi, senior managing director, business group of I-NET Corp. “Riak CS already provides us with high availability and now that Riak CS is further optimized to scale, we believe that Riak CS 1.5 delivered by Basho will drive even higher adoption of Dream Cloud®.”
New features enhance performance for object storage to store increasing amounts of data worldwide
Basho delivers new functions in Riak CS that include:
- Additional Amazon S3 compatibility: Expanded storage API compatibility with S3 includes features such as multi-object delete, put object copy, and cache control headers for more flexible integration with content delivery networks (CDNs).
- Performance improvement in garbage collection process: Delivered especially for customers with high rate of object updates and deletes, Riak CS now more quickly reaps objects flagged for garbage collection.
- New, simplified administrative features: New and consolidated admin features make organizational tasks easier for activities such as cluster management, monitoring and troubleshooting.
- Multi-cluster support: Technology preview for increased scalability of Riak CS Enterprise by allowing multiple Riak clusters to reside under a single CS namespace, thereby expanding the maximum capacity of a single cluster.
“Providing the strongest key value solution and object store means responding to customer needs and demands attentively,” said Dave McCrory, CTO of Basho. “With Riak CS 1.5 Enterprise, new features are delivered as requested by our customers. We are committed to make it easier to consume cutting edge versions of Riak and will continue to do this by executing a more iterative approach in how we release Riak.”
Availability and Pricing
Riak CS 1.5 is available immediately for Debian, Ubuntu, FreeBSD, OS X, Red Hat Enterprise Linux, Fedora, SmartOS and Solaris. To view the latest technical documentation or to download Riak CS, visit docs.basho.com/riakcs/latest/.
Basho delivers customized packages for its commercial software, Riak Enterprise and Riak Enterprise Plus, with health checks, as well as options for project-based Professional Services engagements. Full pricing details of Basho commercial software are at http://basho.com/riak-enterprise/#pricing. To request a trial license of Riak CS Enterprise, prospective inquiries can request a Riak CS Tech Talk at http://info.basho.com/SignUpRiakTechTalk.html.
- Basho Website (http://basho.com)
- Basho Blog (http://basho.com/blog/)
- Riak (http://basho.com/riak/)
- Riak CS (http://basho.com/riak-cloud-storage/)
- Riak CS doc (docs.basho.com/riakcs/latest/)
- Additional Resources (http://basho.com/resources/)
- Twitter: @Basho (https://twitter.com/basho)
- LinkedIn (https://www.linkedin.com/company/basho-technologies-inc)
About Basho Technologies
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by one third of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.
Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts.
April 28, 2014
On Monday, May 5th, Basho is co-hosting an open source happy hour and networking meetup. Partnering with GoGrid, we will bring together some of the greatest minds in open source to discuss current trends and where open source will be in the future. Drinks and appetizers will be provided. This meetup will take place from 6pm-8pm on May 5th at the 111 Minna Gallery in San Francisco. Space is limited, so be sure and RSVP quickly.
This meetup is also in conjunction with the Open Business Conference. Open Source has become ubiquitous in the enterprise and in the business layer, as more and more organizations are reaping its considerable benefits, including speed, efficiency and cost savings. Open Business Conference 2014 brings together the people who are building and deploying the latest in enabling technologies and solutions and teaches businesses how to put their data to use. Basho will be available to answer any questions about Riak and open source at the GoGrid booth. Open Business Conference takes place May 5-6 at the Palace Hotel in San Francisco.
To learn more about Basho’s partnership with GoGrid (and other partnerships), visit our Partners Page.
April 16, 2014
The world of gaming can be unpredictable. It can be hard to judge if a game is going to be the next Angry Birds and experience exponential, global growth. Riak is designed to help gaming platforms handle this uncertainty with ease. Its focus on high availability means that all data remains accessibility, even during node failure. Its flexible data model and redundant, fault-tolerant design easily allows gaming platforms to store any type of data needed. Riak is also built for operational simplicity at scale, so Riak will seamlessly grow with data and popularity. Finally, the option for multi-datacenter replication means that gamers all over the world will get the same low-latency experience across multiple devices.
Top Use Cases for Riak in Gaming
- Player Data: Riak provides low-latency, highly available data storage for key player data, including user and profile information, game performance, statistics and rankings, and more. Riak also provides many different tools for querying and indexing this data, such as Riak Search, Secondary Indexing, and MapReduce.
- Session Storage: Riak is frequently used to store and serve session data with predictable low-latency – necessary for game play. Riak imposes no restrictions on the type of content stored (since all objects are stored on disk as binaries), so session data can be encoded in many ways and can evolve without administrative changes to schemas.
- Social Information: Riak provides flexible, robust storage for social data such as social graph information, player profiles and relationships, and social authentication tokens.
- Global Data Locality: When gaming, players require a low-latency experience, regardless of where they’re physically located. Otherwise, interrupted or slow game play can lead to poor user experience and possible user abandonment. Riak Enterprise’s multi-datacenter capabilities allow game data to be physically close to players and serve them data no matter where they happen to be.
Riak in Production
Riak is already in production by many top gaming platforms. Here’s a look at a few that have switched to Riak.
Rovio is the creator of the popular mobile game, Angry Birds. Since user growth can be hard to predict, they needed an infrastructure that could support unexpected viral growth without failing or causing downtime. They selected Riak due to its ease-of-scale and fault tolerance. Riak now powers their new cartoon series, Angry Birds Toons, and new mobile games. Learn more about why they moved to Riak in this case study and video from GDC.
Hibernum is a creator and developer of unique gaming experiences that combine the latest in social gaming, top quality visuals and animations, and cutting edge design. They switched from a relational database to Riak due to the high availability, ability to scale to peak loads, and predictable operational cost. Riak is used to store user game information for one of their most popular social games. Check out the complete case study, Hibernum Selects Riak for User Data Storage.
Kiip is a platform for building rewards and achievements into your games. Kiip replaced MongoDB with Riak in order to achieve low read/write latencies and horizontal scalability. Kiip uses Riak for storing and serving session and device data. Learn more from the video on scaling Riak to 25MM Ops/Day.
Riot Games is the creator of League of Legends and faced some challenges with supporting millions of concurrent players at any given moment. They switched to Riak from MySQL for their next generation stats system, which tracks gameplay statistics and stores terabytes of data that gets aggregated and presented to players in near real-time. More information on how they use Riak and why they selected it can be found here.
Data Modeling in Riak
Riak has a “schemaless” design. Objects are comprised of key/value pairs, which are stored in flat namespaces called buckets. Here are some common approaches to structuring gaming data with Riak’s key/value design:
|Player Data||Login, Email, UUID||Player Attributes (often stored as a JSON document); Player Rewards and Stats|
|Social Data||Login, Email, UUID||Player Profiles, Social Graph Information, Facebook/Twitter Tokens|
|Session Information||User/Session ID||Session Data|
|Image or Video Content||Content Name, ID or Integer||.JPG .PNG, .GIF or other image format; .MOV, .MPG, .MP4 or other video file format|
To learn more about how gaming platforms can use Riak for their data needs, check out the complete overview, “Gaming on Riak: A Technical Introduction.” To get started with Riak, Contact Us or download it now.
January 27, 2014
On the official Basho docs, we compare Riak to multiple other databases. We are currently working on updating these comparisons, but, in the meantime, we wanted to provide a more up-to-date comparison for one of the more common questions we’re asked: How does Riak compare to Cassandra?
Cassandra looks the most like Riak out of any other widely-deployed data storage technology in existence. Cassandra and Riak have architectural roots in Amazon’s Dynamo, the system Amazon engineered to handle their highly available shopping cart service. Both Riak and Cassandra are masterless, highly available stores that persist replicas and handle failure scenarios through concepts such as hinted handoff and read-repair. However, there are certain key differences between the two that should be considered when evaluating them.
Amazon’s Dynamo utilized a Key/Value data model. Early in Cassandra’s development, a decision was made to diverge from keys and values toward a wide-row data model (similar to Google’s BigTable). This means Cassandra is a Key/Key/Value store, which includes the concept of Column Families that contain columns. With this model, Cassandra is able to handle high write volumes, typically by appending new data to a row. This also allows Cassandra to perform very efficient range queries, with the tradeoff being a more rigid data model since rows are “fixed” and non-sequential read operations often require several disk seeks.
On the other hand, Riak is a straight Key/Value store. We believe this offers the most flexibility for data storage. Riak’s schemaless design has zero restrictions on data type, so an object can be a JSON document at one moment and a JPEG at the next.
Like Cassandra, Riak also excels at high write volumes. Range queries can be a little more costly, though still achievable through Secondary Indexes. In addition, there are a number of data modeling tips and tricks for Riak that make it easy to expose access to data in ways that sometimes aren’t as obvious at first glance. Below are a few examples:
In Riak, multi-datacenter replication is achieved by connecting independent clusters, each of which own their own hash ring. Operators have the ability to manage each cluster and select all or part of the data to replicate across a WAN. Multi-datacenter replication in Riak features two primary modes of operation: full sync and real-time. Data transmitted between clusters can be encrypted via OpenSSL out-of-the-box. Riak also allows for per-bucket replication for more granular control.
Cassandra achieves replication across WANs by splitting the hash ring across two or more clusters, which requires operators to manually define a NetworkTopologyStrategy, Replication Factor, a Replication Placement Strategy, and a Consistency Level for both local and cross data center requests.
Conflict Resolution and Object Versioning
Cassandra uses wall clock timestamps to establish ordering. The resolution strategy in this case is Last Write Wins (LWW), which means that data may be overwritten when there is write contention. The odds of data loss are magnified by (inevitable) server clock drift. More details on this can be found in the blog, “Clocks are Hard, or, Welcome to the Wonderful World of Distributed Systems.”
Riak uses a data structure called vector clocks to track the causal ordering of updates. This per-object ancestry allows Riak to identify and isolate conflicts without using system clocks.
In the event of a concurrent update to a single key, or a network partition that leaves application servers writing to Riak on both sides of the split, Riak can be configured to keep all writes and expose them to the next reader of that key. In this case, choosing the right value happens at the application level, allowing developers to either apply business logic or some common function (e.g. merge union of values) to resolve the conflict. From there, that value can be written back to Riak for its key. This ensures that Riak never loses writes.
Riak Data Types, first introduced in Riak 1.4 and expanded in the upcoming Riak 2.0, are designed to converge automatically. This means Riak will transparently manage the conflict resolution logic for concurrent writes to objects.
In the event of server failures and network problems, Riak is designed to always accept read and write requests, even if the servers that are ordinarily responsible for that data are unavailable.
Cassandra will allow writes to (optionally) be stored on alternative servers, but will not allow that data to be retrieved. Only after the cluster is repaired and those writes are handed off to an appropriate replica server (with the potential data loss that timestamp-based conflict resolution implies, as discussed earlier) will the data that was written be available to readers.
Imagine a user working with a shopping cart when the application is unable to connect to the primary replicas. The user can re-add missing items to the cart but will never actually see the items show up in the cart (unless the application performs its own caching, which introduces more layers of complexity and points of failure).
When handling missing or divergent/stale data, Riak and Cassandra have many similarities. Both employ a passive mechanism where read operations trigger the repair of inconsistent replicas (known as read-repair). Both also use Active Anti-Entropy, which builds a modified Merkle tree to track changes or new inserts on a per hash-ring-partition basis. Since the hash rings contain overlapping keys, the trees are compared and any divergent or missing data is automatically repaired in the background. This can be incredibly effective at combating problems such as bitrot, since Active Anti-Entropy does not need to wait for a read operation.
The key difference in implementation is that Cassandra uses short-lived, in-memory hash trees that are built per Column Family and generated as snapshots during major data compactions. Riak’s trees are on-disk and persistent. Persistent trees are safer and more conducive to ensuring data integrity across much larger datasets (e.g. 1 billion keys could easily cost 8-16GB of RAM in Cassandra versus 8-16GB of disk in Riak).
Both Cassandra and Riak are eventually consistent, scalable databases that have strengths for specific use cases. Each has hundreds of thousands of hours of engineering invested and the commercial backing and support offered by their respective companies, Datastax and Basho. At Basho, we have labored to make Riak very robust and easy to operate at both large and small scale. For more information on how Riak is being used, visit our Riak Users page. For a look at what’s to come, download the Technical Preview of Riak 2.0.
December 30, 2013
2013 was a huge year for Basho Technologies and before we dive into 2014, we thought we’d take a moment to reflect on how far we’ve come.
2013 was the year of the Riak User. We love hearing about all the amazing ways companies across various industries are using Riak. This year, we were able to share dozens of exciting case studies. These include:
- Synacor’s TV Everywhere platform
- Enstratius (acquired by Dell)
- Best Buy
- Alert Logic
- Viggle (through OmniTI)
- Turner Broadcasting
- Hosted Graphite
- Gilt Groupe
- Praekelt Foundation
- National Health Service
- City Maps
- The Weather Company
For even more Riak Users, check out the Users Page.
We released Riak 1.3, Riak 1.4, and the Technical Preview of Riak 2.0 this year. These releases added such features as Active Anti-Entropy, revamped Riak Control, queryability improvements, Riak Data Types, and much more. Be on the lookout for the general release of Riak 2.0 early next year.
This year, we expanded RICON, Basho’s distributed systems conference, to both RICON East and RICON West. These were both sold out conferences that featured speakers from bitly, Comcast, Google, Netflix, Salesforce, The Weather Company, Turner Broadcasting, Twitter, and many more.
We drastically increased the number of Basho partners in 2013. For a full list of partners, check out the Partnerships Page. Some key ones to note include Tokyo Electron Device, SoftLayer, and Seagate.
Our amazing community team hosted over 200 meetups around the world this year. On top of that, they also attended dozens of industry events to spread the word about Basho. Keep an eye on the Events Page to see where we’ll be in 2014.
2013 was a busy year but, with some exciting announcements coming, we look forward to an even busier 2014. Happy New Year!
December 18, 2013
Downtime, planned or unplanned, is no longer an option. It can have a dramatic impact on revenue and lead to negative customer experiences and attrition. Luckily, distributed NoSQL databases (such as Basho Riak) are designed to provide high availability, even during network partition or server failure. This means there will never be an excuse for downtime again.
To help demonstrate the cost of downtime and how Riak can help, we have put together an infographic, “Down With Downtime.” Zoom in by clicking the image below.
October 28, 2013
The technology community is extremely agile and fast-paced. It can turn on a dime to solve business problems as they arise. However, with this agility comes budding terminology that can often provide false categorizations. This can lead to confusion, especially when companies evaluate new technologies based on a surface understanding of these terms. The world of data is full of these terms, including the notorious “NoSQL” and “big data.”
As described in a previous post, NoSQL is a misleading term. This term represents a response to changing business priorities that require more flexible, resilient architectures (as opposed to the traditional, rigid systems that often happen to use SQL). However, within the NoSQL space, there are dozens of players that can be as different from one another as they are from any of the various SQL-speaking systems.
Big data is another term that, while fairly self-explanatory, has been overused to the point of dilution. One reason why NoSQL databases have become necessary is because of their ability to easily scale to keep up with data growth. Simply storing a lot of data isn’t the solution though. Some data is more critical than others (and should be accessible no matter what) and some data needs to be analyzed to provide business insights. When digging into a business, big data is too vague a term to describe both of these use cases.
As these terms (to highlight a few) are used, it can lead to industry confusion. One area of confusion that we have experienced relates to Basho’s own distributed database, Riak, and the distributed processing system, Hadoop.
While these two systems are actually complementary, we are often asked “How is Riak different from Hadoop?”
To help explain this, it’s important to start with a basic understanding of both systems. Riak is a distributed database that is built for high availability, fault tolerance, and scalability. It is best used to store large amounts of critical data that applications and users need to constantly be able to access. Riak is built by Basho Technologies and can be used as an alternative to or in conjunction with relational databases (such as MySQL) or to other “NoSQL” databases (such as MongoDB or Cassandra).
Hadoop is a framework that allows for the distributed parallel processing of large data sets across clusters of computers. It was originally based on the “MapReduce” system, which was invented by Google. Hadoop consists of two core parts: the underlying Hadoop Distributed File System (HDFS), which ensures stored data is always available to be analyzed, and MapReduce, which allows for scalable computation by dividing and running queries over multiple machines. Hadoop provides an inexpensive, scalable solution for bulk data processing and is mostly used as part of an overarching analytics strategy, not for primary “hot” data storage.
One easy way to distinguish between the two is to look at some of the common use cases.
Riak Use Cases
Riak can be used by any application that needs to always have access to large amounts of critical data. Riak uses a key/value data model and is data-type agnostic, so operators can store any type of content in Riak. Due to the key/value model, certain industry use cases fit easily into Riak. These include:
- Gaming – storing player data, session data, etc
- Retail – underpinning shopping carts, product inventories, etc
- Mobile – social authentication, text and multimedia storage, global data locality, etc
- Advertising – serving ad content, session storage, mobile experiences, etc
- Healthcare – prescription or patient records, patient IDs, health data that must always be available across a network of providers, etc
For a full list of use cases, check out our Users Page.
Hadoop Use Cases
Hadoop is designed for situations where you need to store unmodeled data and run computationally intensive analytics over that data. The original use cases of both MapReduce and Hadoop were to produce indexes for distributed search engines at Google and Yahoo respectively. Any industry that needs to do large scale analytics to better improve their business can use Hadoop. Some common examples include finance (build models to do accurate portfolio evaluations and risk analysis) and eCommerce (analyze shopping behavior to deliver product recommendations or better search results).
Riak and Hadoop are based on many of the same tenets, making their usage complementary for some companies. Many companies that utilize Riak today have created scripts, or processes, to pull data from Riak and push into other solutions (like Hadoop) for the purpose of historical archiving or future analysis. Recognizing this trend, Basho is exploring the creation of additional tools to simplify this process.
If you are interested in our thinking on these data export capabilities, please contact us.
Every tool has its value. Hadoop excels at being used by a relatively small subset of the business to answer big questions. Riak excels at being used by a very large number of users and powering critical data for businesses.
October 22, 2013
Today, Seagate has announced the availability of their Kinetic Open Storage platform, which simplifies data management, improves performance and scalability, all while lowering expenses. This fundamentally new architecture reduces costs by allowing applications to communicate directly with the storage system, eliminating the acquisition, deployment, and support costs of hyperscale storage infrastructures.
Basho has partnered with Seagate to help them develop this platform to provide interoperability and testing with Riak. Now, with the release of this platform, we want to make it easier for developers to test the Kinetic Open Storage platform with Riak. We have just released an alpha version of our eKinetic driver, which enables an Erlang-based high-performance socket connection to the drive. We have also released software to improve Riak backend compatibility by mapping a Riak backend to the drive library. Both are available for download https://github.com/basho-labs/riak_kinetic.
Not only does deploying Riak on this platform drastically simplify the management of data through a straightforward socket-based network interface, this simplification also increases I/O efficiency by removing bottlenecks and optimizing cluster management, data replication, migration, and active multi-datacenter performance. Additionally, it is expected that users will realize up to a 50% decrease in the Total Cost of Operations through simplified operations alone. Users can also maximize storage density through reduced power and cooling costs and build out cloud datacenters for even more savings.
Seagate Principal Technologist, James Hughes, will be speaking about the Kinetic Open Storage platform and Riak at RICON, Basho’s distributed systems conference. His talk, “Device Based Innovation to Enable Scale-Out Storage” will take place on October 29th at 12pm in Track Two. Seagate is also a sponsor of RICON.