March 30, 2015
This is the first post in a series of blog posts, entitled Riak Customer Stories, where we will look at common use cases for Riak and their applicability in specific verticals. Our first customer stories will focus on how Riak is helping Gaming companies achieve massive scalability.
Online gaming continues to grow in popularity, whether for huge gaming communities like Riot Games’ League of Legends or gaming sites like bet365, one of the world’s leading online gambling groups. This growth is forcing changes to existing infrastructure in order to keep up with demand and innovation. Traditional relational databases can’t meet the requirements for massive scalability, speed, and fault tolerance.
Innovation is critical to retain long-term customer loyalty and is changing the way gamers play online. These changes include the move away from single bets on an event to in-game betting on an ever-increasing range of metrics. The advent of regional gaming competitions, like the League of Legends World Championship with an annual grand prize of $1 million, shows just how far gaming has come.
Gaming on Riak
Companies that build games or betting sites use Riak in four key ways:
- Player Data – Riak provides low-latency, highly available data storage for key player data, including user and profile information, game performance, statistics and rankings, and more. Riak also provides many different tools for querying and indexing this data, such as a full-text search engine and secondary indexing.
- Session Storage – Riak is used to store and serve session data with predictable low-latency, which is necessary for game play. Riak imposes no restrictions on the type of content stored (since all objects are stored on disk as binaries), so session data can be encoded in many ways and can evolve without administrative changes to schemas.
- Global Data Locality – While gaming, players require a low-latency experience, regardless of their physical location. Interrupted or slow game play can lead to poor user experience and player abandonment. Riak Enterprise’s multi-datacenter capabilities allow game data to be physically close to players and for fast response times regardless of player location.
- Social Information – Riak is built for very fast data storage. This design, together with Riak’s simple key/value data model, makes it ideal for storing and serving social content such as social graph information, player profiles, player relationships, social authentication accounts, and other types of social gaming data.
By using Riak, companies have achieved global availability and massive scalability while still maintaining operational simplicity. These benefits are derived from the core architectural decisions made in the design of Riak.
By design, Riak is masterless. Each node in a Riak cluster is the same, containing a complete and independent copy of the Riak package. There is no “master” or coordinating node. This uniformity provides the basis for Riak’s fault tolerance and scalability. Coupled with an even distribution of data around the cluster via consistent hashing, it significantly reduces risky “hot spots” in the database and lowers the operational burden associated with manually sharding data. In addition, new nodes can easily be added with automatic, minimal redistribution of data.
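The consistent-hashing idea can be sketched in a few lines of plain Python. The node names and ring size here are hypothetical, and real Riak divides the ring into partitions claimed by virtual nodes, which this toy version skips:

```python
import hashlib

def ring_position(value, ring_size=2**32):
    """Map an arbitrary string onto a fixed-size hash ring."""
    digest = hashlib.sha1(value.encode()).hexdigest()
    return int(digest, 16) % ring_size

def owner(key, nodes, ring_size=2**32):
    """Pick the first node clockwise from the key's ring position."""
    key_pos = ring_position(key, ring_size)
    placed = sorted((ring_position(n, ring_size), n) for n in nodes)
    for node_pos, node in placed:
        if node_pos >= key_pos:
            return node
    return placed[0][1]  # wrap around the ring

nodes = ["riak1", "riak2", "riak3"]
print(owner("player:1234", nodes))
```

Because placement depends only on the hash, every node can compute the same answer with no coordinator, which is the property that makes the masterless design work.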
This distribution of data in a masterless system is supplemented with a process called “hinted handoff.” Hinted handoff lets Riak cleanly handle node failure. If a node fails, a neighboring node takes over its storage operations. When the failed node returns, any updates received by the neighboring node are handed back to it. This ensures availability for writes and updates, and it happens automatically. These mechanisms are discussed in greater detail in a blog post entitled Why Riak Just Works.
Modeling Gaming Applications in Riak
The table below illustrates key/value mappings for common application types. Remember that values in Riak are opaque and stored on disk as binaries – JSON or XML documents, images, text, etc. Riak has a “schemaless” design. Objects are composed of key/value pairs, which are stored in flat namespaces called “buckets.” The way data is organized in Riak should take into account the unique needs of the application, including access patterns such as read/write distribution, latency differences between various operations, use of Riak features (including MapReduce, Search, Secondary Indexes), and more.
Here are some common approaches to structuring gaming data with Riak’s key/value design:
| Data Type | Sample Keys | Sample Values |
| --- | --- | --- |
| Player Data | Login, email, UUID | Player Attributes (often stored as a JSON document); Player Rewards and Stats |
| Social Data | Login, email, UUID | Player Profiles, Social Graph Information, Facebook/Twitter Tokens |
| Session Information | User/Session ID | Session Data |
| Image or Video Content | Content Name, ID, or Integer | .JPG, .PNG, .GIF or other image format; .MOV, .MPG, .MP4 or other video file format |
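As a concrete (if simplified) illustration of the mappings above, here is how two of them might look, using plain Python dicts as a stand-in for Riak buckets; the bucket names, keys, and values are invented for the example:

```python
import json

# Stand-in for Riak buckets: each bucket is a flat namespace of key -> opaque value.
store = {
    "player_data": {},
    "sessions": {},
}

# Player data keyed by login; the value is an opaque JSON blob to Riak.
store["player_data"]["alice"] = json.dumps(
    {"rank": 12, "rewards": ["first_win"], "stats": {"games": 340}}
).encode()

# Session data keyed by session ID; the encoding can evolve freely,
# since Riak never inspects the bytes.
store["sessions"]["sess-9f2c"] = b"opaque-session-bytes"

player = json.loads(store["player_data"]["alice"])
print(player["rank"])
```

The key point is that both buckets hold nothing but bytes; the application decides what the bytes mean and can change that encoding without any schema migration.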
Gaming Customer Stories
In a recent webinar, Dan Macklin, Head of Research and Development at bet365, provided an overview of their decision making process in choosing Riak. As one of the world’s leading online gambling groups, with over 18 million customers in two hundred countries, bet365 has a unique perspective on making an informed, strategic decision when designing an always available application architecture.
In this webinar, Dan discussed:
- bet365’s journey to Riak
- The evaluation and technical challenges being addressed
- The triumphs of migrating to Riak
- Advice for anyone evaluating their database requirements
bet365 was faced with a massive scale issue. Their existing relational SQL database was simply unable to keep up with the demand placed on it by their infrastructure without incurring the complexity and cost of sharding. The lack of scalability was causing undue stress on their infrastructure, leading to a loss of availability. Of particular interest, for those facing a similar decision, is that Dan discusses not only their search for a solution but the decision-making process that ultimately identified Riak.
The session is available for replay here.
At RICON 2014, Basho’s distributed systems conference for developers, Michal Ptaszek gave a session entitled Let’s Chat About Chat. This session provided detailed insight into how Riot Games built their League of Legends chat system with Riak to handle 70 million players.
In League of Legends, just as in any competitive team game, communication is essential to success. Therefore, when building Chat for the game we had to make sure that the new service would be absolutely rock solid in every respect. This includes not only guaranteed message delivery and consistent presence propagation across the system, but also maintenance of the created social network graph.
In this talk I would like to present how we achieved linear scalability for Chat, improved its overall fault tolerance, and got ready for the new features we wanted to ship. I will also discuss in detail why we migrated our data from MySQL to Riak and how we used CRDTs to deal with conflicting object updates.
As is thematic in gaming use cases, database scalability was a primary architectural consideration from the start. Riot Games started their application modeling with MySQL, a relational database, but hit multiple performance, reliability, and scalability issues. As an example, it simply was not possible to update the database schema fast enough to track changes made in code.
In addition, Riot Games leverages the multi-datacenter capabilities of Riak Enterprise to export persistent data to a secondary Riak cluster. Costly ETL queries, like social graph queries, are run on the secondary cluster without interrupting the primary cluster. This design pattern is often referred to as a “Secondary Analytics Cluster”.
Some statistics that highlight the immense scale that Riot deals with:
- 67 million unique players every month (not counting other services using chat)
- 27 million daily players
- 7.5 million concurrent players
- 1 billion events routed per server, per day, only using 20-30 percent of available CPU and RAM
- 11K messages per second
- A few hundred chat servers deployed around the world, managed by 3 people
- 99% uptime
To learn more about Riak in the Gaming and Gambling industry, there are several useful resources to begin your research and design your deployment.
- Riak Solution for Gaming – This Solution Brief discusses using Riak for a variety of gaming and gambling use cases.
- Riak Tech Talk – Our experienced team can help develop your use case, answer questions, and make sure you are successful at every step from development to production. We can arrange either in-person or virtual meetings, depending on availability and location.
- Why bet365 chose Riak – Get a better understanding of how to make informed strategic decisions directly from someone who has taken the journey. Dan Macklin, Head of Research and Development at bet365 will show you how. His story about choosing Riak will captivate anyone that needs to ensure their data is always available.
January 22, 2015
In speaking with Riak users, both open source and commercial, we are frequently told that Riak’s key/value model is more flexible and faster to develop against than a traditional relational database. Even though Riak is well suited for many applications, there are inevitable tradeoffs in terms of the query options and data types that are available. With a key/value model, there is no concept of columns or rows, so Riak does not have join operations. Riak can be queried directly via HTTP or the Protocol Buffers API, or through various client libraries. However, no SQL or SQL-like language is currently available.
Riak’s key/value data model does not preclude queryability. There are several powerful querying options including:
- Riak Search: Integration with Apache Solr provides full-text search and support for Solr’s client query APIs.
- Secondary Indexes: Secondary Indexes (2i) give developers the ability to tag an object stored in Riak with one or more query values. Indexes can be either integers or strings, and can be queried by either exact matches or ranges of values.
- MapReduce: Developers can leverage Riak MapReduce for tasks like filtering documents by tag, counting words in documents, and extracting links to related data.
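To make the secondary-index option concrete, here is a small plain-Python simulation of 2i-style tagging and querying. The `_bin`/`_int` suffixes follow Riak’s naming convention for string and integer indexes, while the objects and values are invented:

```python
# Simulated objects with 2i-style tags: each object carries index entries
# of the form (index_name, value), where values are strings or integers.
objects = {
    "post1": {"value": b"...", "indexes": [("tag_bin", "riak"), ("score_int", 7)]},
    "post2": {"value": b"...", "indexes": [("tag_bin", "nosql"), ("score_int", 3)]},
    "post3": {"value": b"...", "indexes": [("tag_bin", "riak"), ("score_int", 5)]},
}

def index_eq(name, match):
    """Exact-match 2i query: keys whose index entry equals `match`."""
    return sorted(k for k, o in objects.items()
                  if (name, match) in o["indexes"])

def index_range(name, lo, hi):
    """Range 2i query: keys whose integer index falls in [lo, hi]."""
    return sorted(k for k, o in objects.items()
                  for n, v in o["indexes"]
                  if n == name and lo <= v <= hi)

print(index_eq("tag_bin", "riak"))      # keys tagged "riak"
print(index_range("score_int", 4, 10))  # keys whose score is 4..10
```

The real feature works the same way conceptually: queries return matching keys, and the application then fetches the objects it needs.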
For more information, check out the Riak documentation on Querying Data.
The table below illustrates key/value mappings for common application types. Remember that values in Riak are opaque and stored on disk as binaries – JSON or XML documents, images, text, etc. The way data is organized in Riak should take into account the unique needs of the application, including access patterns such as read/write distribution, latency differences between various operations, use of Riak features (including MapReduce, Search, Secondary Indexes), and more.
| Application Type | Sample Keys | Sample Values |
| --- | --- | --- |
| Session | User/Session ID | Session Data |
| Advertising | Campaign ID | Ad Content |
| Sensor | Date, Date/Time | Sensor Updates |
| User Data | Login, email, UUID | User Attributes |
| Content | Title, Integer | Text, JSON/XML/HTML Document, Images, etc. |
Consider, for example, one of the canonical use cases for Riak: storing user and session data. In a relational database, the “users” table is well known: it provides a unique identifier per user, along with identifying information about that user stored as individual columns, such as:
- First name
- Last name
- Counter of Site Visits
- Paid Account Identifier
This data can then be used to correlate or count paid users, common interests, etc. via a series of SQL queries against the row/column structure of the users table.
Riak, in contrast, provides flexibility in how this data can be modeled based upon the application use case. It may be desirable to create a Users bucket, with the UserName (or Unique Identifier) as the key and a JSON object storing all user attributes as the value. Or, as we describe in Data Modeling with Riak Data Types, leverage the power of Riak Data Types by creating a map type for each user storing:
- first and last name strings in the register type,
- interests as a set,
- a counter for visits,
- and a flag for paid account identifier.
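The counter in such a map is a CRDT, which is what lets Riak merge concurrent updates automatically. A minimal sketch of the idea behind a grow-only counter merge, in plain Python (an illustration of the concept, not Riak’s actual implementation):

```python
def merge_counters(a, b):
    """Merge two replicas of a grow-only counter: per-node max, then sum."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in set(a) | set(b)}

def value(counter):
    """The counter's value is the sum of all per-node contributions."""
    return sum(counter.values())

# Two replicas of the "visits" counter diverge during a partition...
replica1 = {"nodeA": 3, "nodeB": 1}
replica2 = {"nodeA": 2, "nodeB": 4}

# ...and converge deterministically on merge, with no lost increments.
merged = merge_counters(replica1, replica2)
print(value(merged))  # 7
```

Because the merge is commutative and idempotent, replicas can be combined in any order and still agree, which is why the application never has to write sibling-resolution code for these fields.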
One of the best ways to enable application interaction with objects (a key/value pair) in Riak is to provide structured bucket and key names for the objects. This approach often involves wrapping information about the object in the object’s location data itself.
For example, appending a timestamp, UUID, or geographical coordinate to a key’s name allows for fine-grained queryability via a simple lookup to locate and retrieve a specific set of information. Leveraging the same naming mechanism created for users (UniqueID as the key) enables, in a separate sessions bucket, storing the UUID appended with a timestamp as the key and the session data (in binary format) as the object. In this way, using the same UUID, I am able to obtain both user and session data stored in different buckets and in different formats.
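A minimal sketch of this key-composition scheme in plain Python (the helper name and the fixed timestamp are invented for illustration):

```python
import time
import uuid

def session_key(user_id, ts=None):
    """Compose a sessions-bucket key: user UUID plus a timestamp suffix."""
    ts = int(time.time()) if ts is None else ts
    return f"{user_id}_{ts}"

user_id = str(uuid.uuid4())

# Fetch the profile with `user_id` from a users bucket, and the session
# with the composed key from a sessions bucket, with no extra index needed.
key = session_key(user_id, ts=1420070400)
print(key)
```

Since the key itself encodes the lookup information, a single GET replaces what would otherwise require a query or join.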
For additional information, and more complex considerations such as modeling relationship and advanced social applications, see the Riak documentation on use cases and data modeling.
Resolving Data Conflicts
In any system that replicates data, conflicts can arise – e.g., if two clients update the same object at the exact same time or if not all updates have yet reached hardware that is experiencing lag. Riak is “eventually consistent” – while data is always available, not all replicas may have the most recent update at the same time, causing brief periods (generally on the order of milliseconds) of inconsistency while all state changes are synchronized.
However, Riak does provide features to detect and help resolve the statistically small number of incidents when data conflicts occur. When a read request is performed, Riak looks up all replicas for that object. By default, Riak will return the most updated version, determined by looking at the object’s vector clock. Vector clocks are metadata attached to each replica when it is created. They are extended each time a replica is updated to keep track of versions. Clients can also be allowed to resolve conflicts themselves.
Further, when an outdated object is discovered as part of a read request, Riak will automatically update the out-of-sync replica to make it consistent. Read Repair, a self-healing property of the database, will even update a replica that returns a “not_found” in the event that a node loses it due to physical failure.
Riak also features “Active Anti-Entropy,” which is an automatic self-healing property that runs in the background. Rather than waiting for a read request to trigger a replica repair (as with Read Repair), Active Anti-Entropy constantly uses a hash tree exchange to compare replicas of objects and automatically repairs or updates any that are divergent, missing, or corrupt. This can be beneficial for large clusters storing “stale” data.
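The version comparison that drives read repair and sibling detection can be sketched with a toy vector clock, where each clock maps a node (or client) ID to an update count; this is an illustration of the concept, not Riak’s internal representation:

```python
def descends(a, b):
    """True if clock `a` has seen every event recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def concurrent(a, b):
    """Neither clock descends from the other: a genuine conflict (siblings)."""
    return not descends(a, b) and not descends(b, a)

ancestor = {"nodeA": 1}
update = {"nodeA": 2}                    # extends the ancestor
conflicting = {"nodeA": 1, "nodeB": 1}   # diverged from the same ancestor

print(descends(update, ancestor))        # the newer version supersedes the old
print(concurrent(update, conflicting))   # a client (or CRDT) must resolve this
```

When one clock descends from the other, Riak can safely discard the older replica; only truly concurrent clocks require conflict resolution.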
More information on vector clocks, dotted version vectors, and conflict resolution can be found in the online documentation in the section regarding Causal Context.
Multi-site replication is quickly becoming critical for many of today’s platforms and applications. Not only does replication across multiple clusters provide geographic data locality – the ability to serve global traffic at low-latencies – it can also be an integral part of a disaster recovery or backup strategy. Other teams may use multi-site replication to maintain secondary data stores, both for failover as well as for performing intensive computation without disrupting production load. Multi-site replication is included in Basho’s commercial extension to Riak, Riak Enterprise, which also includes 24/7 support.
Multi-site replication in Riak works differently than the typical approach seen in the relational world, multi-master replication. In Riak’s multi-datacenter replication, one cluster acts as a “primary cluster.” The primary cluster handles replication requests from one or more “secondary clusters” (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can take over as the primary cluster. In this sense, Riak’s multi-datacenter capabilities are “masterless.”
In multi-datacenter replication, there are two primary modes of operation: full sync and real-time. In full sync mode, a complete synchronization occurs between primary and secondary cluster(s). In real-time mode, continual, incremental synchronization occurs – replication is triggered by new updates. Full sync is performed upon initial connection of a secondary cluster, and then periodically (by default, every 6 hours). Full sync is also triggered if the TCP connection between primary and secondary clusters is severed and then recovered.
Data transfer is unidirectional (primary->secondary). However, bidirectional synchronization can be achieved by configuring a pair of connections between clusters.
Full documentation for multi-datacenter replication in Riak Enterprise is available in the online documentation.
Modeling data in any non-relational solution requires a different way of thinking about the data itself. Rather than assuming that all data cleanly fits into a structure of rows and columns, the data domain can be overlaid on the core Key/Value store (Riak) in a variety of ways. There are, however, distinct tradeoffs and benefits to understand.
Relational Databases have:
- Foreign keys and constraints
- Sophisticated query planners
- Declarative query language (SQL)

Riak, in contrast, has:
- A Key/Value model where the value is any unstructured data
- More data redundancy that provides better availability
- Eventual consistency
- Simplified query capabilities
- Riak Search
What you will gain:
- More flexible, fluid designs
- More natural data representations
- Scaling without pain
- Reduced operational complexity
For more information on Data Modeling, or to chat with a member of the Basho team on the topic, please request a Tech Talk.
January 6, 2015
If you have read about Riak, or seen a member of the Basho team present, you have probably heard the phrase “Your data is opaque to Riak.” While this is not strictly true with the inclusion of distributed Data Types in Riak 2.0, it was a phrase that hinted at the core structure of Riak itself.
Riak is a Key Value data store.
In a relational database, data is organized by tables that are separate and unique structures. Within these tables exist rows of data organized into columns. As such, interaction with the database is by retrieving or updating entire tables, individual rows, or a group of columns within a set of rows.
In contrast, Riak has a much simpler data model. An Object is both the largest and smallest element of data. As such, interaction with the database is by retrieving or modifying the entire object. There is no partial fetch or update of the data.
Keys in Riak are simply binary values (or strings) used to identify Objects. The Key/Value pair (or Object) is stored in a higher-level namespace called a Bucket. And, with Riak 2.0, there is an extra layer of abstraction known as Bucket Types.
This Key/Value/Bucket model enables broad flexibility in modeling the application’s data domain with Riak as the data store for persistence.
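A toy in-memory model of that namespace hierarchy (bucket type, bucket, key), with invented names, shows why the same key can coexist in different buckets without clashing:

```python
# Toy model of Riak's namespace hierarchy:
# bucket type -> bucket -> key -> opaque binary value.
store = {}

def put(bucket_type, bucket, key, value):
    store.setdefault(bucket_type, {}).setdefault(bucket, {})[key] = value

def get(bucket_type, bucket, key):
    return store[bucket_type][bucket][key]

# The key "alice" lives independently under two different bucket types.
put("default", "users", "alice", b'{"name": "Alice"}')
put("maps", "users", "alice", b"<crdt map payload>")

print(get("default", "users", "alice"))
```

The full address of an object is the (bucket type, bucket, key) triple, which is why bucket types can carry different properties (such as data types) for the same logical bucket name.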
Another NoSQL model that many are familiar with is the document store. Unlike the key/value model, the data store is aware of the structure of the objects stored. These objects, or documents, are grouped into “collections” (analogous to relational “tables”), and the datastore provides a query mechanism to search collections for objects with particular attributes. When the data being persisted is easily rendered as a JSON document, a document store can seem a natural fit. Some common use cases include product catalog data and content management systems.
The Basho Docs have a lengthy tutorial entitled Using Riak as a Document Store that walks you through the process of leveraging Riak as a document store for a CMS. There are many approaches to modeling, but the tutorial demonstrates the power of Riak 2.0 features by combining the maps data type and indexing that data with Riak Search.
When the data you are persisting can be represented as JSON, and you require the ability to query the data, Riak 2.0 is an excellent solution for persisting and modeling document data. The flexibility of the Key/Value model, combined with the power of Riak Search and Riak Data Types, provide you with a highly scalable, highly available document store with rich, full-text query capabilities. In addition, the inclusion of the maps data type means that you don’t have to write complex client side resolution logic when faced with network partitions. Riak Data Types handle that conflict resolution automatically.
A scalable, available document store that is operationally simple may be compelling enough on its own. But when you combine these characteristics of Riak with the multi-datacenter replication capabilities of Riak Enterprise, you have a solution that brings your data operations closer to the end user.
Scalable, available, operationally simple, and replicated. That’s the power of using Riak as a document store.
December 8, 2010
Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many live, I’ve answered all of the questions below, in no particular order.
Q: Can you touch on upcoming filtering of keys prior to map reduce? Will it essentially replace the need for one to explicitly name the bucket/key in a M/R job? Does it require a bucket list-keys operation?
Key filters, in the upcoming 0.14 release, will allow you to logically select a population of keys from a bucket before running them through MapReduce. This will be faster than a full-bucket map since it only loads the objects you’re really interested in (the ones that pass the filter). It’s a great way to make use of meaningful keys that have structure to them. So yes, it does require a list-keys operation, but it doesn’t replace the need to be explicit about which keys to select; there are still many useful queries that can be done when the keys are known ahead of time.
For more information on key-filters, see Kevin’s presentation on the upcoming MapReduce enhancements.
Q: How can you validate that you’ve reached a good/valid KV model when migrating a relational model?
The best way is to try out some models. The thing about schema design for Riak that turns your process on its head is that you design for optimizing queries, not for optimizing the data model. If your queries are efficient (single-key lookup as much as possible), you’ve probably reached a good model, but also weigh things like payload size, cost of updating, and difficulty manipulating the data in your application. If your design makes it substantially harder to build your application than a relational design, Riak may not be the right fit.
Q: Are there any “gotchas” when thinking of a bucket as we are used to thinking of a table?
Like tables, buckets can be used to group similar data together. However, buckets don’t automatically enforce data structure (columns with specified types, referential integrity) like relational tables do; that part is still up to your application. You can, however, add precommit hooks to buckets to perform any data validation that your application shouldn’t have to handle itself.
Q: How would you create a ‘manual index’ in Riak? Doesn’t that need to always find unique keys?
One basic way to structure a manually-created index in Riak is to have a bucket specifically for the index. Keys in this bucket correspond to the exact value you are indexing (for fuzzy or incomplete values, use Riak Search). The objects stored at those keys have links or lists of keys that refer to the original object(s). Then you can find the original simply by following the link or using MapReduce to extract and find the related keys.
The example I gave in the webinar Q&A was indexing users by email. To create the index, I would use a bucket named `users_by_email`. If I wanted to look up my own user object by email, I’d try to fetch the object `email@example.com`, then follow the link in it (something like `riaktag="indexed"`) to find the actual data.
Whether those index values need to be unique is up to your application to design and enforce. For example, the index could be storing links to blog posts that have specific tags, in which case the index need not be unique.
To create the index, you’ll either have to perform multiple writes from your application (one for the data, one for the index), or add a commit hook to create and modify it for you.
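The two-write pattern can be simulated with plain Python dicts standing in for the two buckets (the function names are invented; a real deployment would use Riak writes or a commit hook as described above):

```python
import json

# Two stand-in buckets: one for user objects, one for the manual index.
users = {}
users_by_email = {}

def store_user(user_id, email, profile):
    """Write the data object and its index entry (the two writes noted above)."""
    users[user_id] = json.dumps(profile).encode()
    users_by_email[email] = user_id  # the index object "links" to the data key

def lookup_by_email(email):
    """Exact-match lookup: fetch the index object, follow it to the data."""
    user_id = users_by_email[email]
    return json.loads(users[user_id])

store_user("u42", "email@example.com", {"name": "Sean"})
print(lookup_by_email("email@example.com")["name"])
```

A commit hook makes the second write automatic, but either way the index is just another key/value pair that the application (or hook) keeps in step with the data.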
Q: Can you compare/contrast buckets w/ Cassandra column families?
Cassandra has a very different data model from Riak, and you’ll want to consult with their experts to get a second opinion, but here’s what I know. Column families are a way to group related columns together that you will always want to retrieve together, and is something that you design up-front (it requires restarting the cluster for changes to take effect). It’s the closest thing to a relational table that Cassandra has.
In contrast, although you do use buckets to group similar data items, Riak’s buckets:
- Don’t understand or enforce any internal structure of the values,
- Don’t need to be created or designed ahead of time, but pop into existence when you first use them, and
- Don’t require a restart to be used.
Q: How would part sharing be achieved? (this is a reference to the example given in the webinar, Radiant CMS)
Radiant shares content parts only when specified by the template language, and always by inheritance from ancestor pages. So if the layout contained `<r:content part="sidebar" inherit="true" />`, then if the currently rendering page doesn’t have that content part, it will look up the hierarchy until it finds it. This is one example of why it’s so important to have an efficient way to traverse the site hierarchy, and why I presented so many options.
Q: What is the max number of links an object can have for Link Walking?
There’s no cut-and-dried answer for this. Theoretically, you are limited only by storage space (disk and RAM) and the ability to retrieve the object from the desired interface. In a practical sense this means that the default HTTP interface limits you to around 100,000 links on a single object (based on previous discussions of the limits of HTTP packets and header lengths). Still, this is not going to be reasonable to deal with in your application. In some applications we’ve seen links on the order of hundreds per object negatively impact link-walking performance. If you need to have that many, you’ll be better off exploring other designs.
Again, thanks for attending! Look for our next webinar coming in about a month.
— Sean, Developer Advocate
December 1, 2010
Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create MapReduce queries? Where will Riak be a natural fit and where will it be challenging?
We invite you to join us for a free webinar on Tuesday, December 7 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:
- Freeing yourself of the architectural constraints of the “relational” mindset
- Gaining a fuller understanding of your existing schema and its queries
- Strategies and patterns for structuring your data in Riak
- Tradeoffs of various solutions
We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.
If you missed the previous version of this webinar in July, here’s your chance to see it! We’ll also use a different example this time, so even if you attended last time, you’ll probably learn something new.
Fill in the form below if you want to get started building applications on top of Riak!
Sorry, registration is closed! Video of the presentation will be posted on Vimeo after the webinar has ended.