March 30, 2015
This is the first post in a series of blog posts, entitled Riak Customer Stories, where we will look at common use cases for Riak and their applicability in specific verticals. Our first customer stories will focus on how Riak is helping Gaming companies achieve massive scalability.
Online gaming continues to grow in popularity, whether for huge gaming communities like Riot Games’ League of Legends or gaming sites like bet365, one of the world’s leading online gambling groups. This growth is forcing changes to existing infrastructure in order to keep up with demand and innovation. Traditional relational databases can’t meet the requirements for massive scalability, speed, and fault tolerance.
Innovation is critical to retain long-term customer loyalty and is changing the way gamers play online. These changes include the move away from single bets on an event to in-game betting on an ever-increasing range of metrics. The advent of regional gaming competitions, like the League of Legends World Championship with an annual grand prize of $1 million, shows just how far gaming has come.
Gaming on Riak
Companies that build games or betting sites use Riak in four key ways:
- Player Data – Riak provides low-latency, highly available data storage for key player data, including user and profile information, game performance, statistics and rankings, and more. Riak also provides many different tools for querying and indexing this data, such as a full-text search engine and secondary indexing.
- Session Storage – Riak is used to store and serve session data with predictable low-latency, which is necessary for game play. Riak imposes no restrictions on the type of content stored (since all objects are stored on disk as binaries), so session data can be encoded in many ways and can evolve without administrative changes to schemas.
- Global Data Locality – While gaming, players require a low-latency experience, regardless of their physical location. Interrupted or slow game play can lead to poor user experience and player abandonment. Riak Enterprise’s multi-datacenter capabilities allow game data to be stored physically close to players, providing fast response times regardless of player location.
- Social Information – Riak is built for very fast data storage. Due to its inherent design and Riak’s simple key/value data model, Riak is ideal for storing and serving social content such as social graph information, player profiles, player relationships, social authentication accounts, and other types of social gaming data.
By using Riak, companies have achieved global availability and massive scalability while still maintaining operational simplicity. These benefits are derived from the core architectural decisions made in the design of Riak.
By design Riak is masterless. Each node in a Riak cluster is the same, containing a complete and independent copy of the Riak package. There is no “master” or coordinating node. This uniformity provides the basis for Riak’s fault-tolerance and scalability. When this is coupled with an even distribution of data around the cluster via consistent hashing, there is a significant decrease in risky “hot spots” in the database while lowering the operational burden associated with manually sharding data. In addition, new nodes can easily be added with automatic, minimal redistribution of data.
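The even distribution and minimal redistribution described above can be sketched in a few lines. This is a simplified illustration, not Riak's implementation: the ring size of 64, the `bucket/key` string encoding, the round-robin `claim` function, and the `add_node` rebalancing rule are all assumptions made for the example.

```python
import hashlib
from collections import Counter

RING_SIZE = 64  # number of partitions; an assumed value for this sketch


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash a bucket/key pair onto a 160-bit ring and return its partition."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    # The ring is split into equal ranges, one per partition.
    return (position * ring_size) >> 160


def claim(nodes, ring_size=RING_SIZE):
    """Assign partitions to nodes round-robin (a simplification)."""
    return {p: nodes[p % len(nodes)] for p in range(ring_size)}


def add_node(owners, new_node):
    """Give a new node its fair share, taking partitions from the busiest nodes."""
    owners = dict(owners)
    node_count = len(set(owners.values()) | {new_node})
    target = len(owners) // node_count
    for _ in range(target):
        counts = Counter(n for n in owners.values() if n != new_node)
        donor = max(counts, key=lambda n: counts[n])
        partition = min(p for p, n in owners.items() if n == donor)
        owners[partition] = new_node
    return owners


owners = claim(["node1", "node2", "node3"])
grown = add_node(owners, "node4")
moved = sum(1 for p in owners if owners[p] != grown[p])
print(f"{moved} of {RING_SIZE} partitions changed owner")
print("players/alice lives in partition", partition_for("players", "alice"))
```

Because a key's partition depends only on its hash, adding a node never changes which partition a key maps to. Only the ownership of some partitions changes, and in this sketch that is the 16 partitions handed to the new node.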
This distribution of data in a masterless system is supplemented with a process of “hinted handoff”. Hinted handoff lets Riak cleanly handle node failure. If a node fails, a neighboring node takes over its storage operations. When the failed node returns, any updates received by the neighboring node are handed back to it. This ensures availability for writes and updates and happens automatically. These design decisions are discussed in greater detail in a blog post entitled Why Riak Just Works.
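The hinted handoff flow described above can be modeled as a toy simulation. The `Node` and `Cluster` classes, the single-owner placement rule, and the method names are hypothetical and exist only for this sketch; real Riak replicates each key to several vnodes and handles handoff internally.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value for keys this node owns
        self.hints = {}  # intended owner name -> {key: value} held on its behalf


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def _owner_index(self, key):
        # Toy deterministic placement; stands in for consistent hashing.
        return sum(key.encode("utf-8")) % len(self.nodes)

    def write(self, key, value):
        """Store on the owner, or on the next available neighbor with a hint."""
        start = self._owner_index(key)
        owner = self.nodes[start]
        if owner.up:
            owner.data[key] = value
            return owner.name
        for step in range(1, len(self.nodes)):
            fallback = self.nodes[(start + step) % len(self.nodes)]
            if fallback.up:
                fallback.hints.setdefault(owner.name, {})[key] = value
                return fallback.name
        raise RuntimeError("no nodes available")

    def recover(self, name):
        """Bring a node back and hand its hinted writes back to it."""
        node = next(n for n in self.nodes if n.name == name)
        node.up = True
        for other in self.nodes:
            node.data.update(other.hints.pop(name, {}))


cluster = Cluster(["node1", "node2", "node3"])
owner_name = cluster.write("session:42", "v1")
owner = next(n for n in cluster.nodes if n.name == owner_name)
owner.up = False                                 # simulate a node failure
holder_name = cluster.write("session:42", "v2")  # the write still succeeds
cluster.recover(owner_name)                      # hinted data is handed back
print(owner_name, "now holds", owner.data["session:42"], "(held meanwhile by", holder_name + ")")
```

The write issued during the outage is accepted by a neighbor and tagged with the intended owner. On recovery the neighbor's hints are drained back to the owner, so no client-side retry is needed.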
Modeling Gaming Applications in Riak
The table below illustrates key/value mappings for common application types. Remember that values in Riak are opaque and stored on disk as binaries – JSON or XML documents, images, text, etc. Riak has a “schemaless” design. Objects are comprised of key/value pairs, which are stored in flat namespaces called “buckets.” The way data is organized in Riak should take into account the unique needs of the application, including access patterns such as read/write distribution, latency differences between various operations, use of Riak features (including MapReduce, Search, Secondary Indexes), and more.
Here are some common approaches to structuring gaming data with Riak’s key/value design:
|Application Type|Key|Value|
|---|---|---|
|Player Data|Login, email, UUID|Player Attributes (often stored as a JSON document); Player Rewards and Stats|
|Social Data|Login, email, UUID|Player Profiles, Social Graph Information, Facebook/Twitter Tokens|
|Session Information|User/Session ID|Session Data|
|Image or Video Content|Content Name, ID, or Integer|.JPG, .PNG, .GIF or other image format; .MOV, .MPG, .MP4 or other video file format|
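To make the mappings above concrete, here is a minimal sketch of player and session data modeled as key/value pairs. The `Bucket` class is an in-memory stand-in written for this example, not the Riak client API, and the bucket names, field names, and key formats are assumptions.

```python
import json
import uuid


class Bucket:
    """In-memory stand-in for a bucket: a flat namespace of opaque binary values."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values are stored as opaque binaries")
        self._objects[key] = value

    def get(self, key):
        return self._objects[key]


players = Bucket("players")
sessions = Bucket("sessions")

# Player data: keyed by UUID, value is a JSON document.
player_id = str(uuid.uuid4())
profile = {
    "login": "alice",
    "email": "alice@example.com",
    "rewards": ["first_win"],
    "stats": {"wins": 12, "losses": 7},
}
players.put(player_id, json.dumps(profile).encode("utf-8"))

# Session information: keyed by user and session ID.
session_key = f"{player_id}/{uuid.uuid4()}"
session = {"lobby": "eu-west-3", "ready": True}
sessions.put(session_key, json.dumps(session).encode("utf-8"))

# The store never inspects the value; the application decodes it.
loaded = json.loads(players.get(player_id))
print(loaded["login"], loaded["stats"]["wins"])
```

Because the store treats values as opaque bytes, adding a field to `profile` requires no schema change. Only the code that reads the document needs to tolerate the new field.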
Gaming Customer Stories
In a recent webinar, Dan Macklin, Head of Research and Development at bet365, provided an overview of their decision making process in choosing Riak. As one of the world’s leading online gambling groups, with over 18 million customers in two hundred countries, bet365 has a unique perspective on making an informed, strategic decision when designing an always available application architecture.
In this webinar, Dan discussed:
- bet365’s journey to Riak
- The evaluation and technical challenges being addressed
- The triumphs of migrating to Riak
- Advice for anyone evaluating their database requirements
bet365 was faced with a massive scale issue. Their existing relational SQL database was simply unable to keep up with the demand placed on it without incurring the complexity and cost of sharding. The lack of scalability was causing undue stress on their infrastructure, leading to a loss of availability. Of particular interest, for those going through a similar evaluation, is that Dan discusses not only their search for a solution but also the decision making process that ultimately identified Riak.
The session is available for replay here.
At RICON 2014, Basho’s distributed systems conference for developers, Michal Ptaszek gave a session entitled Let’s Chat About Chat. This session provided detailed insight into how Riot Games built their League of Legends chat system with Riak to handle 70 million players.
In League of Legends, just as in any competitive team game, communication is essential to success. Therefore, when building Chat for the game we had to make sure that the new service would be absolutely rock solid in every respect. This includes not only guaranteed message delivery and consistent presence propagation across the system, but also maintenance of the created social network graph.
In this talk I would like to present how we achieved linear scalability for Chat, improved its overall fault tolerance, and got ready for the new features we wanted to ship. I will also discuss in detail why we migrated our data from MySQL to Riak and how we used CRDTs to deal with conflicting object updates.
As is thematic in gaming use cases, database scalability was a primary architectural consideration from the start. Riot Games started their application modeling with MySQL, a relational database, but hit multiple performance, reliability, and scalability issues. As an example, it simply was not possible to update the database schema fast enough to track changes made in code.
In addition, Riot Games leverages the multi-datacenter capabilities of Riak Enterprise to export persistent data to a secondary Riak cluster. Costly ETL queries, like social graph queries, are run on the secondary cluster without interrupting the primary cluster. This design pattern is often referred to as a “Secondary Analytics Cluster”.
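The talk abstract above mentions using CRDTs to deal with conflicting object updates. The source does not say which CRDTs Riot Games used, so the following is only an illustration of the general idea: a state-based observed-remove set (OR-Set), here imagined as a friends list, where two replicas accept writes independently and converge after merging. The class and method names are hypothetical.

```python
import uuid


class ORSet:
    """Observed-remove set: concurrent add and remove of an element resolve to 'present'."""

    def __init__(self):
        self.adds = {}        # element -> set of unique tags, one per add
        self.removed = set()  # tags that have been removed (tombstones)

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Only tags this replica has observed are tombstoned.
        self.removed |= self.adds.get(element, set())

    def merge(self, other):
        """Union both parts of the state; merging is commutative and idempotent."""
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def value(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}


# Two replicas of one player's friends list.
a, b = ORSet(), ORSet()
a.add("bob")
b.merge(a)

b.remove("bob")  # replica b removes bob...
a.add("bob")     # ...while replica a concurrently re-adds bob

a.merge(b)
b.merge(a)
print(a.value() == b.value(), a.value())
```

Both replicas end up with the same value regardless of merge order. The concurrent re-add survives because the remove only tombstoned the tag it had observed.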
Some statistics that highlight the immense scale that Riot deals with:
- 67 million unique players every month (not counting other services using chat)
- 27 million daily players
- 7.5 million concurrent players
- 1 billion events routed per server, per day, only using 20-30 percent of available CPU and RAM
- 11K messages per second
- A few hundred chat servers deployed around the world, managed by 3 people
- 99% uptime
To learn more about Riak in the Gaming and Gambling industry, there are several useful resources to begin your research and design your deployment.
- Riak Solution for Gaming – This Solution Brief discusses using Riak for a variety of gaming and gambling use cases.
- Riak Tech Talk – Our experienced team can help develop your use case, answer questions, and make sure you are successful at every step from development to production. We can arrange either in-person or virtual meetings, depending on availability and location.
- Why bet365 chose Riak – Get a better understanding of how to make informed strategic decisions directly from someone who has taken the journey. Dan Macklin, Head of Research and Development at bet365 will show you how. His story about choosing Riak will captivate anyone that needs to ensure their data is always available.
Yesterday it was announced that Apple has acquired FoundationDB. As you may imagine, I have been asked to comment on what this means for the NoSQL database industry and for those who are investing heavily in retooling their traditional database infrastructures with new technologies to meet the availability, scalability, and fault-tolerance characteristics required by the massive influx of data.
NoSQL databases are an increasingly critical part of enterprises’ ability to derive real business value from the massive amounts of data that users, devices and online systems generate. They are also an important part of the developers’ toolkits when building applications for the Internet of Things, a major contributor to this ever-growing body of data. Apple is acutely aware of the importance of being able to reliably scale to meet the real-time data needs of today’s global applications. The news of Apple’s intent to acquire FoundationDB greatly amplifies these points to a growing number of IT and engineering leaders.
Part of the commentary around the announcement concerns Open Source software, both as an underpinning for enterprise infrastructure and as a viable business model. I contributed to a detailed discussion about the latter in a recent article on Silicon Angle entitled NoSQL market frames larger debate: Can open source be profitable?, noting that there is enormous opportunity for Open Source NoSQL companies if they can serve the specific needs of enterprise customers. We feel that we are doing so, and that our approximately 1:10 ratio of paying customers to Open Source users is an indicator of our solution’s value and the strength of our business. Our clear path to being cash-flow-positive includes a measured, strategic investment in R&D, which is essential to ensuring Basho’s corporate viability for all customers who have already made multi-million dollar investments in their business-critical workloads.
Unlike others, the core underpinnings of Riak as a distributed, multi-model data persistence platform are, and will remain, Open Source. Basho builds premium, enterprise-grade features atop this distributed infrastructure, and these features help us attract a higher percentage of paying customers than others in the industry.
Acquisition and consolidation — whether done to enhance technical capabilities, secure talent, or expand a company’s customer base — are essential to the high technology arena. The NoSQL space will be the focus of more of this activity than most in the coming year, given the amount of attention it has already received, with PwC naming NoSQL as one of the “surprising digital bets for 2015” and given the success of the HortonWorks IPO. Combine that buzz with the fact that a prominent database ranking tool lists more than 200 different database management systems, and we are certain to see more industry consolidation.
The decision to re-architect an existing enterprise data workload infrastructure is not one to be taken lightly. Basho’s commitment to Open Source, our commitment to long-term business viability, and our impressive list of customers making substantial investments, point to a bright future not only for our company but for those who choose Riak as a core underpinning of their persistence infrastructure. Apple’s acquisition of FoundationDB strongly validates the value of the solutions we offer and underscores the criticality of these technologies to companies that need to scale business-critical applications.
February 23, 2015
Over the last week, for a variety of reasons, the topic of security in the NoSQL space has become a prominent news item. Chief among these reasons was the announcement of a popular NoSQL database having multiple instances exposed to the public internet. From the headlines you might think that NoSQL solutions have inherent security problems. In fact, in some cases, the discussion is positioned intentionally as a relational vs. NoSQL issue. The reality is that NoSQL is not more or less secure than a traditional RDBMS.
The security of any component of the technology stack is the responsibility of both the vendor providing the technology and those deploying it. How many routers are running with the default administrative password still set? Similarly, exposing any database, regardless of type, to the public internet without taking appropriate security precautions, including user authentication and authorization, is a “bad idea.” A base level of network security is an absolute requirement when deploying any data persistence utility. For Riak this can include:
- Appropriate physical security (including policies about root access)
- Securing the epmd listener port, the handoff_port listener port, and the range of ports specified in riak.conf
- Defining users and optionally, groups (using Riak Security in Riak 2.0)
- Defining an authentication source for each user
- Granting necessary permissions to each user (and/or group)
- Checking Erlang MapReduce code for invocations of Riak modules other than riak_kv_mapreduce
- Ensuring your client software passes authentication information with each request and supports HTTPS or encrypted Protocol Buffers traffic
If you enable Riak security without having an established, functioning SSL connection, all requests to Riak will fail because Riak security (when enabled) requires a secure SSL connection. You will need to generate SSL certificates, enable SSL, and establish a certificate configuration on each node.
The security discussion does not, however, end at the network. In fact, for those who are familiar with the Open Systems Interconnection model (OSI), a 7 layer conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers (ISO 7498-1), there is a corresponding security architecture reference (ISO 7498-2)…and that is just for the network. It is necessary to adopt a comprehensive approach to security at every layer of the application stack…including the database.
The process of securing a database, which is only a component of the application stack, requires striking a fine balance. Basho has worked with large enterprise customers to ensure that Riak’s security architecture meets the needs of their application deployments and balances the effort required with the security, or compliance, requirements demanded by some of the world’s largest deployments.
NoSQL vs. Relational Security
As enterprises continue to adopt NoSQL more broadly, the question of security will continue to be raised. The reality is simple: it is necessary to evaluate the security of the database you are exploring in the same way that you would evaluate its scalability or availability characteristics. There is nothing inherent to the NoSQL market that makes it less, or more, secure than relational databases. It is true that some relational databases, by virtue of their age and maturation, have more expansive security tooling available. However, when adopting a holistic, risk-based approach to security, NoSQL solutions like Riak are as secure as required.
Security and Compliance
A compliance checklist (be it HIPAA or PCI) details, in varying specificity, the security requirements to achieve compliance. This checklist is subsequently verified through an audit by an independent entity…as well as ongoing internal audits.
So can I use NoSQL in compliant environments?
Without question, Yes. The difficulty of achieving compliance will depend on how the database is configured, what controls it provides for authentication and authorization, and many other elements of your application stack (including physical security of the datacenter, etc). Basho customers have deployed Riak in highly regulated environments and achieved their compliance requirements.
I would encourage you, however, to realize that compliance is an event. The process of securing your application, database, datacenter, etc. is an ongoing exercise. Many, particularly those in the payments industry, refer to this as a “risk-based” approach to security vs. a “compliance-based” approach.
Security and Riak
In nearly all commercial deployments, Riak runs on a trusted network and unauthorized access is restricted by firewall routing rules. This is expected and necessary, and it is sufficient for many use cases (when included as part of a holistic security posture including locking down ports, reasonable policies regarding root access, etc.). Some applications need an additional layer of security to meet business or regulatory compliance requirements.
To that end, in Riak 2.0, the security story changed substantially. While you should, without question, apply network layer security on top of Riak and the systems that Riak runs upon, there are now security features built into Riak that protect Riak itself, not just its network. This includes authentication (the process of identifying a user) and authorization (verifying whether the authenticated user has access to perform the requested operation). Riak’s new security features were explicitly modeled after user- and role-based systems like PostgreSQL. This means that the basic architecture of Riak Security should be familiar to most.
In Riak, administrators can selectively control access to a wide variety of Riak functionality. Riak Security allows you to both authorize users to perform specific tasks (from standard read/write/delete operations to search queries to managing bucket types and more) and to authenticate users and clients using a variety of security mechanisms. In other words, Riak operators can now verify who a connecting client is and determine what that client is allowed to do (if anything). In addition, Riak Security in 2.0 provides four options for security sources:
- trust — Any user accessing Riak from a specified IP may perform the permitted operations
- password — Authenticate with username and password (works essentially like basic auth)
- pam — Authenticate using a pluggable authentication module (PAM)
- certificate – Authenticate using client-side certificates
More detail on the Riak 2.0 Security capabilities is presented in the Security section of the documentation, in particular the section entitled Authentication and Authorization.
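The split between authentication sources and granted permissions can be illustrated with a small conceptual model. This is not Riak's code or API: the `Source` and `SecurityModel` classes are hypothetical, the rule that a user-specific or narrower source wins is an assumption of the sketch, and the `pam` and `certificate` sources are left unmodeled because they depend on external systems.

```python
import hmac
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Source:
    user: str  # a username, or "all"
    cidr: str  # address range the source applies to
    kind: str  # "trust", "password", "pam", or "certificate"


@dataclass
class SecurityModel:
    sources: list = field(default_factory=list)
    passwords: dict = field(default_factory=dict)  # toy store; real systems keep hashes
    grants: dict = field(default_factory=dict)     # user -> set of permissions

    def source_for(self, user, ip):
        addr = ipaddress.ip_address(ip)
        matches = [
            s for s in self.sources
            if s.user in (user, "all") and addr in ipaddress.ip_network(s.cidr)
        ]
        if not matches:
            return None
        # Assumption: prefer a user-specific source, then the narrowest range.
        return max(matches, key=lambda s: (s.user == user,
                                           ipaddress.ip_network(s.cidr).prefixlen))

    def authenticate(self, user, ip, password=None):
        """Who is connecting? Decided by the source matching user and address."""
        source = self.source_for(user, ip)
        if source is None:
            return False
        if source.kind == "trust":
            return True
        if source.kind == "password":
            stored = self.passwords.get(user)
            if stored is None or password is None:
                return False
            return hmac.compare_digest(stored.encode("utf-8"), password.encode("utf-8"))
        return False  # pam and certificate are not modeled here

    def authorized(self, user, permission):
        """What may they do? Decided by explicit grants."""
        return permission in self.grants.get(user, set())


model = SecurityModel()
model.sources.append(Source("all", "127.0.0.1/32", "trust"))
model.sources.append(Source("alice", "10.0.0.0/8", "password"))
model.passwords["alice"] = "s3cret"
model.grants["alice"] = {"riak_kv.get"}

print(model.authenticate("alice", "10.1.2.3", "s3cret"))  # identity check
print(model.authorized("alice", "riak_kv.put"))           # permission check
```

Keeping the two checks separate mirrors the description above: a client that authenticates successfully can still be denied an operation it has not been granted.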
With a NoSQL system that provides authentication and authorization, and a properly secured network, you have progressed a long way in reducing the risk profile of your system. The application layer, of course, must still be considered.
Relational databases are still a part of the technology stack for many companies; others are innovating and incorporating NoSQL solutions either as a replacement for or alongside existing relational databases. As a result they have simplified their deployments, enhanced their availability, and reduced their costs.
Join us for this webinar where we will look at the differences between relational databases and NoSQL databases like Riak. We will look at why companies choose Riak over a relational database. We will analyze the decision points you should consider when choosing between relational and NoSQL databases and we will look at specific use cases, review data modeling and query options.
This Webinar is being held in two time slots:
- Wednesday, March 4, 2015 8:00-9:00 AM PST (4:00-5:00 PM GMT)
- Wednesday, March 4, 2015 12:00-1:00 PM PST (3:00-4:00 PM EST)
December 18, 2014
One of the interesting things about attending industry events, like AWS re:Invent, is identifying common trends that arise in conversations. Recent conversations point to a renewed interest in “enterprise ready replication” for NoSQL databases.
As business data continues to grow, there is an entirely new set of challenges that are presented related to availability, scalability, and fault-tolerance. While most NoSQL databases work at small scale, availability is often compromised as applications reach full production or peak capacity. Having the right replication functionality is key to ensuring that availability requirements are not compromised as your system grows.
“Replication” may mean different things based on context. In this case, we are referring to the movement of data in a database cluster — or across datacenters — for the purpose of redundancy or data locality. If your database experience began in an RDBMS context, then replication implies a specific contextual understanding of multi-master transactional deployment and, perhaps, shipping transaction logs between incremental backups in a hot/warm database configuration. In contrast, for those who began in the NoSQL era, the term may evoke images of replica-sets on a sharded infrastructure and the operational overhead associated therewith.
In a distributed NoSQL database, like Riak, the term replication is used to encompass two distinct concepts. First, intra-cluster replication for high availability and fault tolerance within the datacenter; and second, multi-datacenter replication for data locality and global availability. There is none of the complexity of log shipping or dealing with a sharded infrastructure.
Data replication is a core feature of Riak’s basic architecture. Riak was designed to operate as a clustered system containing multiple nodes (commodity servers or cloud instances). The replication implementation allows data to live on multiple machines at once, with a single write request, in case a node in the cluster goes down or is unavailable due to issues like network partitioning.
Intra-cluster replication is fundamental and automatic in Riak, so that your data is always available. All data stored in Riak is replicated to a number of nodes in the cluster according to a configurable parameter (n_val) set in a bucket’s bucket type.
With the default n_val setting of 3, there are always three copies of all data. These copies will be on three different partitions/vnodes. A detailed explanation and analysis of this replication capability is available in the Riak documentation – Understanding replication by example.
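A rough sketch of how n_val translates into replica placement: the key is hashed to a position on the ring, and copies go to that partition and the ones that follow it. The ring size of 64, the four-node round-robin ownership, and the `bucket/key` string encoding are assumptions for illustration and do not reproduce Riak's exact hashing or claim algorithm.

```python
import hashlib


def preference_list(bucket, key, owners, n_val=3):
    """Return the n_val (partition, node) pairs that would hold copies of a key."""
    ring_size = len(owners)
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    first = (int.from_bytes(digest, "big") * ring_size) >> 160
    # Replicas land on consecutive partitions, wrapping around the ring.
    return [
        ((first + i) % ring_size, owners[(first + i) % ring_size])
        for i in range(n_val)
    ]


# 64 partitions spread round-robin over four nodes (an assumed layout).
owners = [f"node{p % 4 + 1}" for p in range(64)]

for partition, node in preference_list("players", "alice", owners):
    print(f"partition {partition} on {node}")
```

With this layout, consecutive partitions belong to different nodes, so the three copies also land on three different machines. With fewer nodes than n_val, some copies would necessarily share a machine.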
In the case of intra-cluster replication, or what we would refer to simply as “replication”, data distribution ensures redundant data such that high availability is maintained in a failure state.
In contrast to intra-cluster replication, multi-datacenter replication (a feature of Riak Enterprise) is a critical part of modern application infrastructures. Riak Enterprise offers multi-datacenter replication features so that data stored in Riak can be replicated to multiple sites (vs. multiple servers in the same site).
As we are all aware, understanding application latency (for an end user) begins with the realization that data can’t travel faster than the speed of light. So, inherently, as source information moves further from where it is consumed, latency is introduced. As such, there is a baseline amount of latency for a customer in San Francisco connecting to your application hosted in New York. This latency profile increases, and becomes more complex, as the geographic distribution of your customer base increases.
With multi-datacenter replication in Riak Enterprise, data can be replicated across locations and geographic areas, providing for disaster recovery, data locality, compliance with regulatory requirements, and the ability to “burst” peak loads into public cloud infrastructure, among other uses.
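The New York to San Francisco example can be turned into a back-of-the-envelope lower bound. The city coordinates and the assumption that signals in optical fiber travel at roughly two thirds of the vacuum speed of light are inputs chosen for this sketch. Real paths are longer than the great circle and add routing and processing delay, so measured latency will be higher.

```python
import math

EARTH_RADIUS_KM = 6371.0
C_VACUUM_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumed propagation speed in fiber relative to vacuum


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates for New York and San Francisco.
distance_km = great_circle_km(40.7128, -74.0060, 37.7749, -122.4194)
one_way_ms = distance_km / (C_VACUUM_KM_S * FIBER_FACTOR) * 1000
round_trip_ms = 2 * one_way_ms

print(f"distance: {distance_km:.0f} km")
print(f"minimum round trip: {round_trip_ms:.1f} ms")
```

Under these assumptions the distance is a little over 4,100 km and the floor on a round trip is around 40 ms, before any server-side work. Serving the same request from a nearby datacenter removes most of that floor.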
Riak’s multi-datacenter replication is masterless. One cluster acts as a primary, or source, cluster. The primary cluster handles replication requests from one or more secondary, or sink, clusters (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can automatically take over as the primary cluster.
More architectural strategies for multi-datacenter implementations are covered in the Basho whitepaper entitled Riak Enterprise: Multi-Datacenter Replication – A Technical Overview & Use Cases or in the Basho Documentation section Multi-Datacenter Replication: v3 Architecture.
Replication inside a cluster is a core design tenet of Riak. It provides the availability and fault-tolerance characteristics, with a low operational overhead, that many unstructured data workloads demand.
Multi-datacenter replication, while related, is an entirely different approach and architecture to enable the geographic distribution of data to solve for redundancy, geo-data locality, etc.
Replication is an important scalability feature of any database deployment. Ensuring that your NoSQL database replicates data in a way that is scalable, operationally simple and achieves your business objectives is key to your success.