November 14, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.” The previous post in the series was Relational to Riak – High Availability.
Riak is designed for scalability, which truly separates it from relational systems. As described in the previous post, relational databases run best on a single server. If the dataset grows beyond the capacity of this single machine, it can become prohibitively expensive (or even impossible) to simply upgrade to a bigger machine. In such a scenario, the only option may be to add more machines and divide the dataset across them using a technique called sharding.
Sharding divides data into logical parts (such as alphabetical, by customer, or by geographic region) that can be distributed across multiple machines – often manually. If data continues to grow, this process may need to be repeated at great expense.
Sharding is not only difficult; it also typically leads to hot spots, meaning certain machines are responsible for storing and serving a disproportionately large share of both data and requests. Hot spots can cause unpredictable latency and degraded performance.
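To make the hot spot problem concrete, here is a minimal sketch of alphabetical sharding. The surnames, shard names, and split points are all invented for illustration; they do not come from any real deployment.

```python
from collections import Counter

# Hypothetical customer surnames, invented for illustration.
NAMES = ["Smith", "Sanchez", "Singh", "Suzuki", "Schmidt", "Stone",
         "Miller", "Martin", "Moore", "Adams", "Young", "Zhang"]


def range_shard(name):
    """Alphabetical sharding: a hand-picked split of the alphabet across three machines."""
    first = name[0].upper()
    if first <= "H":
        return "shard-A-H"
    if first <= "Q":
        return "shard-I-Q"
    return "shard-R-Z"


# Count how many customers land on each machine.
load = Counter(range_shard(name) for name in NAMES)
for shard, count in sorted(load.items()):
    print(shard, count)
```

With this sample, one shard ends up holding 8 of the 12 records while another holds only 1. Fixing the imbalance means choosing new split points and moving data by hand, which is the repeated expense described above.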
(And remember all the ways in which availability is a challenge? Combine sharding with a master/slave architecture for maximal expense and general unpleasantness.)
Instead of sharding, Riak distributes data evenly across a cluster using consistent hashing. In a Riak cluster, the data space (known as the ring) is divided into partitions, which are claimed by the servers. When new data is written to the database, objects are placed evenly around the ring and stored as three replicas by default. Replication keeps your data available even when individual nodes fail.
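The idea can be sketched in a few lines of Python. This is a simplified illustration, not Riak's implementation: the function names, the round-robin partition claim, and the way the bucket and key are combined before hashing are all assumptions made for the example.

```python
import hashlib

RING_SIZE = 64        # number of partitions; an illustrative value
N_VAL = 3             # replicas per object, matching the default described above
KEYSPACE = 2 ** 160   # SHA-1 digests are 160-bit integers


def build_ring(nodes):
    """Assign partitions to nodes round-robin so each claims a near-equal share."""
    return [nodes[i % len(nodes)] for i in range(RING_SIZE)]


def partition_for(bucket, key):
    """Hash the bucket/key pair onto the ring and return its partition index."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    return int.from_bytes(digest, "big") * RING_SIZE // KEYSPACE


def preference_list(ring, bucket, key, n=N_VAL):
    """Return the owners of the n consecutive partitions that store the object."""
    start = partition_for(bucket, key)
    return [ring[(start + i) % RING_SIZE] for i in range(n)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = build_ring(nodes)
print(preference_list(ring, "users", "alice"))
```

Because the hash output is effectively uniform, keys spread evenly over the partitions regardless of their alphabetical or geographic distribution, and each object lands on three different servers.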
When nodes are added or removed, data is rebalanced automatically. New machines assume ownership of some of the partitions and existing machines hand off relevant partitions and associated data until data ownership is equal amongst nodes.
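As a rough sketch of that rebalancing, the snippet below lets a new node claim partitions from whichever existing node currently owns the most. The `add_node` helper and its claim strategy are hypothetical and far simpler than Riak's actual claim algorithm (for example, it makes no attempt to keep replicas of an object on distinct servers); it only shows that a small fraction of partitions needs to move.

```python
from collections import Counter


def add_node(ring, new_node):
    """Return a new ring in which new_node takes partitions from the most-loaded owners."""
    ring = list(ring)
    fair_share = len(ring) // (len(set(ring)) + 1)
    for _ in range(fair_share):
        # Consider only existing owners as donors, never the new node itself.
        counts = Counter(owner for owner in ring if owner != new_node)
        donor = max(counts, key=lambda owner: (counts[owner], owner))
        ring[ring.index(donor)] = new_node
    return ring


# A 64-partition ring shared equally by four nodes (16 partitions each).
old_ring = [f"node-{'abcd'[i % 4]}" for i in range(64)]
new_ring = add_node(old_ring, "node-e")

moved = sum(1 for before, after in zip(old_ring, new_ring) if before != after)
print(Counter(new_ring), "partitions moved:", moved)
```

In this example only 12 of 64 partitions change owner, leaving the four original nodes with 13 partitions each and the new node with 12. No operator has to pick split points or move data by hand.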
By eliminating the manual requirements of sharding and making hot spots highly unlikely, Riak makes it significantly easier for companies to scale, whether it’s just for a few months to handle peak loads or to support long-term growth strategies.
November 13, 2013
This series of blog posts will discuss how Riak differs from traditional relational databases. For more information about any of the points discussed, download our technical overview, “From Relational to Riak.”
One of the biggest differences between Riak and relational systems is our focus on availability. Riak is designed to be deployed to, and runs best on, multiple servers. It can continue to function normally in the presence of hardware and network failures. Relational databases, conversely, are simplest to set up on a single server.
Most relational databases offer a master/slave architecture for availability, in which only the master server is available for data updates. If the master fails, the slave is (hopefully) able to step in and take over.
However, even with this simple model, coping with failure (or even properly defining it) is non-trivial. What happens if the master and slave server cannot talk to each other? How do you recover from a split brain scenario, where both servers think they’re the master and accept updates? What happens if the slave is slow to respond to updates sent from the master database? Can clients read from a slave? If so, does the master need to verify that the slave has received all updates before it commits them locally and responds to the client that requested the updates?
Conversely, Riak is explicitly designed to expect server and network failure. Riak is a masterless system, meaning any server can respond to read or write requests. If one fails, others will continue to service client requests. Once this server becomes available again, the cluster will feed it any updates that it missed through a process we call hinted handoff.
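A toy model of hinted handoff might look like the following. The `Node` class and the `write` and `handoff` functions are hypothetical names invented for this sketch; they illustrate the concept only and are not Riak's API.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value stored by this node
        self.hints = {}   # intended owner's name -> {key: value} held on its behalf


def write(key, value, owners, fallback):
    """Write to every owner; if an owner is down, the fallback keeps a hint for it."""
    for owner in owners:
        if owner.up:
            owner.data[key] = value
        else:
            fallback.hints.setdefault(owner.name, {})[key] = value


def handoff(fallback, recovered):
    """Deliver the updates the recovered node missed, then discard the hints."""
    recovered.data.update(fallback.hints.pop(recovered.name, {}))


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
c.up = False                                   # one replica owner fails
write("cart:42", "3 items", owners=[a, b, c], fallback=d)
c.up = True                                    # the node comes back
handoff(d, c)
print(c.data)
```

The write succeeds while one owner is offline, and that owner catches up once it returns, without any node acting as a master.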
Because Riak allows reads and writes even when multiple servers are offline or otherwise unreachable, data may not always be consistent across the environment (usually only for a few milliseconds). However, through self-healing mechanisms like read repair and Active Anti-Entropy, all updates will propagate to all replicas, making data eventually consistent.
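Read repair can be sketched as follows. For simplicity this example orders versions with a plain integer counter, which is an assumption made for illustration; Riak tracks object history differently, and the `read_with_repair` function is a hypothetical name.

```python
def read_with_repair(replicas, key):
    """Read a key from every replica, return the newest value, and fix stale copies.

    Each replica is a dict mapping key -> (version, value).
    """
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda item: item[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest   # repair a stale or missing copy
    return newest[1]


r1 = {"profile:7": (2, "new email")}
r2 = {"profile:7": (1, "old email")}   # missed the latest update
r3 = {}                                # was unreachable for both writes

value = read_with_repair([r1, r2, r3], "profile:7")
print(value, r2, r3)
```

After a single read, all three replicas hold the latest version, so the inconsistency is resolved as a side effect of normal traffic.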
For many use cases, high availability is more important than strict consistency. Data unavailability can negatively impact revenue, damage user trust, lead to poor user experience, and cause the loss of critical data. Industries like gaming, mobile, retail, and advertising require always-on availability. Visit our Users Page to see how companies in various industries use Riak.
August 20, 2013
NoSQL is a misleading name. SQL was never the problem. However, this poorly named industry term does represent a response to changing business priorities and new challenges that require different kinds of database architectures.
Traditional database architectures were first developed in the late 60s and early 70s. They were the default option for many pre-Internet use cases and remain useful today for certain use cases requiring a relational data model. However, their limits are painfully apparent to many companies. Despite what traditional database vendors might have us believe, very little data generated today actually requires a SQL architecture. Businesses face many new challenges today that traditional databases simply are not designed to handle reliably or efficiently. These include:
- Global Users. It is no longer enough to provide a fast experience in one country. Users from all over the globe expect a low-latency experience, making geo-data locality more important than ever.
- Zero Downtime. Planned and unplanned. Both are bad for business. There is now an expectation for always-on availability. Operations teams must emphasize resiliency over recovery.
- Scale Matters. Businesses need to scale up quickly to meet peak loads during the holidays or product launches, and then they need to scale back down. They need an architecture that makes scaling the least of their worries.
- Flexible Data. From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Businesses need flexibility to handle all the data generated and flowing.
- Omnichannel. Whether users are on a tablet, laptop, or smartphone, they require a device-agnostic experience and low latency.
- Amazon Economics. Every business wants Amazon Economics. With the nature of data growth today, businesses can’t afford expensive machines at every juncture. They need commodity machines to scale horizontally, not vertically.
Attempts to address these challenges with traditional databases result in an inflexible architecture with very high costs. “NoSQL” databases represent a fresh approach to building flexible, resilient architectures. “NoSQL” goes where no database has ever gone before: into the wild space of the Internet and the massive scale requirements it represents.
Which brings us to NoSQL Now! Basho is sponsoring because the movement is more important than any single industry term. Andy Gross will also be on hand to further discuss the larger trend of distributed systems:
Dealing with Systems in a New Distributed World
Chief Architect and Co-creator of Riak
Thursday, August 22, 2013
Please join us in San Jose for a look at the future of database technology.