Big Data Databases Explained
WHAT IS A BIG DATA DATABASE?
As organizations increasingly look to big data to deliver valuable business insights, it has become clear that the traditional relational database management systems (RDBMS) that have been the standard for the past 30 years are not up to the task of handling these new data requirements. As a result, a variety of big data database options have emerged. While the technologies differ, they are all designed to overcome the limitations of RDBMS and enable organizations to extract value from their data.
BIG DATA DATABASE REQUIREMENTS
To appreciate why there is a need for new database options to handle big data, it is important to understand the impact of the three main characteristics that distinguish big data: volume, variety, and velocity.
- Volume: True to its name, big data is commonly measured in petabytes, exabytes, and even zettabytes. Traditional RDBMS scale up by increasing the server and storage capacity of a single system. Because these systems are not designed to run on commodity hardware and they require highly complex sharding techniques to distribute data across several database servers, scaling can be extremely expensive and disruptive. For example, an Oracle RAC system can cost millions to store just 20 terabytes of data, an amount that could account for just one day’s worth of data ingestion for a sizeable organization today. In contrast, big data databases minimize the cost and burden of scaling with scale-out approaches that make it easy to quickly add or reduce capacity using inexpensive, commodity hardware with little to no manual intervention.
- Variety: In the past, most data was structured to fit the rigid data model of RDBMS. With the rise of big data, unstructured data (including everything from social media posts, images, and video to time-series IoT data) is growing far more rapidly than structured data. The only way RDBMS can handle heterogeneous data that does not fit a predefined schema is through cumbersome and complex workarounds. Big data databases do not have this problem. They use flexible data storage models that are built so that all types of data can be easily stored and queried using a variety of methods.
- Velocity: Speed is critical in the big data era. Massive volumes of heterogeneous data are being created in real time, and the expectation is that they can be ingested, stored, and processed in near-real time. This is particularly important with information such as time-series IoT data. Because it is not built to handle the volume and variety of big data, an RDBMS can suffer degraded performance and even downtime under this load. Big data databases are designed to keep up with the relentless demands of capturing vast troves of all types of data without losing performance or availability.
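To make the variety point concrete, here is a minimal Python sketch. It is not the API of any particular database: the `store` dict and the `put`, `get`, and `find` helpers are hypothetical stand-ins for a schemaless key/value store, and the sample records are invented for illustration. It shows records with different shapes stored side by side with no predefined schema, then looked up by whatever fields each record happens to carry.

```python
import json

# Hypothetical in-memory stand-in for a schemaless key/value store.
store = {}

def put(key, record):
    """Serialize any JSON-compatible record; no table schema is declared."""
    store[key] = json.dumps(record)

def get(key):
    """Fetch a record by key and decode it back into a dict."""
    return json.loads(store[key])

def find(field, value):
    """Return the sorted keys of records whose `field` equals `value`.

    Records that lack the field are simply skipped.
    """
    return sorted(
        key for key, raw in store.items()
        if json.loads(raw).get(field) == value
    )

# Three records with different shapes, stored side by side.
put("post:1", {"type": "social_post", "user": "ana", "text": "hello", "tags": ["intro"]})
put("sensor:7:1700000000", {"type": "reading", "device": 7, "ts": 1700000000, "temp_c": 21.5})
put("image:42", {"type": "image", "width": 640, "height": 480})

readings = find("type", "reading")
by_device = find("device", 7)
```

A real system would index fields rather than scan every record; the scan here only keeps the sketch short.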
BENEFITS OF A BIG DATA DATABASE
Systems that are designed with big data in mind are often called NoSQL databases because they do not necessarily rely on the SQL query language used by RDBMS. There are many flavors and brands of NoSQL databases, designed for different use cases. The major categories of NoSQL databases include document, key/value, graph, big table, and time series. Each technology has its own set of benefits, but they generally benefit big data use cases in the following ways:
- Scalability: NoSQL databases eliminate the prohibitive complexity, disruption, and cost associated with scaling traditional RDBMS. Because capacity can be quickly and efficiently added or reduced at any time, NoSQL enables organizations to easily scale out to embrace big data initiatives.
- Cost-efficiency: Because NoSQL uses inexpensive commodity hardware, cost savings versus RDBMS become more dramatic over time as greater capacity is needed to accommodate petabytes and exabytes of big data. Also, organizations only need to deploy the amount of hardware that is required to meet current capacity requirements rather than making large purchases ahead of need.
- Flexibility: Whether an organization is developing web, mobile, or IoT applications, the fixed data models of RDBMS prevent or dramatically slow down an organization’s ability to adapt to evolving big data application requirements. NoSQL enables developers to use the data types and query options that best fit the specific application use case, enabling faster and more agile development.
- Performance: As mentioned, with RDBMS, increasing performance incurs tremendous expense and the overhead of manual sharding. On the other hand, when compute resources are added to a NoSQL database, performance increases in a proportional manner so that organizations can continue to deliver a reliably fast user experience.
- High Availability: Typical RDBMS deployments rely on primary/secondary architectures that are complex and can create single points of failure. By using a masterless architecture that automatically distributes data among multiple resources, some “distributed” NoSQL systems keep the database available even when individual nodes fail, and keep pace with the massive read and write demands of big data applications.
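The scalability and high-availability points above can be sketched with consistent hashing, one common technique for distributing data in masterless systems. The Python below is a toy illustration under stated assumptions, not the implementation of any specific product: the `HashRing` class, the node names, and the `vnodes` and `replicas` parameters are all hypothetical. Each key maps to a position on a ring, and the next distinct nodes clockwise hold its replicas, so no node acts as a primary.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring in which every node is a peer (hypothetical)."""

    def __init__(self, nodes, vnodes=64, replicas=3):
        self.vnodes = vnodes      # ring positions claimed by each node
        self.replicas = replicas  # distinct nodes that store each key
        self._ring = []           # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        """Scale out: the new node claims `vnodes` positions on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Simulate a failure or a scale-in by dropping the node's positions."""
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def owners(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (self._hash(key), ""))
        found = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in found:
                found.append(node)
            if len(found) == self.replicas:
                break
        return found

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"sensor:{i}" for i in range(100)]

# Scale out: only keys that land on the new node change their first owner.
first_before = {k: ring.owners(k)[0] for k in keys}
ring.add_node("node-e")
moved_to = {ring.owners(k)[0] for k in keys if ring.owners(k)[0] != first_before[k]}

# Failure: the surviving replicas of a key are still its owners.
owners_before = ring.owners("sensor:7")
ring.remove_node(owners_before[0])
owners_after = ring.owners("sensor:7")
```

In this sketch, adding a node relocates only the keys that fall on the new node's ring positions, and removing a node leaves the remaining replicas of each key in place, which is the behavior the scale-out and masterless claims rely on.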
RIAK KV SUPPORTS THE BIG DEMANDS OF BIG DATA.
For organizations looking to meet the requirements of big data applications, Riak KV delivers massive scalability, fast performance, high availability, and powerful data models for storing unstructured data.
RIAK TS FOR TIME SERIES AND IOT DATA.
Optimized for fast reads and writes of time series data, Riak TS offers resiliency, massive scalability, and operational simplicity for organizations that need to store, query, and analyze IoT, device, and sensor data.