Hi, I’m Susan. I’ve been working with NoSQL since 2009 and have seen the landscape evolve as people have realized the possibilities of using NoSQL to find value in their unstructured data. I’ve recently joined Basho as the technical marketing director, and am excited to be here. And it’s not just an exciting time for me joining Basho, it’s an exciting time for NoSQL. Pun intended!
We’re seeing NoSQL evolve from a class of databases designed to solve the problems of unstructured data to include the Time Series Database (TSDB), designed to handle high-speed capture of time-based data so we can analyze patterns and trends over time.
The variety of ways that time series information can be useful is wide-ranging and growing fast, as new technologies and device proliferation are producing more data. Database technologies are evolving to allow us to make use of large volumes of time-based data.
Below are some of the ways Time Series Databases are being used:
- Predictive analysis of resource usage for capacity planning
- Tracking the exact time and sequence of financial transactions made by banks and stock exchanges
- Personalization and target marketing based on previous buying patterns of your customer base
- Log and metric information analysis to allow you to be proactive
- GPS tracking of shipment of goods
- Analysis of weather patterns to better predict emergency conditions, allowing for more time to evacuate an area
- Network traffic analysis for detecting anomalous behavior or identifying malicious activity
A good time series database is purpose-built to collect, store, manage, and analyze data at scale, allowing you to focus on getting the most value from your data to improve the way your organization does business. Part of that design allows for storing blocks of related data together so analysis is easier and faster.
The location where time series data is stored in your database should facilitate fast retrieval for queries and analysis. Over time, older detailed data becomes less relevant; it is commonly rolled up into aggregate information, and the detailed records are then expired or archived.
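To make the roll-up idea concrete, here is a minimal sketch in Python. It is purely illustrative (the function names and one-minute sample data are hypothetical, not part of any Riak TS API): detailed points are aggregated into hourly summaries, after which the detailed records older than a cutoff can be expired.

```python
from datetime import datetime, timezone
from statistics import mean

# Hypothetical detailed readings: one value per minute for a single hour.
readings = [
    (datetime(2016, 3, 1, 12, m, tzinfo=timezone.utc), 100.0 + m)
    for m in range(60)
]

def roll_up_hourly(points):
    """Aggregate detailed (timestamp, value) points into per-hour summaries."""
    buckets = {}
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(value)
    return {
        hour: {"min": min(vs), "max": max(vs), "mean": mean(vs), "count": len(vs)}
        for hour, vs in buckets.items()
    }

def expire(points, cutoff):
    """Drop detailed records older than the cutoff, once they are rolled up."""
    return [(ts, v) for ts, v in points if ts >= cutoff]

summary = roll_up_hourly(readings)
```

In a real TSDB this roll-up and expiry would be handled by the database itself rather than by application code; the sketch only shows the shape of the transformation.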
An enterprise-grade time series database must be able to gather and store millions of data points per second, scale to accommodate vast amounts of data, and efficiently query all this data to answer questions about your business. You can analyze smart meter data to answer questions like how much energy is used in a specific location, or understand usage patterns to develop better pricing models. This data can also help you predict future capacity requirements and anticipate infrastructure changes.
One of the big ways time series data is being used is to analyze weather. But who knew that weather forecasts can affect what we buy for dinner?
By analyzing weather data, Walmart knows that when the wind is low and the temperature is below 80 degrees you are more likely to buy berries. When it’s warm, but not too hot, and a little more windy, without rain, you’re more likely to buy steak. This insight comes from analyzing time series data that allowed Walmart to triple their berry sales.
Analyzing (Weather Data + Purchase History) = Big Money.
Most of you are familiar with the popularity of the Internet of Things (IoT). Devices that make our lives simpler are being connected to the internet so we can do things like control the environment in our homes and use smart meters to send information directly to the power company. GPS devices can track trucks with shipments for warehouses and stores to keep our supply chain optimized.
Industries like Oil and Gas use IoT devices to monitor their trucks, oil rigs, and key parts of their refineries, alerting them to potential trouble before it becomes a real problem and saving time, money, and the environment.
On the consumer market, device makers can not only make our lives easier, they can gather data to analyze how we are using their devices and services. This allows them to make smarter, targeted decisions for future offerings based on how we actually interact with the devices and services, rather than on the assumptions they might have initially made.
Basho introduced Riak TS, a time series database, last October. It is purpose-built to handle large volumes of data at high velocity, and it scales easily, allowing you to query and analyze your data quickly. Riak TS is built on the same foundation as Riak KV, already known in the industry for its resilience, fault tolerance, horizontal scalability using commodity hardware, and simplicity of operational management.
Some of our Riak KV customers are The Weather Company, bet365, Comcast, NHS, Best Buy, and Yammer, to name a few. These companies need a proven, flexible, scalable, highly available, resilient, and dependable database to run their enterprise-class businesses.
bet365, one of the world’s leading online gambling companies, had scaled up to a 160-core x64 platform and was looking for a way to continue to scale its operations. It chose Riak over nine other competitors for its stability, near-linear scalability, and reliability.
As an enterprise time series database, Riak TS provides data co-location and time quantization allowing similar data to be located together, reducing the amount of time and effort it takes to access the data, resulting in faster queries. Riak TS supports familiar SQL commands so your business analysts can analyze your data. Additionally, Riak TS provides excellent integration with Apache Spark, including support for Spark Streaming, DataFrames and Spark SQL.
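To illustrate the time quantization idea, here is a short Python sketch, not Riak TS code: in Riak TS the quantum is declared in the table schema, but the underlying effect can be shown with hypothetical helper functions. Every timestamp that falls within the same fixed-width window maps to the same bucket, so all points for one series in one window share a partition key and land together.

```python
# A hypothetical 15-minute quantum, expressed in milliseconds.
QUANTUM_MS = 15 * 60 * 1000

def quantum_bucket(timestamp_ms, quantum_ms=QUANTUM_MS):
    """Map a millisecond timestamp to the start of its quantum window."""
    return timestamp_ms - (timestamp_ms % quantum_ms)

def partition_key(family, series, timestamp_ms):
    """A co-location key: all points for one series in one window share it."""
    return (family, series, quantum_bucket(timestamp_ms))
```

Because a range query over a time span only touches the handful of buckets that span covers, the database can read contiguous blocks instead of scattered rows, which is where the faster queries come from.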
Some key Riak TS features:
- Resiliency ensures that your time series data is always available, even in the event of hardware or network failures.
- Scalability lets you add and remove capacity seamlessly using commodity hardware.
- Operational Simplicity means your clusters need almost no human interaction to manage.
- Data Co-Location stores related data together to make your queries faster and easier.
- SQL Commands let you work with Riak TS in a familiar manner.
- SQL Range Queries help you get answers from your data.
- Aggregation enables you to roll up detailed data points to answer business questions.
- Apache Spark Connector makes it easy to use Spark to analyze your data.
- Robust APIs and Client Libraries let you write your applications quickly and easily.
Our product manager, Seema Jethani, has written a blog that provides more details on the Riak TS 1.1 release.
“In today’s Internet of Everything, people, processes, data and things are all connected and generating tremendous amounts of data. Sensors in particular generate mountains of time series data that is often stored at the network edges and gathered in the cloud. The integration of NoSQL and Spark will play a vital role in analyzing this data to identify patterns and generate insights, and the introduction of Riak TS makes analyzing this data simple, effective, and scalable.”
– Ken Owens, Chief Technology Officer, Cisco Intercloud Services
Stay tuned for upcoming technical blogs on Riak TS from our Solution Architects!
Have a GREAT day!
Technical Marketing Director