January 22, 2015
In speaking with Riak users, both open source and commercial, we are frequently told that Riak’s key/value model is more flexible and faster to develop against than a traditional relational database. Even though Riak is well suited for many applications, there are inevitable tradeoffs in terms of the query options and data types that are available. With a key/value model, there is no concept of columns or rows; therefore, Riak does not have join operations. Riak can be queried directly via HTTP, through the protocol buffers API, or through various client libraries. However, there is currently no SQL or SQL-like query language available.
Riak’s key/value data model does not preclude queryability. There are several powerful querying options including:
- Riak Search: Integration with Apache Solr provides full-text search and support for Solr’s client query APIs.
- Secondary Indexes: Secondary Indexes (2i) give developers the ability to tag an object stored in Riak with one or more query values. Indexes can be either integers or strings, and can be queried by either exact matches or ranges of values.
- MapReduce: Developers can leverage Riak MapReduce for tasks like filtering documents by tag, counting words in documents, and extracting links to related data.
For more information, check out the Riak documentation on Querying Data.
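To make the Secondary Indexes option more concrete, here is a minimal in-memory sketch of the idea: objects are tagged with index values at write time and later found by exact match or by range. This is not the Riak client API. `ToyBucket`, its methods, and the sample data are hypothetical names invented for illustration, and the `_bin` and `_int` suffixes are used here only to signal string and integer indexes.

```python
from collections import defaultdict


class ToyBucket:
    """In-memory stand-in for a bucket with secondary indexes (illustrative only)."""

    def __init__(self):
        self.objects = {}                # key -> opaque value
        self.indexes = defaultdict(set)  # (index name, index value) -> set of keys

    def put(self, key, value, indexes=()):
        """Store a value and tag it with (index_name, index_value) pairs.

        Simplification: re-putting a key does not clear its old index entries.
        """
        self.objects[key] = value
        for name, index_value in indexes:
            self.indexes[(name, index_value)].add(key)

    def query_exact(self, name, index_value):
        """Return the keys whose index equals one value."""
        return sorted(self.indexes.get((name, index_value), set()))

    def query_range(self, name, low, high):
        """Return the keys whose index value lies in the inclusive range [low, high]."""
        hits = set()
        for (index_name, index_value), keys in self.indexes.items():
            # The name check runs first, so values of other indexes are never compared.
            if index_name == name and low <= index_value <= high:
                hits |= keys
        return sorted(hits)


users = ToyBucket()
users.put("alice", b'{"plan": "paid"}', [("plan_bin", "paid"), ("visits_int", 42)])
users.put("bob", b'{"plan": "free"}', [("plan_bin", "free"), ("visits_int", 7)])
users.put("carol", b'{"plan": "paid"}', [("plan_bin", "paid"), ("visits_int", 19)])

paid_users = users.query_exact("plan_bin", "paid")
frequent_visitors = users.query_range("visits_int", 10, 50)
```

Note that the stored values stay opaque in this sketch. Only the tags supplied at write time are queryable, which is the tradeoff described above.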
The table below illustrates key/value mappings for common application types. Remember that values in Riak are opaque and stored on disk as binaries – JSON or XML documents, images, text, etc. The way data is organized in Riak should take into account the unique needs of the application, including access patterns such as read/write distribution, latency differences between various operations, use of Riak features (including MapReduce, Search, Secondary Indexes), and more.
|Application Type|Key|Value|
|---|---|---|
|Session|User/Session ID|Session Data|
|Advertising|Campaign ID|Ad Content|
|Sensor|Date, Date/Time|Sensor Updates|
|User Data|Login, eMail, UUID|User Attributes|
|Content|Title, Integer|Text, JSON/XML/HTML Document, Images, etc.|
Consider, for example, one of the canonical use cases for Riak: storing user and session data. In a relational database, the “users” table is well known. It provides a unique identifier per user, followed by a series of identifying attributes about that user as individual columns, such as:
- First name
- Last name
- Counter of Site Visits
- Paid Account Identifier
This data can then be used to correlate or count paid users, common interests, etc. via a series of SQL queries against the row/column structure of the users table.
Riak, in contrast, provides flexibility in how this data can be modeled based upon the application use case. It may be desirable to create a Users bucket, with the UserName (or Unique Identifier) as the key and a JSON object storing all user attributes as the value. Or, as we describe in Data Modeling with Riak Data Types, leverage the power of Riak Data Types by creating a map type for each user storing:
- first and last name strings in the register type,
- interests as a set,
- a counter for visits,
- and a flag for paid account identifier.
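The two modeling options above can be sketched side by side. This is a rough illustration using plain Python dictionaries rather than a Riak client: `users_bucket`, `store_user_json`, `new_user_map`, and all field names are hypothetical, and the second layout only imitates the shape of a map with registers, a set, a counter, and a flag.

```python
import json

# Option 1: one opaque JSON document per user, keyed by username.
users_bucket = {}  # hypothetical stand-in for a "Users" bucket


def store_user_json(username, first, last, interests, visits, paid):
    """Serialize all user attributes into a single JSON value."""
    value = json.dumps({
        "first_name": first,
        "last_name": last,
        "interests": sorted(interests),
        "visits": visits,
        "paid_account": paid,
    }).encode("utf-8")
    users_bucket[username] = value
    return value


# Option 2: a map-like layout with one typed field per attribute.
def new_user_map(first, last):
    """Build a dictionary shaped like a map of registers, a set, a counter, and a flag."""
    return {
        "first_name_register": first,
        "last_name_register": last,
        "interests_set": set(),
        "visits_counter": 0,
        "paid_account_flag": False,
    }


store_user_json("jdoe", "Jane", "Doe", {"hiking", "chess"}, 3, True)

jane = new_user_map("Jane", "Doe")
jane["interests_set"].update({"hiking", "chess"})  # set: add members
jane["visits_counter"] += 1                        # counter: increment
jane["paid_account_flag"] = True                   # flag: enable
```

With the JSON option, any change means rewriting the whole document. With the map-like option, each field is updated through an operation that matches its type.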
One of the best ways to enable application interaction with objects (a key/value pair) in Riak is to provide structured bucket and key names for the objects. This approach often involves wrapping information about the object in the object’s location data itself.
For example, appending a timestamp, UUID, or geographical coordinate to a key’s name allows for fine-grained queryability via simple lookup to locate and retrieve a specific set of information. Leveraging the same naming mechanism as created for users (UniqueID as the key) enables, in a separate sessions bucket, storing the UUID appended with a timestamp as the key and the session data (in binary format) as the object. In this way, using the same UUID, I am able to obtain both user and session data stored in different buckets and in different formats.
For additional information, and more complex considerations such as modeling relationships and advanced social applications, see the Riak documentation on use cases and data modeling.
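As a small sketch of this naming scheme, the snippet below builds a session key from a user’s UUID plus a timestamp and then reads both the user and the session by direct lookup. The two dictionaries stand in for buckets. The key format (`<uuid>_<UTC timestamp>`), the helper `session_key`, and the sample values are assumptions chosen for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

users = {}     # hypothetical "users" bucket: UUID -> JSON document
sessions = {}  # hypothetical "sessions" bucket: UUID + timestamp -> binary blob


def session_key(user_id, started_at):
    """Compose a session key from the user's UUID and a UTC timestamp."""
    return f"{user_id}_{started_at.strftime('%Y%m%dT%H%M%SZ')}"


# A fixed UUID keeps the example reproducible; real code might call uuid.uuid4().
user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
started_at = datetime(2015, 1, 22, 9, 30, 0, tzinfo=timezone.utc)

users[str(user_id)] = json.dumps({"first_name": "Jane"}).encode("utf-8")

key = session_key(user_id, started_at)
sessions[key] = b"\x00\x01session-bytes"

# The same UUID locates data in both buckets, in different formats.
user_doc = json.loads(users[str(user_id)])
session_blob = sessions[session_key(user_id, started_at)]
```

Because the key can be recomputed from information the application already holds, no query is needed to find the session.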
Resolving Data Conflicts
In any system that replicates data, conflicts can arise – e.g., if two clients update the same object at the exact same time or if not all updates have yet reached hardware that is experiencing lag. Riak is “eventually consistent” – while data is always available, not all replicas may have the most recent update at the same time, causing brief periods (generally on the order of milliseconds) of inconsistency while all state changes are synchronized.
However, Riak does provide features to detect and help resolve the statistically small number of incidents when data conflicts occur. When a read request is performed, Riak looks up all replicas for that object. By default, Riak will return the most recently updated version, determined by looking at the object’s vector clock. Vector clocks are metadata attached to each replica when it is created. They are extended each time a replica is updated to keep track of versions. Clients can also be allowed to resolve conflicts themselves.
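The way a vector clock distinguishes “newer” from “conflicting” can be sketched in a few lines. This is a textbook-style illustration, not Riak’s internal representation: a clock is modeled as a dictionary of per-actor update counts, and the function names and actor names are hypothetical.

```python
def increment(clock, actor):
    """Return a copy of the clock with one more update recorded for this actor."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two versions of an object."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # neither saw the other's update: a conflict to resolve


base = increment({}, "client_x")           # first write
version_a = increment(base, "client_x")    # client_x updates again
version_b = increment(base, "client_y")    # client_y updates the same base

newer_result = compare(version_a, base)
conflict_result = compare(version_a, version_b)
```

When two versions are concurrent, neither can safely be discarded on the basis of the clocks alone, which is the case where client-side resolution comes into play.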
Further, when an outdated object is discovered as part of a read request, Riak will automatically update the out-of-sync replica to make it consistent. Read Repair, a self-healing property of the database, will even update a replica that returns a “not_found” in the event that a node loses it due to physical failure.
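Read Repair can be pictured with a toy model like the one below. Each replica is a dictionary, and a simple integer version stands in for the vector clock comparison, which is a deliberate simplification. `read_with_repair` and the sample data are hypothetical.

```python
def read_with_repair(replicas, key):
    """Read a key from all replicas, return the newest value, and fix stale copies.

    Each replica maps key -> (version, value). The integer version is an
    assumption standing in for a real causal comparison.
    """
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda versioned: versioned[0])
    for replica in replicas:
        # Covers both outdated copies and replicas that lost the object entirely.
        if replica.get(key) != newest:
            replica[key] = newest
    return newest[1]


replicas = [
    {"user:1": (2, "new profile")},  # up to date
    {"user:1": (1, "old profile")},  # stale
    {},                              # missing, as after a node failure
]
value = read_with_repair(replicas, "user:1")
```

After the read, all three replicas hold the same version, so the repair happens as a side effect of normal traffic.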
Riak also features “Active Anti-Entropy,” which is an automatic self-healing property that runs in the background. Rather than waiting for a read request to trigger a replica repair (as with Read Repair), Active Anti-Entropy constantly uses a hash tree exchange to compare replicas of objects and automatically repairs or updates any that are divergent, missing, or corrupt. This can be beneficial for large clusters storing “stale” data.
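The hash tree exchange can be approximated with a two-level sketch: keys are grouped into segments, each segment is hashed, and a root hash covers all segments. Two replicas then compare hashes from the top down and only inspect keys in segments that differ. The segment count, the hashing scheme, and the function names are assumptions for illustration. A production hash tree has more levels and is maintained incrementally.

```python
import hashlib

SEGMENTS = 4  # arbitrary small number for the example


def segment_of(key):
    """Assign a key to a segment using the first byte of its SHA-256 digest."""
    return hashlib.sha256(key.encode("utf-8")).digest()[0] % SEGMENTS


def build_tree(replica):
    """Return (root_hash, segment_hashes) for a replica mapping key -> value string."""
    segments = [[] for _ in range(SEGMENTS)]
    for key in sorted(replica):
        segments[segment_of(key)].append(f"{key}={replica[key]}")
    leaf_hashes = [
        hashlib.sha256("|".join(items).encode("utf-8")).hexdigest()
        for items in segments
    ]
    root = hashlib.sha256("".join(leaf_hashes).encode("utf-8")).hexdigest()
    return root, leaf_hashes


def divergent_keys(a, b):
    """Find keys that differ between two replicas, skipping matching segments."""
    root_a, leaves_a = build_tree(a)
    root_b, leaves_b = build_tree(b)
    if root_a == root_b:
        return []  # one comparison proves the replicas agree
    suspect = {i for i in range(SEGMENTS) if leaves_a[i] != leaves_b[i]}
    candidates = set(a) | set(b)
    return sorted(
        key for key in candidates
        if segment_of(key) in suspect and a.get(key) != b.get(key)
    )


replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale"}  # k2 is divergent, k3 is missing

to_repair = divergent_keys(replica_a, replica_b)
```

The benefit is that replicas in agreement are confirmed with a single root comparison, so background checking stays cheap even for data that is rarely read.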
More information on vector clocks, dotted version vectors, and conflict resolution can be found in the online documentation in the section regarding Causal Context.
Multi-site replication is quickly becoming critical for many of today’s platforms and applications. Not only does replication across multiple clusters provide geographic data locality – the ability to serve global traffic at low-latencies – it can also be an integral part of a disaster recovery or backup strategy. Other teams may use multi-site replication to maintain secondary data stores, both for failover as well as for performing intensive computation without disrupting production load. Multi-site replication is included in Basho’s commercial extension to Riak, Riak Enterprise, which also includes 24/7 support.
Multi-site replication in Riak works differently than the typical approach seen in the relational world, multi-master replication. In Riak’s multi-datacenter replication, one cluster acts as a “primary cluster.” The primary cluster handles replication requests from one or more “secondary clusters” (generally located in datacenters in other regions or countries). If the datacenter with the primary cluster goes down, a secondary cluster can take over as the primary cluster. In this sense, Riak’s multi-datacenter capabilities are “masterless.”
In multi-datacenter replication, there are two primary modes of operation: full sync and real-time. In full sync mode, a complete synchronization occurs between primary and secondary cluster(s). In real-time mode, continual, incremental synchronization occurs – replication is triggered by new updates. Full sync is performed upon initial connection of a secondary cluster, and then periodically (by default, every 6 hours). Full sync is also triggered if the TCP connection between primary and secondary clusters is severed and then recovered.
Data transfer is unidirectional (primary->secondary). However, bidirectional synchronization can be achieved by configuring a pair of connections between clusters.
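The interplay of full sync, real-time mode, and connection direction can be modeled with a toy class. This is a conceptual sketch only: `ToyCluster` and its methods are hypothetical, conflicts are ignored (the source’s copy simply overwrites), and nothing here reflects how Riak Enterprise is configured.

```python
class ToyCluster:
    """Minimal model of a cluster that replicates to connected sink clusters."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.sinks = []  # clusters that receive this cluster's updates

    def full_sync(self, sink):
        """Copy everything this cluster holds to the sink."""
        sink.data.update(self.data)

    def connect(self, sink):
        """Run a full sync on initial connection, then replicate in real time."""
        self.full_sync(sink)
        self.sinks.append(sink)

    def put(self, key, value):
        """Write locally and push the update to each sink (one hop, no re-forwarding)."""
        self.data[key] = value
        for sink in self.sinks:
            sink.data[key] = value


primary = ToyCluster("primary")
secondary = ToyCluster("secondary")

primary.put("k1", "v1")
primary.connect(secondary)        # full sync delivers k1
primary.put("k2", "v2")           # real-time delivers k2

secondary.put("k3", "v3")         # unidirectional: primary does not see k3
seen_before_pairing = "k3" in primary.data

secondary.connect(primary)        # second connection makes the pair bidirectional
seen_after_pairing = "k3" in primary.data
```

The second `connect` call mirrors the pair-of-connections setup described above: each direction is its own connection with its own initial full sync.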
Full documentation for multi-datacenter replication in Riak Enterprise is available in the online documentation.
Modeling data in any non-relational solution requires a different way of thinking about the data itself. Rather than an assumption that all data cleanly fits into a structure of rows and columns, the data domain can be overlaid on the core Key/Value store (Riak) in a variety of ways. There are, however, distinct tradeoffs and benefits to understand.
Relational Databases have:
- Foreign keys and constraints
- Sophisticated query planners
- Declarative query language (SQL)
Riak has:
- A Key/Value model where the value is any unstructured data
- More data redundancy that provides better availability
- Eventual consistency
- Simplified query capabilities
- Riak Search
What you will gain:
- More flexible, fluid designs
- More natural data representations
- Scaling without pain
- Reduced operational complexity
For more information on Data Modeling, or to chat with a member of the Basho team on the topic, please request a Tech Talk.
January 6, 2015
If you have read about Riak, or seen a member of the Basho team present, you have probably heard the phrase “Your data is opaque to Riak.” While this is not, strictly, true with the inclusion of distributed Data Types in Riak 2.0, it was a phrase that hinted at the core structure of Riak itself.
Riak is a Key Value data store.
In a relational database, data is organized by tables that are separate and unique structures. Within these tables exist rows of data organized into columns. As such, interaction with the database is by retrieving or updating entire tables, individual rows, or a group of columns within a set of rows.
In contrast, Riak has a much simpler data model. An Object is both the largest and smallest element of data. As such, interaction with the database is by retrieving or modifying the entire object. There is no partial fetch or update of the data.
Keys in Riak are simply binary values (or strings) that are used to identify Objects. The Key/Value pair (or Object) is stored in a higher-level namespace called a Bucket. And, with Riak 2.0, there is an extra layer of abstraction known as Bucket Types.
This Key/Value/Bucket model enables broad flexibility in modeling the application’s data domain with Riak as the data store for persistence.
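A compact way to picture this model is a dictionary addressed by bucket type, bucket, and key, where every read and write moves the whole object. The sketch below is an in-memory illustration with hypothetical function names and sample data. It is not a client library.

```python
import json

store = {}  # (bucket_type, bucket, key) -> opaque bytes


def put(bucket_type, bucket, key, value):
    """Write the entire object; there is no partial update."""
    store[(bucket_type, bucket, key)] = value


def get(bucket_type, bucket, key):
    """Fetch the entire object, or None if it does not exist."""
    return store.get((bucket_type, bucket, key))


def update_field(bucket_type, bucket, key, field, new_value):
    """Change one field by fetching, modifying, and rewriting the whole object."""
    document = json.loads(get(bucket_type, bucket, key))
    document[field] = new_value
    put(bucket_type, bucket, key, json.dumps(document).encode("utf-8"))


put("default", "users", "jdoe", json.dumps({"first_name": "Jane", "visits": 1}).encode("utf-8"))
update_field("default", "users", "jdoe", "visits", 2)

current = json.loads(get("default", "users", "jdoe"))
```

The store never looks inside the value. Interpreting the bytes as JSON is entirely the application’s decision.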
Another NoSQL model that many are familiar with is the document store. Unlike the Key/Value model, the data store is aware of the structure of the objects stored. These objects, or documents, are grouped into “collections” (analogous to relational “tables”), and the datastore provides a query mechanism to search collections for objects with particular attributes. When the data that is being persisted is easily rendered as a JSON document, a document store can seem a natural fit. Some common use cases include product catalog data and content management systems.
The Basho Docs have a lengthy tutorial entitled Using Riak as a Document Store that walks you through the process of leveraging Riak as a document store for a CMS. There are many approaches to modeling, but the tutorial demonstrates the power of Riak 2.0 features by combining the maps data type and indexing that data with Riak Search.
When the data you are persisting can be represented as JSON, and you require the ability to query the data, Riak 2.0 is an excellent solution for persisting and modeling document data. The flexibility of the Key/Value model, combined with the power of Riak Search and Riak Data Types, provides you with a highly scalable, highly available document store with rich, full-text query capabilities. In addition, the inclusion of the maps data type means that you don’t have to write complex client-side resolution logic when faced with network partitions. Riak Data Types handle that conflict resolution automatically.
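To give a feel for why a data type can resolve conflicts without client logic, here is a sketch of a grow-only counter, one of the simplest convergent data types. It illustrates the general idea only and is not a description of how Riak implements its counters or maps. The function names and node names are hypothetical.

```python
def g_increment(counter, actor, amount=1):
    """Return a copy of the counter with this actor's tally increased."""
    updated = dict(counter)
    updated[actor] = updated.get(actor, 0) + amount
    return updated


def g_merge(a, b):
    """Merge two diverged copies by taking the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


def g_value(counter):
    """The counter's value is the sum of all per-actor tallies."""
    return sum(counter.values())


# Two sides of a network partition accept increments independently.
left = g_increment(g_increment({}, "node_a"), "node_a")  # node_a counted 2
right = g_increment({}, "node_b", 3)                     # node_b counted 3

merged_lr = g_merge(left, right)
merged_rl = g_merge(right, left)
total = g_value(merged_lr)
```

Because the merge gives the same result in either order and can be repeated safely, replicas converge on one value once the partition heals, with no decision left to the client.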
A scalable, available document store that is operationally simple may seem compelling enough to use Riak. But when you combine the characteristics of Riak with the multi-datacenter replication capabilities of Riak Enterprise, now you have a solution that enables you to bring your data operations closer to the end user.
Scalable, available, operationally simple, and replicated. That’s the power of using Riak as a document store.
December 30, 2014
At Basho, we are proud of our documentation. All design, updates, and edits are done with our community top of mind and we encourage community participation. Given the pace at which our documentarian expert, Luc Perkins, is updating the content, it can be easy to fall behind in reading new and updated materials. So we have a holiday gift to help you out.
Below is our Top 10 suggested New Year’s reading list.
#10 – A Migrating from an SQL Database to Riak tutorial can help prepare you as you embrace a new style of development and persistence.
#7 – Strong consistency has gone from having light documentation to being one of our best-documented open-source features. Strong Consistency docs are spread across the following:
#6 – We now have client-side security docs! There’s an introductory doc that walks you through how client security works in Riak, as well as client-specific docs for Java, Ruby, Python, and Erlang.
#5 – A new Erlang VM Tuning doc. This is still a work in progress. As we said at the beginning, we really encourage community involvement. What tuning have you done to optimize your Erlang environment?
In addition to the above, there is new documentation on the topics below.
Drum roll please….
#1 – Riak 2.0 – if you missed this you missed a lot.
We want to thank everyone in the community who participates in making the Basho documentation the most useful set of materials possible. Remember: to submit issues is human, to submit PRs is divine.
Happy New Year!
New, enhanced database and growing number of customers highlight strong year for the company
LONDON, UK. – November 20, 2014 – Basho, the creator and developer of Riak, the industry leading distributed NoSQL database, has seen a surge in deployment and a growing customer-base in EMEA as a result of the launch of Riak 2.0, the significantly enhanced version of its flagship platform.
2014 has seen significant successes for Basho, from the release of Riak 2.0 to news that Basho technology is powering Spine 2, the electronic backbone of the NHS. Basho has also seen strong growth in its EMEA customer-base, with the company working with businesses such as bet365, one of the world’s leading online gambling groups, StatPro, the cloud-based portfolio analysis service, and EE, the largest mobile operator in the United Kingdom to address their critical unstructured data needs. Basho has increased its number of customers in EMEA by 38 percent year on year, and these customer wins have contributed to revenue growth from Q2 to Q3 in 2014, which was up 90 percent.
“Our decision to implement Riak was purely strategic. After a stringent evaluation process we decided that Basho’s flexible, scalable database was best-suited to our needs,” said Martin Davies, Chief Executive Officer, Technology at bet365. “Given the huge amount of data we process on a daily basis – from customer details to betting odds – it was imperative that we had a platform to support this. We selected Riak, and have not been disappointed with the results.”
The gaming industry is becoming increasingly complex, with customers no longer satisfied with betting on a limited selection of outcomes. Now, gaming companies must offer more than traditional betting options. For example, during football matches, it is no longer enough to offer odds on scorer or full-time result. Instead, consumers are eager to bet on everything from the number of yellow cards to corners and the amount of injury time. Offering and processing these options requires a huge amount of data-crunching. On top of the vast number of metrics processed, from betting odds and bets placed to the final action on each account, such businesses require a lightning-fast database to support the deluge and prevent system crashes.
Basho’s growing stature in the gaming sector has been matched by its recent success in the telecommunications space. An increasing number of telco companies like EE are using Riak to replace existing systems and provide fault-tolerance and scalability for the future. Riak’s strength in the industry is further highlighted by the market trend towards reducing the burden of managing complex hardware environments by providing a consolidated virtualized orchestration platform to replace much of the traditional hardware.
These recent deals highlight a strong year for Basho, while the reseller partnership with Nordicmind and its upcoming Riak Nordic Roadshow demonstrate its growing success in EMEA. Success in the region is further reflected in the appointment of Emmanuel Marchal as Managing Director EMEA, who will be leading enterprise focus in EMEA, as well as the continued work with companies such as Deutsche Vermögensberatung (DVAG), Germany’s largest stand-alone financial services distributor. The financial advisors of DVAG support over 6 million customers in all questions concerning financial planning, insurance and finances.
“We knew that with the release of Riak 2.0, 2014 would be a massive year for the company,” said Adam Wray, President and Chief Executive Officer at Basho. “However, the growth in deployment and the continued success of Riak was more significant than we expected – with customers responding in kind. This year alone we have made strides in several sectors, including telco, financial, gaming and healthcare, where we have helped complete a project with the NHS that could potentially save lives. Couple this with our growing number of partners, and we can happily say that Basho is going from strength-to-strength.”
By: Peter Coppola
We had the opportunity to stop by DATAVERSITY’S NoSQL Now! conference in San Jose last week. I was very impressed with the level of energy and the wide-ranging selection of sessions offered. According to Tony Shaw, the CEO of DATAVERSITY, the organizer of NoSQL Now, registrations were up 15 percent from 2013.
The exhibition hall was packed and lively as attendees jostled between booths. DATAVERSITY did an outstanding job keeping the show floor tightly packed with exhibitors. The industry was well represented by Cloudera (saw “Data is the new bacon” t-shirts), MarkLogic, MongoDB, Oracle and EnterpriseDB – all present as major sponsors. Between conversations, I was able to nab a nifty versatile screw-driver disguised as a pen from DataStax.
NoSQL Now sessions do rely heavily on sponsors, but with such a wide selection of tracks there’s bound to be a topic of interest at any given time slot. I had a choice of the following concurrent sessions at 4:15 p.m. on Wednesday:
- Internet of Things with MongoDB – MongoDB
- Out with MapReduce, In with Spark – DataStax
- Case Studies in Search and Semantics – MarkLogic
- Just the Right Weather for our Company: How We Chose Our Data Stores – The Weather Company
- NoSQL on ACID – EnterpriseDB
I attended The Weather Company’s session – not only was it the only non-vendor presenter, but the company is also a customer and big fan of Riak. The Weather Company manages five data centers that in production handle 25,000 requests per second and distribute 60 GB of data to each data center every 10 minutes. Surya Kangeyan Sivakumar took us through the journey of how The Weather Company selected its data store solutions and how it overcame the mindset of having to use its existing relational database solution just because the company had invested so much in it. Riak was selected, along with other NoSQL solutions, due to the speed and ease with which it could be stood up.
In 2015 Basho looks forward to being a more active participant in NoSQL Now.
By: Jeremy Hill
Business Intelligence makes it possible for organizations to make sense of the vast amount of customer, manufacturing and competitive information they have available in order to make smarter and better informed decisions. In turn, this enables organizations to become more responsive to customer needs, increase efficiencies in manufacturing processes, and respond to significant events quickly.
Historically, the data that drives business intelligence has been stored in structured formats in a data warehouse, such as customer information on how much is spent. However, this approach misses out on the value of semi-structured and unstructured data, like the details from a customer call or a customer tweet.
With such information missing, a complete view of the customer or business can be limited. The consequence is that an inability to gain knowledge and measure customer information means businesses can fall behind, especially in a competitive market.
Business Intelligence needs NoSQL
Having access to all types of relevant customer information – structured, semi-structured and unstructured – is an essential requirement for business intelligence (BI) to help enterprises get ahead of the competition. Unlike structured, relational data warehouses, NoSQL databases make this possible with improved availability, scalability and fast response times. NoSQL databases are ideal for BI and data warehousing not only because of the diverse types of information they can deal with, but also because they are able to deliver data at the very time it is needed.
Enabling real-time analytics
NoSQL keeps up with transaction speeds as they happen, enabling real-time analytics. E-commerce transactions, for example, benefit from a NoSQL database because it can make a decision about what to do next when a buyer doesn’t complete a purchase. Instead of waiting 24 hours or longer for the data to move through a traditional data warehouse system, with a NoSQL system a feed goes straight from a transaction through a connector to a NoSQL database. A sales analytics process can make a decision with the intelligence at that very minute, to consult the customer and understand the behavior in real-time, helping secure the purchase and preventing the loss of a customer transaction.
A recently announced Basho partner, Caserta Concepts, a technology consulting firm specializing in big data analytics, data warehousing and business intelligence, works with CIOs to deliver analytics solutions that support business goals. It uses Riak and Riak CS to accommodate unique client requirements across a broad range of data types – structured, semi-structured and unstructured – and provides continuous availability to keep critical line-of-business applications going around the clock. Caserta’s practice illustrates the viability for NoSQL in the database revolution to take on the volume, variety and velocity of data dynamics of today’s web-scale applications.
Intelligence for IoT transactions
With the vast amounts of information from Internet of Things (IoT) technologies, more business intelligence needs and use cases are at the cusp. Consider oil and gas organizations providing annual service contracts for boilers – analytics tells the business that anything beyond the second call out (or truck roll) wipes out the profit on the contract. In the connected world, NoSQL enables the next level of intelligence, which allows organizations to collect information so that, in the event of failure, they are able to determine which parts are needed in advance, eliminating the need for multiple visits. Gathering intelligence from this data also allows organizations to perform preemptive maintenance during the annual inspections to lower the frequency of unplanned, costly site visits.
With NoSQL, BI and data warehousing can become quicker and much more efficient. It allows organizations to react to events more quickly, increase customer attention, streamline the supply chain, predict customer behavior at the point it matters and predict future service calls. At the rise of big, unstructured data, NoSQL presents enormous opportunity for the future of business intelligence.
March 24, 2014
When selecting a NoSQL solution, there are many options to choose from, each different and with their own benefits depending on your use case. To help you decide what the right choice for your needs may be, there are two amazing events this week where many NoSQL providers (including Basho) will be speaking.
The first is in conjunction with Ad:Tech in San Francisco. For advertisers to stay competitive in the modern landscape, the need to crunch massive amounts of consumer profile data and enable real-time bidding has made NoSQL the gold standard in database technology. That’s why Basho partner, GoGrid, will be hosting the panel, “NoSQL: Digital Advertising’s ‘Bad Boy’ Database Comes of Age” at 111 Minna Gallery. Speakers from Basho, Couchbase, DataStax, and MongoDB will be there to discuss how NoSQL is helping advertisers push the envelope now, and what is to come in 2014. This panel will take place on Wednesday, March 26th at 5:30pm. Registration is free and tickets are still available.
The other is hosted by the New York Software Engineers. This meetup, “The Battle of Distributed Databases – Data Modeling in the Enterprise Ecosystem,” will address some of the challenges the NoSQL community faces in enterprise adoption. Casey Rosenthal, Director of Professional Services at Basho, will be speaking about Riak and its adoption with 30% of the Fortune 50. This meetup will take place on Wednesday, March 26th at 7pm at Foursquare’s office. Be sure and register for this free event.
To see how enterprises are using Riak, check out the Users Page.
In addition to these meetups, Basho will be at multiple other events and conferences. A complete list can be found on our Events Page.
December 23, 2013
A few weeks ago, we hosted a webinar with 451 Research entitled, “Beyond NoSQL – Distributed Databases in Production.” This webinar featured Matt Aslett (Research Director at 451 Research), Bobby Patrick (EVP and CMO at Basho Technologies), and Wes Jossey (Systems Engineer at Tapjoy).
During this one-hour webinar, we discuss the history of NoSQL, the current NoSQL landscape, and then dive into Basho’s Riak. Wes Jossey also presents a case study from Riak User, Tapjoy, about how they use Riak as the cornerstone of their data management strategy. Finally, we wrap up with a look at what’s to come with Riak 2.0.
If you weren’t able to attend this webinar (or would like to rewatch it), the recording is now available. Simply register here to receive a link to watch the recording: info.basho.com/BeyondNoSQL_Recorded.html
December 18, 2013
Downtime, planned or unplanned, is no longer an option. It can have a dramatic impact on revenue and lead to negative customer experiences and attrition. Luckily, distributed NoSQL databases (such as Basho Riak) are designed to provide high availability, even during network partition or server failure. This means there will never be an excuse for downtime again.
To help demonstrate the cost of downtime and how Riak can help, we have put together an infographic, “Down With Downtime.” Zoom in by clicking the image below.
December 9, 2013
Tomorrow (December 10th) at 10am PT/1pm ET, we will be hosting a live webinar, “Beyond NoSQL – Distributed Databases in Production.” This webinar will feature Matt Aslett (Research Director at 451 Research), Bobby Patrick (EVP and CMO at Basho Technologies), and Wes Jossey (Systems Engineer at Tapjoy). There are still seats available, and you can register here for more details.
This webinar will talk about the history of NoSQL and what issues NoSQL aimed to solve in regard to relational systems. It will then look at the current NoSQL landscape and architecture trends. From there, the webinar will focus on Basho’s Riak, a distributed NoSQL database, and some of its key features and use cases. Tapjoy, the mobile performance-based advertising platform (and Riak user) will discuss how they use Riak to provide reliable data locality to their customers and why they selected Riak to be the cornerstone of their data management strategy. Finally, it will wrap up with a look at what’s to come with Riak 2.0 and have a live question and answer session.
Be sure and register now for “Beyond NoSQL – Distributed Databases in Production.”