March 24, 2014
When selecting a NoSQL solution, there are many options to choose from, each with its own benefits depending on your use case. To help you decide which is the right choice for your needs, there are two amazing events this week where many NoSQL providers (including Basho) will be speaking.
The first is in conjunction with Ad:Tech in San Francisco. For advertisers to stay competitive in the modern landscape, the need to crunch massive amounts of consumer profile data and enable real-time bidding has made NoSQL the gold standard in database technology. That’s why Basho partner, GoGrid, will be hosting the panel, “NoSQL: Digital Advertising’s ‘Bad Boy’ Database Comes of Age” at 111 Minna Gallery. Speakers from Basho, Couchbase, DataStax, and MongoDB will be there to discuss how NoSQL is helping advertisers push the envelope now, and what is to come in 2014. This panel will take place on Wednesday, March 26th at 5:30pm. Registration is free and tickets are still available.
The other is hosted by the New York Software Engineers. This meetup, "The Battle of Distributed Databases – Data Modeling in the Enterprise Ecosystem," will address some of the challenges the NoSQL community faces in enterprise adoption. Casey Rosenthal, Director of Professional Services at Basho, will be speaking about Riak and its adoption by 30% of the Fortune 50. This meetup will take place on Wednesday, March 26th at 7pm at Foursquare's office. Be sure to register for this free event.
To see how enterprises are using Riak, check out the Users Page.
In addition to these meetups, Basho will be at multiple other events and conferences. A complete list can be found on our Events Page.
December 23, 2013
A few weeks ago, we hosted a webinar with 451 Research entitled, “Beyond NoSQL – Distributed Databases in Production.” This webinar featured Matt Aslett (Research Director at 451 Research), Bobby Patrick (EVP and CMO at Basho Technologies), and Wes Jossey (Systems Engineer at Tapjoy).
During this one-hour webinar, we discuss the history of NoSQL and the current NoSQL landscape, and then dive into Basho's Riak. Wes Jossey also presents a case study on how Riak user Tapjoy uses Riak as the cornerstone of its data management strategy. Finally, we wrap up with a look at what's to come with Riak 2.0.
If you weren’t able to attend this webinar (or would like to rewatch it), the recording is now available. Simply register here to receive a link to watch the recording: info.basho.com/BeyondNoSQL_Recorded.html
December 18, 2013
Downtime, planned or unplanned, is no longer an option. It can have a dramatic impact on revenue and lead to negative customer experiences and attrition. Luckily, distributed NoSQL databases (such as Basho Riak) are designed to provide high availability, even during network partitions or server failures. This means there will never be an excuse for downtime again.
To help demonstrate the cost of downtime and how Riak can help, we have put together an infographic, "Down With Downtime."
December 9, 2013
Tomorrow (December 10th) at 10am PT/1pm ET, we will be hosting a live webinar, “Beyond NoSQL – Distributed Databases in Production.” This webinar will feature Matt Aslett (Research Director at 451 Research), Bobby Patrick (EVP and CMO at Basho Technologies), and Wes Jossey (Systems Engineer at Tapjoy). There are still seats available, and you can register here for more details.
This webinar will talk about the history of NoSQL and what issues NoSQL aimed to solve in regard to relational systems. It will then look at the current NoSQL landscape and architecture trends. From there, the webinar will focus on Basho's Riak, a distributed NoSQL database, and some of its key features and use cases. Tapjoy, the mobile performance-based advertising platform (and Riak user), will discuss how they use Riak to provide reliable data locality to their customers and why they selected Riak to be the cornerstone of their data management strategy. Finally, it will wrap up with a look at what's to come with Riak 2.0 and a live question and answer session.
Be sure to register now for "Beyond NoSQL – Distributed Databases in Production."
November 27, 2013
Join Basho and 451 Research on Tuesday, December 10th at 10am PT for a live webinar, “Beyond NoSQL – Distributed Databases in Production.”
This webinar will feature Matt Aslett, Research Director at 451 Research, and Bobby Patrick, EVP and CMO at Basho Technologies. This webinar will set the stage with NoSQL trends and adoption across various industries. It will then discuss some of the key benefits of distributed NoSQL systems and explore how systems like Riak are evolving.
Wes Jossey, Systems Engineer at Tapjoy, will also be joining the webinar to discuss how Tapjoy uses distributed databases to provide reliable data locality to their customers through multi-datacenter replication.
Register here for the free “Beyond NoSQL – Distributed Databases in Production” webinar.
October 28, 2013
The technology community is extremely agile and fast-paced. It can turn on a dime to solve business problems as they arise. However, with this agility comes a steady stream of new terminology that often creates false categorizations. This can lead to confusion, especially when companies evaluate new technologies based on a surface understanding of these terms. The world of data is full of these terms, including the notorious "NoSQL" and "big data."
As described in a previous post, NoSQL is a misleading term. This term represents a response to changing business priorities that require more flexible, resilient architectures (as opposed to the traditional, rigid systems that often happen to use SQL). However, within the NoSQL space, there are dozens of players that can be as different from one another as they are from any of the various SQL-speaking systems.
Big data is another term that, while fairly self-explanatory, has been overused to the point of dilution. One reason NoSQL databases have become necessary is their ability to scale easily to keep up with data growth. Simply storing a lot of data isn't the solution, though. Some data is more critical than other data (and should be accessible no matter what), and some data needs to be analyzed to provide business insights. When examining what a business actually needs, "big data" is too vague a term to describe both of these use cases.
The loose use of these terms, to name just two, can lead to industry confusion. One area of confusion that we have experienced involves Basho's own distributed database, Riak, and the distributed processing system, Hadoop.
While these two systems are actually complementary, we are often asked “How is Riak different from Hadoop?”
To help explain this, it's important to start with a basic understanding of both systems. Riak is a distributed database that is built for high availability, fault tolerance, and scalability. It is best used to store large amounts of critical data that applications and users need to be able to access at all times. Riak is built by Basho Technologies and can be used as an alternative to, or in conjunction with, relational databases (such as MySQL) or other "NoSQL" databases (such as MongoDB or Cassandra).
Hadoop is a framework that allows for the distributed parallel processing of large data sets across clusters of computers. It was originally based on the “MapReduce” system, which was invented by Google. Hadoop consists of two core parts: the underlying Hadoop Distributed File System (HDFS), which ensures stored data is always available to be analyzed, and MapReduce, which allows for scalable computation by dividing and running queries over multiple machines. Hadoop provides an inexpensive, scalable solution for bulk data processing and is mostly used as part of an overarching analytics strategy, not for primary “hot” data storage.
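The MapReduce model that Hadoop implements is easiest to see in the classic word-count example. The sketch below is a single-process Python illustration of the three stages (map, shuffle, reduce), not Hadoop's Java API; the function names are hypothetical. In Hadoop, map tasks would run in parallel on the machines holding the HDFS blocks, and the framework would perform the shuffle across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    # Emit (word, 1) for every word; each document could be mapped on a different machine.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    # Group every emitted value by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Riak stores data", "Hadoop analyzes data"]))
```

Because each map call looks at one document and each reduce call looks at one key, both stages can be split across as many machines as are available, which is what makes the model suited to bulk analytics rather than serving individual reads.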
One easy way to distinguish between the two is to look at some of the common use cases.
Riak Use Cases
Riak can be used by any application that needs to always have access to large amounts of critical data. Riak uses a key/value data model and is data-type agnostic, so operators can store any type of content in Riak. Due to the key/value model, certain industry use cases fit easily into Riak. These include:
- Gaming – storing player data, session data, etc.
- Retail – underpinning shopping carts, product inventories, etc.
- Mobile – social authentication, text and multimedia storage, global data locality, etc.
- Advertising – serving ad content, session storage, mobile experiences, etc.
- Healthcare – prescription or patient records, patient IDs, health data that must always be available across a network of providers, etc.
For a full list of use cases, check out our Users Page.
Hadoop Use Cases
Hadoop is designed for situations where you need to store unmodeled data and run computationally intensive analytics over that data. The original use cases of MapReduce and Hadoop were to produce indexes for distributed search engines at Google and Yahoo, respectively. Any industry that needs to do large-scale analytics to improve its business can use Hadoop. Some common examples include finance (building models for accurate portfolio evaluation and risk analysis) and eCommerce (analyzing shopping behavior to deliver product recommendations or better search results).
Riak and Hadoop are based on many of the same tenets, making their usage complementary for some companies. Many companies that use Riak today have created scripts, or processes, to pull data from Riak and push it into other solutions (like Hadoop) for historical archiving or future analysis. Recognizing this trend, Basho is exploring the creation of additional tools to simplify this process.
If you are interested in our thinking on these data export capabilities, please contact us.
Every tool has its value. Hadoop excels at being used by a relatively small subset of the business to answer big questions. Riak excels at being used by a very large number of users and powering critical data for businesses.
August 20, 2013
NoSQL is a misleading name. SQL was never the problem. However, this poorly named industry term does represent a response to changing business priorities and new challenges that require different kinds of database architectures.
Traditional database architectures were first developed in the late 60s and early 70s. They were the default option for many pre-Internet use cases and remain useful today for certain use cases requiring a relational data model. However, their limits are painfully apparent to many companies. Despite what traditional database vendors might have us believe, very little data generated today actually requires a SQL architecture. Businesses face many new challenges today that traditional databases simply are not designed to handle reliably or efficiently. These include:
- Global Users. It is no longer enough to provide a fast experience in one country. Users from all over the globe expect a low-latency experience, making geo-data locality more important than ever.
- Zero Downtime. Planned and unplanned. Both are bad for business. There is now an expectation of always-on availability. Operations teams must emphasize resiliency over recovery.
- Scale Matters. Businesses need to scale up quickly to meet peak loads during the holidays or product launches, and then they need to scale back down. They need an architecture that makes scaling the least of their worries.
- Flexible Data. From user generated data to machine-to-machine (M2M) activity, unstructured data is now commonplace. Businesses need flexibility to handle all the data generated and flowing.
- Omnichannel. Whether users are on a tablet, laptop, or smartphone, they require a device agnostic experience and low-latency.
- Amazon Economics. Every business wants Amazon Economics. With the nature of data growth today, businesses can’t afford expensive machines at every juncture. They need commodity machines to scale horizontally, not vertically.
Attempts to address these challenges with traditional databases result in an inflexible architecture with super high costs. “NoSQL” databases represent a fresh approach towards building flexible, resilient architectures. “NoSQL” goes where no database has ever gone before — into the wild space of the Internet and the massive scale requirements it represents.
Which brings us to NoSQL Now! Basho is sponsoring because the movement is more important than any single industry term. Andy Gross will also be on hand to further discuss the larger trend of distributed systems:
"Dealing with Systems in a New Distributed World"
Andy Gross, Chief Architect and Co-creator of Riak
Thursday, August 22, 2013, 3:00pm
Please join us in San Jose for a look at the future of database technology.
At NoSQL Now!, industry leaders and developers come together to share ideas at the largest vendor-agnostic event focused on disruptive NoSQL technologies for infrastructure architecture.
Basho has a strong and salient presence at this conference. Both its open source distributed database, Riak, and its cloud storage software, Riak CS, are NoSQL technologies that offer high availability, fault tolerance, and operational ease of use. Unlike traditional databases, Riak automatically distributes data across the cluster, eliminating the need for manual sharding, and its masterless design means that nodes can fail without bringing down the entire system. Due to this architecture, Riak has become foundational to many of the world's fastest-growing Web-based, mobile, and social applications, at companies like Comcast, Voxer, and Best Buy. The Basho team will be available at the conference to answer questions about Riak and how to move from a relational system to a distributed one.
In addition to exhibiting, Basho Chief Architect and Co-Creator Andy Gross will be speaking on August 22 at 3:00pm. His talk, "Dealing with Systems in a New Distributed World," will discuss the resurgence of interest in both theoretical and applied distributed systems and its consequences for software developers. He will explore promising new areas of research and provide practical advice for dealing with systems in our new distributed world. Finally, he will discuss how technologies are shifting to meet emerging business requirements, while simultaneously minimizing immediate operational burdens and enabling ease of scale.
Andy is a distributed systems nerd, co-creator of Riak and Webmachine, and Chief Architect at Basho Technologies. Before Basho, Andy hacked on various distributed systems at Apple, Akamai, and Mochi Media.
For more information about Riak, common use cases, and an in-depth analysis of the benefits of migrating to a distributed NoSQL database, download “From Relational to Riak.”
July 3, 2013
Basho CTO, Justin Sheehy, recently participated in a “Not Only SQL Summit,” alongside executives from some of the top NoSQL vendors. This summit was moderated by Ted Neward of Neward & Associates LLC and discussed the evolution of NoSQL systems as well as some associated best practices. It also included insights from customers currently using these NoSQL solutions.
In addition to Justin Sheehy, panelists included:
- Anthony Molinaro, Infrastructure Architect at OpenX, discussing how they use Riak
- Patrick McFadin, Principal Solution Architect at DataStax
- Michael Kjellman, Software Engineer at Barracuda Networks, discussing how they use Cassandra
- Justin Weiler, CTO at FatCloud
- Attinder Khalsa, Executive Software Architect at Wilshire Axon, discussing how they use FatDB
Throughout this summit, OpenX, Barracuda Networks, and Wilshire Axon discussed not only why they chose to move away from relational systems but also why they chose their particular NoSQL vendor. They also talked about their experiences dealing with eventual consistency and schemaless data.
April 17, 2013
This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.
What hardware should I use with Riak?
Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.
RAM is one of the most important factors – RAM availability directly affects which Riak backend you should use (see question below), and RAM is also needed for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak's fault tolerance and high availability. Your hardware choice should take into account how many objects you plan to store and the replication factor; however, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect your choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For additional factors in capacity planning, check out our documentation on cluster capacity planning.
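As a rough illustration of how object count and n_val drive sizing, here is a back-of-the-envelope estimator. It is a sketch under stated assumptions, not Basho's capacity calculator: it assumes replicas spread evenly across nodes, ignores backend file overhead and compaction, and uses a hypothetical per-key memory overhead (the per_key_overhead_bytes parameter) for a Bitcask-style in-memory key index. Use the official calculators for real numbers.

```python
from dataclasses import dataclass


@dataclass
class NodeEstimate:
    disk_bytes: float    # raw key + value bytes per node, before any backend overhead
    keydir_bytes: float  # in-memory key index per node (relevant to Bitcask)


def estimate_per_node(
    num_objects: int,
    avg_value_bytes: int,
    avg_bucket_key_bytes: int,
    nodes: int,
    n_val: int = 3,
    per_key_overhead_bytes: int = 40,  # ASSUMPTION: illustrative value, not an official figure
) -> NodeEstimate:
    # Every object is stored n_val times; assume replicas spread evenly over the nodes.
    replicas_per_node = num_objects * n_val / nodes
    disk = replicas_per_node * (avg_value_bytes + avg_bucket_key_bytes)
    keydir = replicas_per_node * (per_key_overhead_bytes + avg_bucket_key_bytes)
    return NodeEstimate(disk_bytes=disk, keydir_bytes=keydir)


if __name__ == "__main__":
    est = estimate_per_node(100_000_000, 2048, 36, nodes=5)
    print(f"disk/node: {est.disk_bytes / 2**30:.1f} GiB, keydir/node: {est.keydir_bytes / 2**30:.1f} GiB")
```

With 100 million 2 KB objects, 36-byte bucket/key names, five nodes, and the default n_val of 3, this works out to roughly 116 GiB of raw data and about 4.2 GiB of key index per node under the assumed overhead. Adding a sixth node shrinks both figures by a sixth, which is the horizontal-scale lever described above.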
Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.
What backend is best for my application?
Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.
Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask keeps an in-memory hash table of all keys you write to Riak, each pointing directly to the on-disk location of its value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast thanks to Bitcask's write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn't have Bitcask's memory constraints around keyspace size, and is thus ideal for deployments with a very large number of keys. LevelDB also uses Google's Snappy compression, which is particularly efficient for text data such as raw text, Base64, JSON, and HTML. To use LevelDB with Riak, you must change the storage_backend setting in the riak_kv section of the app.config file to riak_kv_eleveldb_backend. You can find more details on LevelDB here.
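To make the keydir idea concrete, here is a heavily simplified, hypothetical Python sketch of a Bitcask-style store: an append-only data file plus an in-memory dictionary from key to value location. It is not Basho's implementation. Real Bitcask records also carry a CRC and timestamp, and real Bitcask adds tombstones for deletes, multiple data files, hint files, and merging, all of which are left out here.

```python
import os
import struct
from typing import Optional

# Record layout in this sketch: [key length][value length][key][value].
# Real Bitcask records also include a CRC and a timestamp.
HEADER = struct.Struct(">II")


class TinyBitcask:
    """Append-only data file plus an in-memory keydir: key -> (value offset, value length)."""

    def __init__(self, path: str):
        self._file = open(path, "a+b")
        self.keydir: dict[bytes, tuple[int, int]] = {}
        self._rebuild_keydir()

    def _rebuild_keydir(self) -> None:
        # Recovery: scan the log once; a later record for a key replaces the earlier entry.
        self._file.seek(0)
        offset = 0
        while True:
            header = self._file.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of file; unlike real Bitcask, no CRC check detects partial records
            key_len, value_len = HEADER.unpack(header)
            key = self._file.read(key_len)
            self.keydir[key] = (offset + HEADER.size + key_len, value_len)
            self._file.seek(value_len, os.SEEK_CUR)
            offset += HEADER.size + key_len + value_len

    def put(self, key: bytes, value: bytes) -> None:
        # Writes only ever append; existing records are never modified in place.
        self._file.seek(0, os.SEEK_END)
        record_start = self._file.tell()
        self._file.write(HEADER.pack(len(key), len(value)) + key + value)
        self._file.flush()
        self.keydir[key] = (record_start + HEADER.size + len(key), len(value))

    def get(self, key: bytes) -> Optional[bytes]:
        entry = self.keydir.get(key)
        if entry is None:
            return None  # a missing key is answered from memory with no disk access
        offset, length = entry
        self._file.seek(offset)  # the single seek needed to read the value
        return self._file.read(length)

    def close(self) -> None:
        self._file.close()
```

The sketch also shows where Bitcask's memory limit comes from: every live key occupies an entry in the keydir, whereas values stay on disk.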
Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.
How do I model data using Riak’s key/value design?
Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.
| Application Type | Key | Value |
|---|---|---|
| Session | User/Session ID | Session Data |
| Content | Title, Integer | Document, Image, Post, Video, Text, JSON/HTML, etc. |
| Advertising | Campaign ID | Ad Content |
| Sensor | Date, Date/Time | Sensor Updates |
| User Data | Login, Email, UUID | User Attributes |
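The common thread in these patterns is choosing keys that the application can compute from information it already has at read time, so every read is a direct lookup rather than a query. The hypothetical Python sketch below applies this to the Sensor row, storing one object per sensor per day; the in-memory store and the put/get helpers are stand-ins for illustration, not a Riak client API.

```python
import json
from datetime import date, datetime, timezone

# HYPOTHETICAL in-memory stand-in for a Riak cluster: {bucket: {key: value bytes}}.
# These helpers are not a Riak client API; they only show how keys are chosen.
store: dict[str, dict[str, bytes]] = {}


def put(bucket: str, key: str, value: bytes) -> None:
    store.setdefault(bucket, {})[key] = value


def get(bucket: str, key: str):
    return store.get(bucket, {}).get(key)


def sensor_day_key(sensor_id: str, day: date) -> str:
    # Key = sensor ID + date, so a reader can compute it without any query.
    return f"{sensor_id}:{day.isoformat()}"


def record_reading(sensor_id: str, when: datetime, reading: float) -> None:
    # Read-modify-write of one object per sensor per day. A real cluster would need
    # conflict handling for concurrent writers, which this sketch leaves out.
    key = sensor_day_key(sensor_id, when.date())
    existing = get("sensor", key)
    updates = json.loads(existing) if existing else []
    updates.append({"t": when.isoformat(), "v": reading})
    put("sensor", key, json.dumps(updates).encode())


def readings_for_day(sensor_id: str, day: date) -> list:
    raw = get("sensor", sensor_day_key(sensor_id, day))
    return json.loads(raw) if raw else []


if __name__ == "__main__":
    # Session row of the table: the session ID itself is the key.
    put("sessions", "3f2a9c", json.dumps({"user": "alice", "cart": []}).encode())
    record_reading("sensor-42", datetime(2013, 4, 17, 9, 0, tzinfo=timezone.utc), 21.5)
    print(readings_for_day("sensor-42", date(2013, 4, 17)))
```

Grouping a day's updates into one object keeps reads to a single fetch; the trade-off is that the object grows over the day, so a finer granularity (for example, per hour) may suit high-frequency sensors better.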
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
What other options, besides strict key/value access, are there for querying Riak?
Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.
Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.
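Full-text search engines generally rest on an inverted index: a map from each term to the documents that contain it, updated as documents are written. The toy Python sketch below shows that idea with simple AND queries. It is only a conceptual illustration; it does not reproduce Riak Search's analyzers, query language, or SOLR-like API, and the class and method names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    # Lowercased word tokens; a real analyzer would also handle stemming, stop words, etc.
    return set(re.findall(r"\w+", text.lower()))


class TinyInvertedIndex:
    def __init__(self):
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> keys containing it
        self._doc_terms: dict[str, set[str]] = {}              # key -> terms last indexed

    def index(self, key: str, text: str) -> None:
        # Called on every write, so the index tracks the latest version of each document.
        for term in self._doc_terms.get(key, set()):
            self.postings[term].discard(key)  # drop postings from the previous version
        terms = tokenize(text)
        for term in terms:
            self.postings[term].add(key)
        self._doc_terms[key] = terms

    def search(self, query: str) -> set[str]:
        # AND semantics: return keys whose text contains every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))
```

Because the work of tokenizing happens at write time, a query only has to intersect a few precomputed sets instead of scanning every stored document.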
Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
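The following toy Python sketch illustrates how exact and range queries over tagged values can work: each index keeps its (value, object key) pairs sorted, so a range query is a binary search followed by a short scan. It mirrors Riak's convention that index names end in _int for integers or _bin for strings, but everything else (class name, methods, in-memory storage) is a hypothetical illustration rather than Riak's implementation or client API.

```python
import bisect
from collections import defaultdict


class TinySecondaryIndex:
    """Toy secondary index: index name -> sorted list of (value, object key)."""

    def __init__(self):
        self._entries: dict[str, list[tuple]] = defaultdict(list)

    def add(self, index_name: str, value, key: str) -> None:
        # Follow Riak's naming convention: "_int" indexes hold integers, "_bin" hold strings.
        if index_name.endswith("_int"):
            if not isinstance(value, int):
                raise TypeError("_int index values must be integers")
        elif index_name.endswith("_bin"):
            if not isinstance(value, str):
                raise TypeError("_bin index values must be strings")
        else:
            raise ValueError("index name must end in _int or _bin")
        bisect.insort(self._entries[index_name], (value, key))

    def range_query(self, index_name: str, low, high) -> list[str]:
        # Keys whose index value v satisfies low <= v <= high, in value order.
        entries = self._entries.get(index_name, [])
        start = bisect.bisect_left(entries, (low,))  # (low,) sorts before any (low, key)
        keys = []
        for value, key in entries[start:]:
            if value > high:
                break
            keys.append(key)
        return keys

    def exact_query(self, index_name: str, value) -> list[str]:
        return self.range_query(index_name, value, value)
```

For example, tagging user objects with age_int and city_bin values lets an application ask for "all users aged 25 to 35" or "all users in Boston" without knowing their keys in advance.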
How does Riak differ from other databases?
We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.
Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. First, Riak's masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak's operational profile and use of consistent hashing mean that data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases, and how to make the switch, can be found in this whitepaper, From Relational to Riak.
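The reason consistent hashing avoids manual resharding can be shown with a small sketch. Riak hashes each bucket/key pair with SHA-1 onto a 160-bit ring split into a fixed number of partitions (64 by default), and nodes claim partitions. In the simplified Python model below, which is an illustration and not Riak's code, a key's partition never changes; adding a node only reassigns ownership of some partitions. The round-robin and evenly spaced claim logic is an assumption made for brevity; Riak's real claim algorithm is more careful, in particular about keeping replicas on distinct nodes.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 produces 160-bit hashes


class TinyRing:
    """Fixed, equal-sized partitions; only partition ownership changes."""

    def __init__(self, num_partitions: int, nodes: list[str]):
        if num_partitions <= 0 or num_partitions & (num_partitions - 1):
            raise ValueError("num_partitions must be a power of two")
        self.num_partitions = num_partitions
        # Simplified claim: deal partitions out to nodes round-robin.
        self.owners = [nodes[i % len(nodes)] for i in range(num_partitions)]

    def partition_for(self, bucket: str, key: str) -> int:
        # Simplification: hash a "bucket/key" string rather than Riak's own encoding.
        digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
        return int.from_bytes(digest, "big") // (RING_SIZE // self.num_partitions)

    def preference_list(self, bucket: str, key: str, n_val: int = 3) -> list[tuple[int, str]]:
        # The n_val replicas go to the next n_val partitions around the ring.
        start = self.partition_for(bucket, key)
        parts = [(start + i) % self.num_partitions for i in range(n_val)]
        return [(p, self.owners[p]) for p in parts]

    def add_node(self, node: str) -> list[int]:
        # The new node takes an evenly spaced fair share of partitions; all other
        # partitions, and every key -> partition mapping, stay where they were.
        share = self.num_partitions // (len(set(self.owners)) + 1)
        if share == 0:
            raise ValueError("more nodes than partitions")
        stride = self.num_partitions // share
        moved = list(range(0, stride * share, stride))
        for p in moved:
            self.owners[p] = node
        return moved


if __name__ == "__main__":
    ring = TinyRing(64, ["a", "b", "c", "d"])
    print(ring.preference_list("users", "alice"))
    print(f"partitions moved when adding a fifth node: {len(ring.add_node('e'))} of 64")
```

In this model, growing from four to five nodes hands 12 of the 64 partitions to the new node and leaves the other 52 untouched, so only the data in those partitions has to move, with no application-level resharding.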
A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.