April 9, 2014
Rubicon.IO is a threat intelligence start-up that has developed purpose built technology that delivers on the promise of Scale, Speed, and Accuracy in using Big Data. Rubicon offers real-time analytic capabilities by scouring metadata from various sources: threat feeds, social media, SIEM data, and PCAPs. It uses a purpose built HPC engine that aggregates and humanizes geospatial, TECHINT, HUMINT, and OSINT data sources. Rubicon provides the necessary context for businesses to respond to attacks appropriately in real-time – all delivered using advanced visualizations via a multi-dimensional user interface. To provide this intelligence, Rubicon needs to find and store large amounts of data and access that data in near real-time. To do this, they use Riak.
(An example of the Rubicon User Interface)
When Rubicon was first starting, they planned to use CouchDB as the original Proof of Concept. However, as they started testing CouchDB, they found that it couldn’t handle the scale of data that they needed to store and access. Its document-only model also meant that they were constantly updating documents, rather than scaling out with immutable data. Wes Brown, Founder and CTO at Rubicon, knew they needed to find something else and saw this as the perfect opportunity to finally use Riak.
“I have tested all of the NoSQL database offerings in the past and Riak was the only one that lived up to its promise,” said Wes. “All of them fell apart at some point, except for Riak. Riak is a fantastic key/value store that provides the scale and low-latency Rubicon needs.”
As mentioned, Rubicon uses an immutable data model, meaning once data is put in, it does not change. This prevents the expensive cycle of reading and then modifying writes. In Riak, Rubicon stores a key for every atomic observation or “fact.” These facts have subfields that have normalized names. This makes it very simple for Rubicon to search and index facts as needed, to return any that are related. For example, they might search for anything pertaining to a certain IP address to provide additional context to clients regarding an attack. By providing this context, it allows their clients to better understand the attack, who’s behind it, where it came from, and what the appropriate response is. All of this information is provided in real-time and they use Infiniband to provide microsecond performance.
(A portion of the visualization created from data collected in Riak)
Rubicon is currently about six months out from being in production with Riak. They are currently using the Riak 2.0 Technical Preview and will launch with Riak 2.0 GA. They are planning to launch with eight nodes and will scale up to 100 nodes to store their petabytes of data at low-latency.
“Riak has been a vital toolkit that helps us solve multiple problems, rather than just addressing one block problem,” says Wes. “By using Riak, we are able to take advantage of all the benefits and performance of a reliable key/value store, while continuing to build out our own functionality on top of it. We never need to worry about Riak, which invaluable for our business.”
For more information about Rubicon.IO, visit their site at www.rubicon.io
To see how other companies are using Riak, visit our Users Page.
December 11, 2013
The Basho team congratulates Covata on the successful launch of Covata Safe Share earlier today. Covata Safe Share is a document security solution that enables the guaranteed secured sharing of data to users on any device, anywhere, while also addressing the risk of data leakage and loss. Covata Safe Share embeds Riak as the highly available distributed database managing user, access-control, and document data. Over the past several months, Basho and Covata have collaborated together to address enterprise-class requirements for sensitive document data sharing.
Covata Safe Share is a web-based interface that guides users through the simple process of selecting a file to protect, assignment of authorized or ad hoc Collaborators, and defining the access controls that identify the conditions under which the file can be accessed. Originators can revoke access to any file at any time or change the access controls to adapt to evolving circumstances and requirements. Data can be stored on one or more cloud storage providers, giving the organization complete deployment flexibility. Organizations retain full ownership of the encryption keys, access controls and audit data, and can store these elements on-premises while moving secured data to the cloud. This prevents the cloud storage provider from gaining access to sensitive data.
The growth in cloud storage continues to accelerate. Basho is excited to continue to work with Covata to “protect data wherever it goes.”
Covata’s press release, Covata Introduces Safe Share For Securing Data In The Cloud, has additional detail on the launch.
October 29, 2013
Throughout RICON West, we will be discussing many of the Riak 2.0 features (both in track sessions or during lightning talks), so keep your eyes on the live stream over the next two days. Videos of all sessions will also be made available after the conference.
Here is a look at some of the major enhancements available in Riak 2.0:
- Riak Data Types. Building on the eventually consistent counters introduced in Riak 1.4, Riak 2.0 adds sets and maps as new distributed data types. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
- Strong Consistency. Developers have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
- Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
- Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and Authorization is provided via client APIs.
- Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
- Reduced Replicas for Multiple Data Centers. Riak Enterprise 2.0 can optionally store fewer copies of replicated data across multiple data centers to better maintain a balance between storage overhead and availability.
Ready to get started? Download the Technical Preview.
Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure it’s production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment.
Riak 2.0 Technical Preview: Deep Dive
Riak Data Types
In distributed systems, we are forced to trade consistency for availability (see: CAP Theorem) and this can complicate some aspects of application design. In Riak 2.0, we have integrated cutting-edge research on data types known as called CRDTs (Conflict-Free Replicated Data Types) pioneered by INRIA to create Riak Data Types. By adding counters, sets, maps, registers, and flags, these Riak Data Types enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focusing on using familiar, distributed data types to support their applications’ data access patterns.
A more detailed overview of Riak Data Types is available that examines implementation considerations and the basics of usage.
In all prior versions, Riak was classified as an eventually consistent system. With the 2.0 release, Riak now lets developers choose when operations should be strongly or eventually consistent. This gives developers a choice between these semantics for different types of data. At the same time, operators can continue to enjoy the operational simplicity of Riak. Consistency preferences are defined on a per bucket type basis, in the same cluster.
A RICON West 2012 talk entitled, Bringing Consistency to Riak, shares much of the initial thinking behind this effort. In addition, the pull request that adds consistency to
riak_kv provides detailed information about related repositories and the implementation approach.
Redesigned Full-Text Search
Riak is a key/value store and the values are simply stored on disk as binary. With previous versions of Riak Search, Riak developers have long been able to index the content of these stored values. In Riak 2.0, Riak Search (code-named Yokozuna) has been completely redesigned and now uses the Apache Solr full-text document indexing engine directly. Together, Riak and Solr provide a reliable full-text context indexing solution that is highly available and built for scale. In addition, Riak Search 2.0 also fully supports the Solr client query APIs, which enables integration with existing software solutions (either homegrown or commercial).
The Basho engineers responsible for Yokozuna have created a resources page that includes recorded talks, Solr documentation links, and books on the topic.
Basho designed Riak with critical data in mind. Whether it’s data that affects revenue, user experience, or even a patient’s health (as is the case with the NHS), Riak ensures that this critical data is always available. However, often this critical data is also sensitive data. Riak 2.0 adds security to this data through the ability to administer access rights and plug-in various secure authentication models commonly used today.
The initial RFC that describes the security effort, including related Pull Requests, is available at github.com/basho/riak/issues/355.
Simplified Configuration Management
At Basho, we pride ourselves on providing operationally friendly software that functions smoothly when dealing with the challenges of a distributed system. In the past, configuration of Riak occurred in two files:
vm.args. Riak 2.0 changes how and where configuration information is stored. It no longer uses Erlang-specific syntax but, rather, provides a layout more suited for all operators and automated deployment tools. This layout is easy to parse and transparent for Riak administrators.
More information on the vision and specific implementation considerations are contained in the repository at github.com/basho/cuttlefish.
In versions of Riak prior to 2.0, keys were made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow for similar keys to be grouped. In addition, they provide a means of configuring how previous versions of Riak treated that data.
In Riak 2.0, several new features (security and strong consistency in particular) need to interact with groups of buckets. To this end, Riak 2.0 includes the concept of a Bucket Type. In addition to allowing new features without special prefixes in Bucket names, Riak developers and operators are able to define a group of buckets that share the same properties and only store information about each Bucket Type, rather than individual buckets.
More information about Bucket Types can be found in the Github Issue at github.com/basho/riak/issues/362. This issue describes the planned functionality, discussions about implementation, and includes related pull requests.
Change in Defaults for Sibling Resolution
Riak has always supported both application-side and timestamp and vector clock-based Last Write Wins server-side resolution. Prior to Riak 2.0, vector clock-based Last Write Wins has been the default. Moving forward, new clusters will hand off siblings to applications by default. This is the safest way to work with Riak, but requires developers to be aware of sibling resolution.
More Efficient Use of Physical Memory
Riak nodes are designed to manage the changing demands of a cluster as it experiences network, hardware, and other failures. To do this, Riak balances each node’s resources accordingly. Riak 2.0 has vastly improved LevelDB’s use of available physical memory (RAM) by allowing local databases to dynamically change their cache sizes as the cluster fluctuates under load.
In the past, it was necessary to specify RAM allocation for different LevelDB caches independently. This is no longer the case. In Riak 2.0, LevelDB databases that manage key/value or active anti-entropy data share a single pool of memory, and administrators are free to allocate as much of the available RAM to LevelDB as they feel is appropriate in their deployment. Detailed implementation documentation can be found in the basho/leveldb wiki.
Riak Ruby Vagrant Project
If you are interested in testing Riak 2.0, in a contained environment with the Riak Ruby Client, Basho engineer Bryce Kerley has put together the Riak-Ruby-Vagrant repository. In addition, this environment can be easily adapted to usage with other clients for testing the new features of Riak 2.0.
January 24, 2013
Alert Logic, the industry-leading Security-as-a-Service provider and protector of customer infrastructure and data, uses Riak to help manage their massive amount of data collection needs and to support their rapid business growth.
Recently named a Leader in Emerging Managed Security Services by Forrester Research, Alert Logic helps companies defend against security threats and address compliance mandates, such as PCI and HIPAA. Alert Logic’s Security solutions include intrusion detection, web application security, log management and vulnerability assessment, coupled with 24×7 monitoring and expert guidance services. Alert Logic is used by dozens of the world’s largest hosting service providers.
With the help of Riak, Alert Logic collects and processes machine data and uses this information to perform real-time analytics, detect anomalies, ensure compliance and proactively respond to threats. Alert Logic introduced Riak in 2012 to support the development of a new analytics infrastructure, and ultimately replace an existing MySQL system that could not support the anticipated increase in workload.
The new analytics infrastructure performs statistical and correlation processing on all data collected from Alert Logic’s products – including log messages, network intrusion detection events, and NetFlow data – processing approximately 5 TB/day. All of this data is processed in real-time as it streams in from over 2,000 customers, 5,000 appliances, and hundreds of thousands of data sources on customer networks. The data grows more than 50% a year, outpacing revenue growth of 40%.
Today, Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35,000 operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night. In addition, Alert Logic’s expert security analysts’ benefited as well, by gaining increased functionality and efficiency.
Alert Logic uses Riak Enterprise advanced replication technology to deploy clusters that can handle different priority workloads. This frees up Alert Logic’s primary cluster to ensure it is always available to receive and write customer-specific analytic data, even during times requiring extreme scale. Other Riak clusters will provide data mining and reporting that are critical to Alert Logic’s solutions.
“Alert Logic depends on the reliable processing of massive amounts of machine data and turning that into actionable information,” said Paul Fisher, Director of Platform Services, at Alert Logic. “Our security operations center depends on this information for analysis to detect and respond to real-time security incidents that occur on our customers networks. We selected Riak for scalability and fault-tolerance, and it continues to be a vital component helping ensure that the Alert Logic Platform can scale to keep up with our rapid growth.”
Alert Logic plans to accelerate development of its real-time analytical capabilities in 2013, and expand the presence of Riak as a foundational technology throughout Alert Logic’s solutions. On deck next is the replacement of the largest existing MySQL workload at Alert Logic, which today sustains 9,000 queries per second, and peaks at over 20,000.
Basho plans to announce the inaugural Houston Riak meet-up featuring the Platform Services team at Alert Logic shortly. Stay tuned.