Tag Archives: Solr

RICON West Videos: Riak Search 2.0

December 17, 2013

In addition to Riak Data Types, there were a number of other presentations about Riak 2.0 features at RICON West. With the Technical Preview of Riak 2.0, we also announced a completely redesigned Riak Search.

Riak is a straight key/value data store and all objects are stored on disk as binaries. It is content agnostic – meaning you can store any type of data as the value in Riak. To improve the usability and functionality of Riak, we offer multiple querying options including Riak Search, Secondary Indexing, and MapReduce. Riak Search is a full-text search that allows Riak developers to index the contents of stored values. While Riak Search offers much needed functionality, it had its flaws.

In Riak 2.0, Riak Search received a complete overhaul. Riak Search 2.0 leverages the Apache Solr full-text document indexing engine directly. Riak users now get the power of Solr, with the availability and scalability of Riak. This upgrade also supports the Solr client-queries API, which enables integration with existing software solutions.

Eric Redmond is one of the Basho engineers who works on Riak Search. At RICON, he presented “Riak Search 2.0,” which walks through what’s new with Riak Search and why you’d want to use it. He also provides some impressive demos that show off the power of Solr and Riak. His full talk is below.

For more information on Riak Search 2.0, check out these resources on Github.

To watch all of the sessions from RICON West 2013, visit the Basho Technologies Youtube Channel.

Basho

Basho Announces Technical Preview of Riak 2.0 and Riak Enterprise 2.0

San Francisco, CA – October 29, 2013 – Today at RICON West, Basho, the worldwide leader in distributed systems and cloud storage software, announced that the Technical Preview of Riak 2.0, Basho’s distributed NoSQL database, is now publicly available. This major release introduces new features that improve developer ease-of-use, increase flexibility around consistency, boost search and analytics capabilities, simplify operations at scale, and provide enterprise-class data security.

Riak continues to gain adoption worldwide supporting critical applications that require high-availability, predictable scalability, and performance. Riak’s unique ability to distribute data, both to ensure availability and provide data locality, provides enterprises a proven database technology for powering critical web, mobile and social applications, cloud computing platforms, and to store and serve machine-to-machine and sensor data. Riak is used by thousands of companies, including over 30% of the Fortune 50.

New Features in Riak 2.0

  • Riak Data Types. Riak 2.0 includes a range of flexible, distributed data types, that greatly simplify application development without sacrificing Riak’s availability and partition tolerance characteristics. Available Riak data types include distributed counters, sets, maps, registers, and flags.
  • Strong Consistency. Developers now have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
  • Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 fully supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
  • Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and Authorization is provided via client APIs.
  • Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
  • Reduced Replicas for Secondary Sites. Exclusive to Riak Enterprise 2.0, users can now optionally store fewer copies of replicated data across multiple datacenters to better maintain a balance between storage overhead and availability.

Technical Preview Availability

Download the Riak 2.0 Technical Preview here. All code for Riak 2.0 is also available on Github. For more details on the technical preview for Riak, visit our blog.

About RICON West
RICON West 2013 is part of the RICON conference series. RICON is Basho’s distributed systems conference for developers and academics. RICON West will take place in San Francisco, CA on October 29-30. More than 25 speakers will discuss applications, use cases, and the future of distributed systems – including NoSQL solutions and cloud storage. RICON West 2013 speakers include Basho, Google, Microsoft Research, Netflix, salesforce.com, Seagate, State Farm Insurance, The Weather Company, and Twitter. RICON West 2013 is sold-out; however, Basho will offer a live stream.

About Basho
Basho is a distributed systems company dedicated to making software that is highly available, fault-tolerant and easy-to-operate at scale. Basho’s distributed database, Riak, and Basho’s cloud storage software, Riak CS, are used by fast growing Web businesses and by over 25 percent of the Fortune 50 to power their critical Web, mobile and social applications and their public and private cloud platforms.

Riak and Riak CS are available open source. Riak Enterprise and Riak CS Enterprise offer enhanced multi-datacenter replication and 24×7 Basho support. For more information, visit basho.com. Basho is headquartered in Cambridge, Massachusetts and has offices in London, San Francisco, Tokyo and Washington DC.

Introducing Riak 2.0: Data Types, Strong Consistency, Full-Text Search, and Much More

October 29, 2013

Today at RICON West in San Francisco, we announced the Technical Preview of Riak 2.0 is now available. This major release adds a number of new features that many of you have been waiting for.

Throughout RICON West, we will be discussing many of the Riak 2.0 features (both in track sessions or during lightning talks), so keep your eyes on the live stream over the next two days. Videos of all sessions will also be made available after the conference.

Here is a look at some of the major enhancements available in Riak 2.0:

  • Riak Data Types. Building on the eventually consistent counters introduced in Riak 1.4, Riak 2.0 adds sets and maps as new distributed data types. These Riak Data Types simplify application development without sacrificing Riak’s availability and partition tolerance characteristics.
  • Strong Consistency. Developers have the flexibility to choose whether buckets should be eventually consistent (the default Riak configuration today that provides high availability) or strongly consistent, based on data requirements.
  • Full-Text Search Integration with Apache Solr. Riak Search is completely redesigned in Riak 2.0, leveraging the Apache Solr engine. Riak Search in 2.0 supports the Solr client query APIs, enabling integration with a wide range of existing software and commercial solutions.
  • Security. Riak 2.0 adds the ability to administer access rights and utilize plug-in authentication models. Authentication and Authorization is provided via client APIs.
  • Simplified Configuration Management. Riak 2.0 continues to improve Riak’s operational simplicity by changing how, and where, configuration information is stored in an easy-to-parse and transparent format.
  • Reduced Replicas for Multiple Data Centers. Riak Enterprise 2.0 can optionally store fewer copies of replicated data across multiple data centers to better maintain a balance between storage overhead and availability.

Ready to get started? Download the Technical Preview.

Please note that this is only a Technical Preview of Riak 2.0. This means that it has been tested extensively, as we do with all of our release candidates, but there is still work to be completed to ensure it’s production hardened. Between now and the final release, we will be continuing manual and automated testing, creating detailed use cases, gathering performance statistics, and updating the documentation for both usage and deployment.

As we are finalizing Riak 2.0, we welcome your feedback for our Technical Preview. We are always available to discuss via the Riak Users mailing list, IRC (#riak on freenode), or contact us.

Riak 2.0 Technical Preview: Deep Dive

Riak Data Types
In distributed systems, we are forced to trade consistency for availability (see: CAP Theorem) and this can complicate some aspects of application design. In Riak 2.0, we have integrated cutting-edge research on data types known as called CRDTs (Conflict-Free Replicated Data Types) pioneered by INRIA to create Riak Data Types. By adding counters, sets, maps, registers, and flags, these Riak Data Types enable developers to spend less time thinking about the complexities of vector clocks and sibling resolution and, instead, focusing on using familiar, distributed data types to support their applications’ data access patterns.

A more detailed overview of Riak Data Types is available that examines implementation considerations and the basics of usage.

Strong Consistency
In all prior versions, Riak was classified as an eventually consistent system. With the 2.0 release, Riak now lets developers choose when operations should be strongly or eventually consistent. This gives developers a choice between these semantics for different types of data. At the same time, operators can continue to enjoy the operational simplicity of Riak. Consistency preferences are defined on a per bucket type basis, in the same cluster.

A RICON West 2012 talk entitled, Bringing Consistency to Riak, shares much of the initial thinking behind this effort. In addition, the pull request that adds consistency to riak_kv provides detailed information about related repositories and the implementation approach.

Redesigned Full-Text Search
Riak is a key/value store and the values are simply stored on disk as binary. With previous versions of Riak Search, Riak developers have long been able to index the content of these stored values. In Riak 2.0, Riak Search (code-named Yokozuna) has been completely redesigned and now uses the Apache Solr full-text document indexing engine directly. Together, Riak and Solr provide a reliable full-text context indexing solution that is highly available and built for scale. In addition, Riak Search 2.0 also fully supports the Solr client query APIs, which enables integration with existing software solutions (either homegrown or commercial).

The Basho engineers responsible for Yokozuna have created a resources page that includes recorded talks, Solr documentation links, and books on the topic.

Security
Basho designed Riak with critical data in mind. Whether it’s data that affects revenue, user experience, or even a patient’s health (as is the case with the NHS), Riak ensures that this critical data is always available. However, often this critical data is also sensitive data. Riak 2.0 adds security to this data through the ability to administer access rights and plug-in various secure authentication models commonly used today.

The initial RFC that describes the security effort, including related Pull Requests, is available at github.com/basho/riak/issues/355.

Simplified Configuration Management
At Basho, we pride ourselves on providing operationally friendly software that functions smoothly when dealing with the challenges of a distributed system. In the past, configuration of Riak occurred in two files: app.config and vm.args. Riak 2.0 changes how and where configuration information is stored. It no longer uses Erlang-specific syntax but, rather, provides a layout more suited for all operators and automated deployment tools. This layout is easy to parse and transparent for Riak administrators.

More information on the vision and specific implementation considerations are contained in the repository at github.com/basho/cuttlefish.

Bucket Types
In versions of Riak prior to 2.0, keys were made up of two parts: the bucket they belong to and a unique identifier within that bucket. Buckets act as a namespace and allow for similar keys to be grouped. In addition, they provide a means of configuring how previous versions of Riak treated that data.

In Riak 2.0, several new features (security and strong consistency in particular) need to interact with groups of buckets. To this end, Riak 2.0 includes the concept of a Bucket Type. In addition to allowing new features without special prefixes in Bucket names, Riak developers and operators are able to define a group of buckets that share the same properties and only store information about each Bucket Type, rather than individual buckets.

More information about Bucket Types can be found in the Github Issue at github.com/basho/riak/issues/362. This issue describes the planned functionality, discussions about implementation, and includes related pull requests.

Change in Defaults for Sibling Resolution
Riak has always supported both application-side and timestamp and vector clock-based Last Write Wins server-side resolution. Prior to Riak 2.0, vector clock-based Last Write Wins has been the default. Moving forward, new clusters will hand off siblings to applications by default. This is the safest way to work with Riak, but requires developers to be aware of sibling resolution.

In a blog series entitled, Understanding Riak’s Configurable Behaviours, Basho Evangelist John Daily discusses the configuration of Last Write Wins, and many other options, in great detail.

More Efficient Use of Physical Memory
Riak nodes are designed to manage the changing demands of a cluster as it experiences network, hardware, and other failures. To do this, Riak balances each node’s resources accordingly. Riak 2.0 has vastly improved LevelDB’s use of available physical memory (RAM) by allowing local databases to dynamically change their cache sizes as the cluster fluctuates under load.

In the past, it was necessary to specify RAM allocation for different LevelDB caches independently. This is no longer the case. In Riak 2.0, LevelDB databases that manage key/value or active anti-entropy data share a single pool of memory, and administrators are free to allocate as much of the available RAM to LevelDB as they feel is appropriate in their deployment. Detailed implementation documentation can be found in the basho/leveldb wiki.

Riak Ruby Vagrant Project
If you are interested in testing Riak 2.0, in a contained environment with the Riak Ruby Client, Basho engineer Bryce Kerley has put together the Riak-Ruby-Vagrant repository. In addition, this environment can be easily adapted to usage with other clients for testing the new features of Riak 2.0.

Basho

Yokozuna Pre-release 0.2.0 Now Available

December 31, 2012

Happy Holidays from all of us here at Basho. We’ve got some new code to help you ring in the new year. Ryan Zezeski and others have been hard at work on Yokozuna, the next generation of Riak Search that marries Riak with Apache Solr.

The latest pre-release, 0.2.0, was just tagged, and there’s plenty to be excited about for those of you who are interested in test-driving the code. In addition to various bug fixes, some of the new features include:

  • Active Anti Entropy Support – Automatic background processing that seeks out and rectifies divergences between data stored in Riak and indexes stored in Yokozuna.
  • Benchmark Scripts – A pre-built collection of benchmarking scripts for automating performance testing.
  • Sibling Support – When enabled, Yokozuna will now index all object versions. It will also handle index cleanup upon sibling resolution.

The full release notes are up on the GitHub repo.

Contributors

Commits in this release came from Ryan Zezeski, Eric Redmond, and Dan Reverri. Mark Steele also reported a few issues that were fixed in this release.

Use Yokozuna

Remember that this is alpha software, and won’t be officially supported by Basho until a future release. That said, Ryan and the team are actively looking for beta testers with use cases that might be appropriate for Yokozuna. If you’re in the market for scalable, distributed full-text search, join the Riak Mailing List and start asking questions.

There’s a pre-built Yokozuna AWS AMI (ami-8b8d03e2) with the latest changes that’ll make it easy to take Yokozuna for a test drive.

And if you’re looking for a high-level introduction to Yokozuna, take 30 minutes to watch Ryan’s RICON2012 talk or browse the matching slide deck.

Enjoy.

The Basho Team

Basho is Taking Over Baltimore This Weekend

September 29, 2010

Basho hackers will be giving quite a few presentations between now and the end of the week. And they all happen to be in Baltimore! Here is a quick rundown (in no particular order) of where we will be, who will be there, and what we will be talking about:

Rusty Klophaus at CUFP

Rusty Klophaus will be at the Commercial Users of Functional Programming (CUFP) Event taking place this weekend in Baltimore, Maryland. His talk is called “Riak Core: Building Distributed Applications Without Shared State” and it should be downright amazing.

From the talk’s description: Both Riak KV (a key-value datastore and map/reduce platform) and Riak Search (a Solr-compatible full-text search and indexing engine) are built around a library called Riak Core that manages the mechanics of running a distributed application in a cluster without requiring a central coordinator or shared state. Using Riak Core, these applications can scale to hundreds of servers, handle enterprise-sized amounts of data, and remain operational in the face of server failure.

All the details on his talk can be found here.

And, in case you haven’t been following your Riak Core developments, check out Building Distributed Systems with Riak Core.

Dave Smith at the Ninth ACM SIGPLAN Erlang Workshop

Dave Smith (a.k.a “Dizzyd”) will be keynoting the ACM SIGPLAN Erlang Workshop taking place on Friday, Sept 30th, also in Baltimore. Dave’s talk is called “Rebar, Bitcask and how chemotherapy made me a better developer.”

Rebar and Bitcask are both pieces of software that Dizzy had a major hand in creating and they have played a huge role in Riak’s adoption (not to mention that Rebar has quickly become an indispensable tool for Erlang developers everywhere). Dave was also fighting follicular lymphoma while writing a lot of this code. Needless to say, this one is sure to be memorable and of immense value.

More details on his talk can be found here.

It should also be noted that newly-minted Basho Developer Scott Fritchie is the Workshop Chair for this event. He is an accomplished Erlang developer and Riak is not the only distributed key/value store about which Scott is passionate – he will also happily talk your ear off about Hibari.

Justin Sheehy at Surge

Just when you thought they couldn’t fit another conference in Baltimore on the same weekend… And this one is big: it’s the Surge Conference put on by the team at OmniTI. Basho will be there in the form of CTO Justin Sheehy.

Justin will be giving a talk about concurrency at scale, something about which every distributed systems developer should care deeply. Additionally, he will be taking part in a panel discussion – I haven’t seen an official name for it yet but rumor has it that it’s something along the lines of “SQL versus NoSQL.”

Check out the Surge site for more conference details.

As you can see, Baltimore is the place to be this weekend. Get there at all costs. And then go download Riak.

Mark

Webinar Recap – Riak in Action with Wriaki

August 20, 2010

Thank you to those who attended our webinar yesterday. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).

Q: How would solve full text search with the current versions of Riak? One could also take Wriaki as an example as most wikis have some sort of fulltext search functionality.

I recommend using existing fulltext solutions. Solr has matched up well with most of the web applications I have written, and would certainly work for Wriaki as well.

Q: Where in the course of the interaction (shown on slide 18) are you defining the client ID? Don’t you need the client ID and vclock to match between updates?

On slide 42, we talk about “actors” which are essentially client IDs. Using the logged-in user as the client ID can help prevent vclock explosion and is a sensible way of structuring your updates.

Bryan