February 18, 2011
Firstly, a big thanks goes out to everyone who attended yesterday’s MapReducing Big Data With Luwak Webinar. As promised, here is the screencast (below) from the webinar. It should be quite useful for those of you who weren’t able to attend or who would like to view the content again (it’s good enough to warrant multiple views).
If you prefer slides, there is a PDF version of the presentation available here.
If you have any questions or comments (webinar-related or otherwise), leave them below and we’ll get back to you.
February 14, 2011
Basho Senior Engineer Bryan Fink has been doing some exceptional work with MapReduce and Luwak, Riak’s large-object storage interface. Recently, he wrote up two extensive blog posts on the specifics of Luwak and the powerful tool it makes when combined with Riak’s MapReduce engine.
We’ve seen a huge amount of Luwak usage since its release and, since these blog posts, a large amount of interest in running MapReduce queries over data stored in Riak via Luwak. So, we thought what better way to spread the word than through a free Webinar?
This Thursday, February 17th at 2PM EST, Bryan will be leading the MapReducing Big Data With Luwak Webinar. The planned agenda is as follows:
- Overview of Riak MapReduce and its typical usage
- Gotchas and troubleshooting
- Usage Recommendations and Best Practices
- An Introduction to Luwak, Riak’s Large File Storage Interface
- Luwak MapReduce in Action
Registration is now closed.
Hope to see you there.
**January 10, 2013**
Join us on Thursday, January 17 at 11 am PT / 2 pm ET for an intro to Riak webcast with Shanley Kane, director of product management, and Mark Phillips, director of community management.
In 30 minutes, we’ll cover:
* Good and bad use cases for Riak
* User stories of note, including social, content, advertising and session storage
* Riak’s architecture and mechanisms for remaining highly available in failure conditions
* APIs, data model and client libraries
* Features for querying and searching data
* What’s new in the latest version of Riak and what’s next
[Sign up for the webcast here](http://info.basho.com/IntroToRiakJan17.html).
October 1, 2012
New to Riak? Join us this Thursday for an intro to Riak webcast with Mark Phillips, Basho director of community, and Shanley Kane, director of product management. In this 30 minute talk we’ll cover the basics of:
- Good and bad use cases for Riak
- Some user stories of note
- Riak’s architecture: consistent hashing, hinted handoff, replication, gossip protocol and more
- APIs and client libraries
- Features for searching and aggregating data: Riak Search, Secondary Indexes and Map Reduce
- What’s new in the latest version of Riak
Register for the webcast here.
**October 22, 2012**
If you missed our last ‘Intro to Riak’ webcast, not to fear, we’re doing another. This Thursday (11am PT / 2pm ET), join Shanley, Basho’s director of product marketing, and Mark, director of community, for a 30 minute webcast introducing Riak’s architecture, use cases, user stories, operations and data model.
Register for the webcast [here](http://info.basho.com/IntroToRiakOct25.html).
**November 05, 2012**
Earlier this year we launched Riak CS – simple, available cloud storage built on Riak. We gave it an S3-compatible API, made it multi-tenant, and added per-user reporting on network and storage utilization. Riak CS provides the core features to build public or private clouds that are distributed, fault-tolerant and easy to scale.
New to Riak CS? Join us this Wednesday (11am PT / 2pm ET) for an Intro to Riak CS webcast with Basho chief architect Andy Gross and director of product management Shanley Kane. In this 30 minute webcast, we’ll cover:
* Main features, including S3-compatibility, multi-tenancy, large object support and reporting
* Operations and interfaces
* Use cases in public/private clouds and applications
* Latest release and roadmap plans
Register for the webcast [here](http://info.basho.com/IntroToRiakCSNov7.html).
**January 02, 2013**
New to Riak? Thinking about using Riak instead of a relational database? Join Basho chief architect Andy Gross and director of product management Shanley Kane for an intro this Thursday (11am PT/2pm ET). In about 30 minutes, we’ll cover the basics of:
* Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
* A look at the operational aspects of Riak and where they differ from relational approaches
* Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
* Migration considerations, including where to start when migrating existing apps
* Riak’s eventually consistent design
* Multi-site replication options in Riak
Register for the webcast [here](http://info.basho.com/RelationalToRiakJan3.html).
December 8, 2010
Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many, I’ve reviewed all of the questions below, in no particular order.
Q: Can you touch on upcoming filtering of keys prior to map reduce? Will it essentially replace the need for one to explicitly name the bucket/key in a M/R job? Does it require a bucket list-keys operation?
Key filters, in the upcoming 0.14 release, will allow you to logically select a population of keys from a bucket before running them through MapReduce. This will be faster than a full-bucket map since it only loads the objects you’re really interested in (the ones that pass the filter). It’s a great way to make use of meaningful keys that have structure to them. So yes, it does require a list-keys operation, but it doesn’t replace the need to be explicit about which keys to select; there are still many useful queries that can be done when the keys are known ahead of time.
For more information on key-filters, see Kevin’s presentation on the upcoming MapReduce enhancements.
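As a rough illustration of the idea, the sketch below builds a JSON job body that filters keys before the map phase, then mimics the same filter locally on a few sample keys. The bucket name `invoices`, the `customer-date` key layout, and the helper names are assumptions made for this example; the `key_filters` shape follows the format described in the Riak documentation, so check it against the version you run.

```python
import json

def build_job(bucket, customer):
    """Build a MapReduce job body that filters keys before the map phase."""
    return {
        "inputs": {
            "bucket": bucket,
            # Split each key on "-" and keep the first token, then
            # compare that token to the customer name.
            "key_filters": [["tokenize", "-", 1], ["eq", customer]],
        },
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }

def matches_locally(key, customer):
    """Local stand-in for the filter above, for illustration only."""
    return key.split("-")[0] == customer

job = build_job("invoices", "basho")
body = json.dumps(job)  # this string would be sent to the MapReduce endpoint

# Hypothetical keys with a "customer-date" structure.
keys = ["basho-20101215", "acme-20101201", "basho-20101101"]
selected = [k for k in keys if matches_locally(k, "basho")]
print(body)
print(selected)
```

Only the keys that pass the filter would have their objects loaded, which is where the saving over a full-bucket map comes from.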
Q: How can you validate that you’ve reached a good/valid KV model when migrating a relational model?
The best way is to try out some models. The thing about schema design for Riak that turns your process on its head is that you design for optimizing queries, not for optimizing the data model. If your queries are efficient (single-key lookup as much as possible), you’ve probably reached a good model, but also weigh things like payload size, cost of updating, and difficulty manipulating the data in your application. If your design makes it substantially harder to build your application than a relational design, Riak may not be the right fit.
Q: Are there any “gotchas” when thinking of a bucket as we are used to thinking of a table?
Like tables, buckets can be used to group similar data together. However, buckets don’t automatically enforce data structure (columns with specified types, referential integrity) like relational tables do; that part is still up to your application. You can, however, add precommit hooks to buckets to perform any data validation that your application shouldn’t handle.
Q: How would you create a ‘manual index’ in Riak? Doesn’t that need to always find unique keys?
One basic way to structure a manually-created index in Riak is to have a bucket specifically for the index. Keys in this bucket correspond to the exact value you are indexing (for fuzzy or incomplete values, use Riak Search). The objects stored at those keys have links or lists of keys that refer to the original object(s). Then you can find the original simply by following the link or using MapReduce to extract and find the related keys.
The example I gave in the webinar Q&A was indexing users by email. To create the index, I would use a bucket named users_by_email. If I wanted to look up my own user object by email, I’d try to fetch the object firstname.lastname@example.org, then follow the link in it (something like riaktag="indexed") to find the actual data.
Whether those index values need to be unique is up to your application to design and enforce. For example, the index could be storing links to blog posts that have specific tags, in which case the index need not be unique.
To create the index, you’ll either have to perform multiple writes from your application (one for the data, one for the index), or add a commit hook to create and modify it for you.
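The pattern above can be sketched with an in-memory stand-in for the two buckets. This is a minimal illustration, not client code: the `store` dictionary, the `users` bucket name, the sample key `u1001`, and both helper functions are hypothetical, and the index entry is stored as a plain bucket/key pointer rather than a real Riak link.

```python
from collections import defaultdict

# In-memory stand-in for a cluster: bucket name -> key -> value.
store = defaultdict(dict)

def save_user(user_key, user):
    """Write the user, then write the index entry (two separate writes)."""
    store["users"][user_key] = user
    # The index key is the exact value being indexed; the value points
    # back at the original object.
    store["users_by_email"][user["email"]] = {"bucket": "users", "key": user_key}

def find_user_by_email(email):
    """Fetch the index entry, then follow it to the real object."""
    entry = store["users_by_email"].get(email)
    if entry is None:
        return None
    return store[entry["bucket"]].get(entry["key"])

save_user("u1001", {"name": "Sean", "email": "sean@example.org"})
found = find_user_by_email("sean@example.org")
missing = find_user_by_email("nobody@example.org")
print(found, missing)
```

Because the data write and the index write are separate operations, the application has to decide what to do when the second one fails, for example by retrying or by repairing the index later.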
Q: Can you compare/contrast buckets w/ Cassandra column families?
Cassandra has a very different data model from Riak, and you’ll want to consult with their experts to get a second opinion, but here’s what I know. Column families are a way to group related columns that you will always want to retrieve together, and they are something that you design up-front (changes require restarting the cluster to take effect). It’s the closest thing to a relational table that Cassandra has.
Although you do use buckets to group similar data items, in contrast, Riak’s buckets:
- Don’t understand or enforce any internal structure of the values,
- Don’t need to be created or designed ahead of time, but pop into existence when you first use them, and
- Don’t require a restart to be used.
Q: How would part sharing be achieved? (this is a reference to the example given in the webinar, Radiant CMS)
Radiant shares content parts only when specified by the template language, and always by inheritance from ancestor pages. So if the layout contained <r:content part="sidebar" inherit="true" />, then if the currently rendering page doesn’t have that content part, it will look up the hierarchy until it finds it. This is one example of why it’s so important to have an efficient way to traverse the site hierarchy, and why I presented so many options.
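The inherited lookup described here amounts to walking parent pointers until a page defines the part. The sketch below is a simplified model for illustration only; the `Page` class and `find_part` function are hypothetical and are not Radiant’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Page:
    slug: str
    parent: Optional["Page"] = None
    parts: Dict[str, str] = field(default_factory=dict)

def find_part(page, name, inherit=False):
    """Return the named part from the page or, if inherit is set,
    from the nearest ancestor that defines it."""
    current = page
    while current is not None:
        if name in current.parts:
            return current.parts[name]
        if not inherit:
            return None
        current = current.parent  # step one level up the hierarchy
    return None

home = Page("/", parts={"body": "Welcome", "sidebar": "Site links"})
blog = Page("blog", parent=home, parts={"body": "Posts"})
post = Page("first-post", parent=blog, parts={"body": "Hello"})

print(find_part(post, "sidebar", inherit=True))
print(find_part(post, "sidebar"))
```

In a key/value store each step up the hierarchy would typically be another fetch, so the depth of the tree and the way ancestors are stored directly affect rendering cost.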
Q: What is the max number of links an object can have for Link Walking?
There’s no cut-and-dried answer for this. Theoretically, you are limited only by storage space (disk and RAM) and the ability to retrieve the object from the desired interface. In a practical sense this means that the default HTTP interface limits you to around 100,000 links on a single object (based on previous discussions of the limits of HTTP packets and header lengths). Still, this is not going to be reasonable to deal with in your application. In some applications we’ve seen links on the order of hundreds per object negatively impact link-walking performance. If you need to have that many, you’ll be better off exploring other designs.
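To get a feel for how header size grows with link count, here is a small back-of-the-envelope sketch. It assumes links are rendered in the `</riak/bucket/key>; riaktag="tag"` style used by Riak’s HTTP Link header; the bucket, key pattern, and tag are made up, and the numbers say nothing about what any particular server or proxy will accept.

```python
def link_header(links):
    """Render (bucket, key, tag) triples in the style of an HTTP Link header."""
    return ", ".join(
        '</riak/{}/{}>; riaktag="{}"'.format(bucket, key, tag)
        for bucket, key, tag in links
    )

def header_bytes(n, bucket="posts", tag="tagged"):
    """Size in bytes of a Link header carrying n links with fixed-width keys."""
    links = [(bucket, "post{:06d}".format(i), tag) for i in range(n)]
    return len(link_header(links).encode("utf-8"))

for n in (1, 100, 1000):
    print(n, header_bytes(n))
```

The size grows linearly with the number of links, so short bucket names, keys, and tags buy some headroom, but the practical advice above still applies: large link counts are better handled by a different design.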
Again, thanks for attending! Look for our next webinar coming in about a month.
— Sean, Developer Advocate
December 1, 2010
Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create MapReduce queries? Where will Riak be a natural fit and where will it be challenging?
We invite you to join us for a free webinar on Tuesday, December 7 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:
- Freeing yourself of the architectural constraints of the “relational” mindset
- Gaining a fuller understanding of your existing schema and its queries
- Strategies and patterns for structuring your data in Riak
- Tradeoffs of various solutions
We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.
If you missed the previous version of this webinar in July, here’s your chance to see it! We’ll also use a different example this time, so even if you attended last time, you’ll probably learn something new.
Fill in the form below if you want to get started building applications on top of Riak!
Sorry, registration is closed! Video of the presentation will be posted on Vimeo after the webinar has ended.
September 16, 2010
Justin Sheehy, Basho’s CTO, and Bryan Cantrill, Joyent’s VP of Engineering, are two acknowledged authorities on distributed systems and cloud computing. Riak SmartMachines offer you all the advantages of Riak running on the best cloud technology there is.
What do you get when you put Justin Sheehy, Bryan Cantrill and a microphone in a room for 45 minutes to talk about NoSQL and Riak SmartMachine performance? One of the best webinars we’ve ever had the chance to take part in!
Watch this one at least three times. Seriously. And then go download Riak.
You can download a PDF version of the slides used in the webinar here.