Tag Archives: webinar

Webinar Recap – Riak in Action with Wriaki

August 20, 2010

Thank you to those who attended our webinar yesterday. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).

Q: How would you solve full-text search with the current versions of Riak? One could take Wriaki as an example, as most wikis have some sort of full-text search functionality.

I recommend using an existing full-text search solution. Solr has matched up well with most of the web applications I have written, and it would certainly work for Wriaki as well.

Q: Where in the course of the interaction (shown on slide 18) are you defining the client ID? Don’t you need the client ID and vclock to match between updates?

On slide 42, we talk about “actors”, which are essentially client IDs. Using the logged-in user as the client ID can help prevent vclock explosion and is a sensible way of structuring your updates.

Bryan

Free Webinar – Riak in Action – Wriaki – August 19 at 2PM

August 13, 2010

Documentation is great, but playing with examples can also be a helpful way to tackle steep learning curves. To help you learn about ways of using Riak, we’d like to present “Wriaki”, an example implementation of a wiki that stores its data in Riak.

We invite you to join us for a free webinar on Thursday, August 19 at 2:00PM Eastern Time (UTC-4) about Riak in Action: Wriaki. During the presentation, Bryan Fink will cover:

  • Modeling wiki data in the Riak key/value store
  • Access patterns using both get/put and map/reduce
  • Three strategies that Wriaki uses for dealing with eventual consistency
  • How the user interface changes to accommodate Wriaki’s models

The code for Wriaki will be open-sourced at the time of the presentation. The presentation will last 30 to 45 minutes, with time for questions at the end. Registration has closed!

If you cannot attend, the video and slides will be made available afterward in the recap post on the blog.

Webinar Recap – Riak with Rails

August 8, 2010

Thank you to those who attended our Rails-oriented webinar yesterday. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).

Q: When you have multiple application servers and Riak nodes, how do you handle “replication lag”?

Most web applications have some element of eventual consistency (or potential inconsistency) in them by their nature. Object and view caches sacrifice immediate consistency for gains in throughput and latency, and hopefully provide a better user experience. With Riak, you can achieve acceptable data freshness by “reading your writes”: use the same read quorum as your write quorum, and make sure that R+W is greater than N. For example, using R=W=DW=2 when N=3 will give a strong assurance of consistency.
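
For illustration, here’s that quorum arithmetic sketched with Riak’s Erlang local client (bucket, key, and value are hypothetical; Ruby’s riak-client exposes similar R/W/DW options on its get and store calls):

```erlang
%% Write with W=DW=2, then read with R=2, so R+W = 4 > N = 3.
{ok, C} = riak:local_client(),
Obj = riak_object:new(<<"accounts">>, <<"alice">>, <<"{}">>),
ok = C:put(Obj, 2, 2),                                %% W=2, DW=2
{ok, Fresh} = C:get(<<"accounts">>, <<"alice">>, 2).  %% R=2
```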

Q: I find myself doing def key; id; end. Is there any easier way to tell Ripple the key?

Currently there is not. However, I’ve found myself using this pattern frequently when I want a meaningful key that is also an attribute. There’s an issue on the tracker just for this feature. In the meantime, you could use two method aliases:

```ruby
class User
  include Ripple::Document
  property :email, String, :presence => true

  # This forces all attribute methods to be defined
  define_attribute_methods
  alias_method :key, :email
  alias_method :key=, :email=
end
```

As long as your property is a string, this should work just fine.

Q: Any tips on how to handle pagination over MapReduce queries?

The challenge with pagination in Riak is that reduce phases are not guaranteed to run only once; instead they run in parallel as results arrive asynchronously from the previous phase, followed by a final reduce. So in a sense, you have to treat every invocation of your reduce function as a “re-reduce”. We have plans to allow reduce phases to specify that they should run only once, but for now you can work around this limitation.

Reduce phases are always run on the coordinating node, so if you put a reduce phase before the one where you want to perform pagination, you are pretty much guaranteed that the whole result set is going to be available in a single application of the final reduce. A typical combination would be a “sorting” phase followed by a “pagination” phase. Riak.reduceSort and Riak.reduceSlice are two built-in functions that could help accomplish this task.
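
Here’s a sketch of that sort-then-slice pipeline with the Erlang local client (bucket, keys, and the slice bounds are hypothetical; we’re assuming Riak.reduceSlice takes a [start, end] arg):

```erlang
%% Sort the full result set in the first reduce, then keep only
%% results 11 through 20 as the page returned to the client.
{ok, C} = riak:local_client(),
{ok, Page} = C:mapred(
    [{<<"articles">>, <<"a1">>}, {<<"articles">>, <<"a2">>}],
    [{map, {jsfun, <<"Riak.mapValuesJson">>}, none, false},
     {reduce, {jsfun, <<"Riak.reduceSort">>}, none, false},
     {reduce, {jsfun, <<"Riak.reduceSlice">>}, [10, 20], true}]).
```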

Sean and Grant

Free Webinar – Riak with Rails – August 5 at 2PM Eastern

July 29, 2010

Ruby on Rails is a powerful web framework that focuses on developer productivity. Riak is a friendly key value store that is simple, flexible and scalable. Put them together and you have lots of exciting possibilities!

We invite you to join us for a free webinar on Thursday, August 5 at 2:00PM Eastern Time (UTC-4) to talk about Riak with Rails. In this hands-on webinar, we’ll discuss:

  • Setting up a new Rails 3 project for Riak
  • Storing, retrieving, manipulating key-value data from Ruby
  • Issuing map-reduce queries
  • Creating rich document models with Ripple
  • Using Riak as a distributed cache and session store

The presentation will last 30 to 45 minutes, with time for questions at the end. Registration has closed.

The Basho Team

Free Webinar – MapReduce Querying in Riak – July 22 at 2PM

July 15, 2010

Map-Reduce is a flexible and powerful alternative to declarative query languages like SQL that takes advantage of Riak’s distributed architecture. However, it requires a whole new way of thinking about how to collect, process, and report your data, and is tightly coupled to how your data is stored in Riak.

We invite you to join us for a free webinar on Thursday, July 22 at 2:00PM Eastern Time (UTC-4) to talk about Map-Reduce Querying in Riak. We’ll discuss:

  • How Riak’s Map-Reduce differs from other systems and query languages
  • How to construct and submit Map-Reduce queries
  • Filtering, extracting, transforming, aggregating, and sorting data
  • Understanding the efficiency of various types of queries
  • Building and deploying reusable Map-Reduce function libraries

We’ll cover the above topics in conjunction with practical examples from sample applications. The presentation will last 30 to 45 minutes, with time for questions at the end.

Registration has closed!

The Basho Team

Webinar Recap – MapReduce Querying in Riak

July 7, 2010

Thank you to all who attended the webinar last Thursday, it was a great turnout with awesome engagement. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).

Q: Say I want to perform two-fold link walking but don’t want to keep the “walk-through” results, including the initial one. Can I do something to keep only the last result?

In a MapReduce query, you can specify any number of phases to keep or ignore using the “keep” parameter on the phase. Usually you only want to keep the final phase. If you’re using the link-walker resource, it’ll return results from any phases whose specs end in “1” (the keep flag in the link spec). See the REST API wiki page for more information on link-walking.
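
As a sketch with the Erlang local client (bucket, tag, and key names are hypothetical), a two-fold walk that keeps only the final results looks like this:

```erlang
%% Two link phases with keep=false, then a map phase with keep=true,
%% so only the objects at the end of the walk are returned.
{ok, C} = riak:local_client(),
{ok, Results} = C:mapred(
    [{<<"articles">>, <<"a1">>}],
    [{link, <<"people">>, <<"author">>, false},
     {link, <<"articles">>, <<"wrote">>, false},
     {map, {jsfun, <<"Riak.mapValuesJson">>}, none, true}]).
```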

Q: Will Riak Search work along with MapReduce, for example, to avoid queries over an entire bucket? Will there be a webinar about Riak Search?

Yes, we intend to have this feature in the Generally Available release of Riak Search. We will definitely have a webinar about Riak Search close to its public release.

Q: Are there still problems with executing “qfun” functions from Erlang during MapReduce?

“qfun” phases (that use anonymous Erlang functions) will work on a one-node cluster, but not across a multi-node cluster. You can use them in development but it’s best to switch to a compiled module function or Javascript function when moving to production.

Q: Although streams weren’t mentioned, do you have any recommendations on when to use streaming map/reduce versus normal map/reduce?

Streaming MapReduce sends results back as they get produced from the last phase, in a multipart/mixed format. To invoke this, add ?chunked=true to the URL when you submit the job. Streaming might be appropriate when you expect the result set to be very large and have constructed your application such that incomplete results are useful to it. For example, in an AJAX web application, it might make sense to send some results to the browser before the entire query is complete.

Q: Which way is faster: storing a lot of links or storing the target keys in the value as a list? Are there any limits to the maximum number of links on a key?

How the links are stored will likely not have a huge impact on performance; storing a list of target keys in the value would work just as well as links. There are two relevant operations you would perform with a key-list document: updating and traversal.

The update process involves retrieving the list, adding a value, and saving the list. If you are using the REST interface, be aware of its header limitations: Mochiweb restricts the number of allowed headers to 1000, and header length is limited to 8192 characters. This imposes an upper limit on the number of links that can be set through the REST interface.

The best method for updating a key list would be to write a post-commit hook that performs the update. This avoids accessing the key list through the REST interface, so the header limitations are no longer a concern. However, the post-commit hook could become a bottleneck in your update path if the number of links grows large.
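
A minimal sketch of such a hook, assuming the internal riak_client/riak_object APIs used elsewhere in this post (module, bucket, and key names are hypothetical):

```erlang
-module(wiki_hooks).
-export([add_to_key_list/1]).

%% Post-commit hook: append the newly written object's key to a
%% key-list document kept in a separate bucket.
add_to_key_list(Object) ->
    Key = riak_object:key(Object),
    {ok, C} = riak:local_client(),
    ListObj = case C:get(<<"indexes">>, <<"all_keys">>, 2) of
                  {ok, Found} -> Found;
                  {error, notfound} ->
                      riak_object:new(<<"indexes">>, <<"all_keys">>, [])
              end,
    Keys = riak_object:get_value(ListObj),
    case lists:member(Key, Keys) of
        true  -> ok;
        false -> C:put(riak_object:update_value(ListObj, [Key | Keys]), 2)
    end.
```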

Traversal involves retrieving the key-list document, collecting the related keys, and outputting a bucket/key list to be used in subsequent map phases. A built-in function is provided to process links. If you were to store keys in the value, you would need to write a custom function to parse the keys and generate a bucket/key list.

Q: What’s the benefit of passing an arg to a map or reduce phase? Couldn’t you just send the function body with the arg value filled in? Can I pass in a list of args or an arbitrary number of args?

When you have a lot of queries that are similar but with minor differences, you might be able to generalize a map or reduce function so that it can vary based on the ‘arg’ parameter. Then you could store that function in a built-ins library (see the question below) so it’s preloaded rather than evaluated at query-time. The arg parameter can be any valid JSON value.
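
For example, a single Erlang map function can be generalized over which field it extracts, with the field name supplied as the arg at query time (module and function names are hypothetical):

```erlang
-module(mr_builtins).
-export([map_field/3]).

%% Return the named field from each JSON-encoded object; the field
%% to extract arrives as the phase's arg parameter.
map_field(Object, _KeyData, Field) ->
    {struct, Props} = mochijson2:decode(riak_object:get_value(Object)),
    case proplists:get_value(Field, Props) of
        undefined -> [];
        Value     -> [Value]
    end.
```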

Q: What’s the behavior if the map function is missing from one or more nodes but present on others?

The entire query will fail. It’s best to make sure, perhaps via automated deployment, that all of your functions are available on all nodes. Alternatively, you can store Javascript functions directly in Riak and use them in a phase with “bucket” and “key” instead of “source” or “name”.
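
Here’s a sketch of referencing a stored Javascript function from the Erlang client; the {jsanon, {Bucket, Key}} phase spec is the Erlang analogue of the HTTP “bucket”/“key” form (names are hypothetical):

```erlang
%% The map phase's source is fetched from the js_library/extract_title
%% object at query time, so every node runs the same code.
{ok, C} = riak:local_client(),
{ok, Results} = C:mapred(
    [{<<"docs">>, <<"d1">>}],
    [{map, {jsanon, {<<"js_library">>, <<"extract_title">>}}, none, true}]).
```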

Q: If there are 2 map phases, for example, then does that mean that both phases will be run back to back on each individual node and *then* it’s all sent back for reduce? Or is there some back and forth between phases?

It’s more like a pipeline, one phase feeds the next. All results from one phase are sent back to the coordinating node, which then initiates the subsequent phase once all participating nodes have replied.

Q: Would it be possible to send a function which acts as both a map predicate and an updater?

In general we don’t recommend modifying objects as part of a MapReduce job because it can add latency to the request. However, you may be able to implement this with a map function in Erlang. Erlang MapReduce functions have full access to Riak including being able to read and write data.

```erlang
%% Inside your own Erlang module. predicate/1 and transform/1 are
%% application-specific placeholders.
map_predicate_with_update(Value, _KeyData, _Arg) ->
    case predicate(Value) of
        true -> [update_passed_value(Value)];
        _ -> []
    end.

update_passed_value(Value) ->
    {ok, C} = riak:local_client(),
    %% modify your object here, then store it with C:put
    ModifiedValue = riak_object:update_value(Value, transform(riak_object:get_value(Value))),
    ok = C:put(ModifiedValue, 2),
    ModifiedValue.
```

This could come in handy for large updates instead of having to pull each object, update it and store it.

Q: Are Erlang named functions or JS named functions more performant? Which are faster — JS or Erlang functions?

There is a slight overhead when encoding the Riak object to JSON but otherwise the performance is comparable.

Q: Is there a way to use namespacing to define named Javascript functions? In other words, if I had a bunch of app-specific functions, what’s the best way to handle that?

Yes, check out the built-in Javascript MapReduce functions for an example.

Q: Can you specify how data is distributed among the cluster?

In short, no. Riak consistently hashes keys to determine where in the cluster data is located. This article explains how data is replicated and distributed throughout the cluster. In most production situations, your data will be evenly distributed.

Q: What is the reason for the nested list of inputs to a MapReduce query?

The nested list lets you specify multiple keys as inputs to your query, rather than a single bucket name or key. From the Erlang client, inputs are expressed as lists of tuples (fixed-length arrays) which have length of 2 (for bucket/key) or 3 (bucket/key/key-specific-data). Since JSON has no tuple type, we have to express the inputs as arrays of length 2 or 3 within an array.
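
For instance, here are equivalent inputs in both forms (names and key-specific data are hypothetical):

```erlang
%% JSON over HTTP:
%%   "inputs": [["artists","Beatles"], ["albums","AbbeyRoad","1969"]]
%% The same inputs from the Erlang client:
Inputs = [{<<"artists">>, <<"Beatles">>},
          {<<"albums">>, <<"AbbeyRoad">>, <<"1969">>}].
```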

Q: Is there a syntax requirement of JSON for Riak?

JSON is only required for the MapReduce query when submitted via HTTP, the objects you store can be in any format that your application will understand. JSON also happens to be a convenient format for MapReduce processing because it is accessible to both Erlang and Javascript. However, it is fairly common for Erlang-native applications to store data in Riak as serialized Erlang datatypes.
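
For example, an Erlang application can store a plain Erlang term and skip JSON altogether (bucket, key, and value are hypothetical):

```erlang
%% Only Erlang consumers will be able to interpret this value.
{ok, C} = riak:local_client(),
Obj = riak_object:new(<<"settings">>, <<"app1">>,
                      [{retries, 3}, {verbose, false}]),
ok = C:put(Obj, 2).
```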

Q: Is there any significance to the name of file for how Riak finds the saved functions? I assume you can leave other languages in the same folder and it would be ignored as long as language is set to javascript? Additionally, is it possible/does it make sense to combine all your languages into a single folder?

Riak only looks for “*.js” files in the js_source_dir folder (see Configuration Files on the wiki). Erlang modules that contain map and reduce functions need to be on the code path, which could be completely separate from where the Javascript files are located.

Q: Would you point us to any best practices around matrix computations in Riak? I don’t see any references to matrix in the riak wiki…

We don’t have any specific support for matrix computations. We encourage you to find an appropriate Javascript or Erlang library to support your application.

Dan and Sean

Webinar Recap – Schema Design for Riak

July 7, 2010

Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many, I’ve reviewed the questions below, in no particular order. If you want to review the slides from yesterday’s presentation, they’re on Slideshare.

Q: You say listing keys is expensive. How are Map phases affected? Does the number of keys in a bucket have an effect on the expense of the operation? (paraphrased)

Listing keys (for a single bucket, there is no analog for the entire system) requires traversing the entire keyspace, even examining keys that don’t belong to the requested bucket. If your Map/Reduce query uses a whole bucket as its inputs, it will be nearly as expensive as listing keys back to the client; however, Map phases are executed in parallel on the nodes where the data lives, so you get the full benefits of parallelism and data-locality when it executes. The expense of listing keys is taken before any Map phase begins.

It bears reiterating that the expense of listing keys is proportional to the total number of keys stored (regardless of bucket). If your bucket has only 10 keys and you know what they are, it will probably be more efficient to list them as the inputs to your Map/Reduce query than to use the whole bucket as an input.
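
As a sketch (bucket and keys hypothetical), passing the known keys explicitly avoids the keyspace traversal that a whole-bucket input incurs:

```erlang
{ok, C} = riak:local_client(),
{ok, Results} = C:mapred(
    [{<<"tiny">>, <<"k1">>}, {<<"tiny">>, <<"k2">>}],  %% explicit inputs
    [{map, {jsfun, <<"Riak.mapValuesJson">>}, none, true}]).
```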

Q: How do you recommend modeling relationships that require a large number of associations (thousands or millions)?

This is difficult to do, and I won’t say there’s an easy or best answer. One idea that came up in the IRC room after the webinar was building a B-tree-like data structure that could be grown to fit the number of associations. This solves the one-to-many relationship, but will require extra handling and care on the part of your application. In some cases, where you only need to know membership in the relationship, a bloom filter might be appropriate. If you must model lots of highly-connected data, consider throwing a graph database into the mix. Riak is not going to fit all use-cases; some models will be awkward.

Q: My company provides a Java web application and analytics solution that uses JDO to persist to and query from a relational database. Where would I start in integrating with Riak?

Since I haven’t done Java in a serious way for a long time, I can’t speak to the specifics of JDO or how you might work on migrating away from it. However, I have found that most ORMs hide things from developers that they should really be aware of — how the mapping is performed, what queries are executed, and so on. You’ll likely have to look into the guts of how JDO persists and retrieves objects from the database, then step back and reevaluate what your top queries are and how Riak can help improve or simplify those operations. This is all in the theme of the webinar: know your data!

Q: Is the source code for the example application and schema design available? (paraphrased)

No, there isn’t any sample code yet. You can play with the existing application (Lowdown) at lowdownapp.com. The other authors and I are seeking a few people to take over its development, and the initial group we contacted have indicated it will be open-sourced.

Q: Is there an way to get notified on changes in a bucket?

That’s not built into Riak. However, you could write a post-commit hook in Erlang that pushes a notification to RabbitMQ, for example, then have the interested parties consume messages from that queue.

Q: What mechanism does Riak have to deal with the unique user issue?

Riak has neither write locks nor transactions. There is no way to absolutely guarantee uniqueness without introducing an intermediary that acts as a single arbiter (and a single point of failure). However, in cases where you aren’t experiencing high write-concurrency on the data in question, there are a few things you can do to simulate the uniqueness constraint:

  • Check for existence of the key before writing. In HTTP, this is as simple as a HEAD request. If the response is 404 Not Found, the object probably doesn’t exist.
  • Use a conditional PUT (in HTTP) when creating the object. The If-None-Match: * header should prevent you from blindly overwriting an existing key.

Neither of these solutions is bullet-proof, because all operations in Riak happen asynchronously. Remember that it’s eventually consistent: not all parts of the system may agree at all times, but they will converge on a single state over time. There will be corner cases where a key doesn’t exist when you check for it, the write via the conditional request succeeds, and you still end up creating an object in conflict. Caveat emptor.
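
Here is a minimal check-then-write sketch with the Erlang local client (bucket, key, and value hypothetical); the race described above still applies between the get and the put:

```erlang
{ok, C} = riak:local_client(),
case C:get(<<"users">>, <<"alice">>, 2) of
    {error, notfound} ->
        C:put(riak_object:new(<<"users">>, <<"alice">>, <<"{}">>), 2);
    {ok, _Existing} ->
        {error, username_taken}
end.
```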

Q: Are the intermediate results of Link and Map phases cached?

Yes, the results of both map and link phases are cached in a pretty naive LRU. The development team has plans to improve its behavior in future versions of Riak.

Q: Could you comment on commit hooks and what place they have, if any, in riak schema design? Would it make sense to use hooks to build an index e.g. keys in a bucket?

Yes, commit hooks are very useful in schema design. For example, you could use a pre-commit hook to validate the format of data before it’s stored. You could use post-commit hooks to send the data to external services (see above) or, as you suggest, build an index in another bucket. Building a secondary index reliably is complicated though, and it’s something I want to work on over the next few months.

Q: So if you have allow_mult=false are there cases where riak will return a conflict 409? Is the default that last write wins?

Riak never returns a 409 Conflict status from the HTTP interface on writes. If you supply a conditional header (If-Match, for example) you might get a 412 Precondition Failed response if the ETag of the object to be modified doesn’t match the header. In general, it is Riak’s policy to accept writes regardless of the internal state of the object.

The “last write wins” behavior comes in two flavors: “clobbering” writes, and softer “show me the latest one” reads. The latter is the default behavior, in which siblings might occur internally (and the vector clock grows) but are not exposed to the client; instead, Riak returns the sibling with the latest timestamp at read/GET time and “throws away” new writes that are based on older (ancestor) vclocks. The former actually ignores vector clocks for the specified bucket, providing no guarantees of causal ordering of writes. To turn this behavior on, set the last_write_wins bucket property to true. Except in the most extreme cases, where you don’t mind clobbering things that were written since the last time you read, we recommend using the default behavior. If you set allow_mult=true, conflicting writes (objects with divergent vector clocks, neither a descendant of the other) will be exposed to the client with a 300 Multiple Choices response.
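
Both flavors are toggled per bucket; a sketch with the Erlang local client (bucket names hypothetical):

```erlang
{ok, C} = riak:local_client(),
ok = C:set_bucket(<<"cache">>, [{last_write_wins, true}]),  %% clobbering writes
ok = C:set_bucket(<<"articles">>, [{allow_mult, true}]).    %% expose siblings
```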

Again, thanks for attending! Look for our next webinar in about two weeks.

Sean

Free Webinar – Schema Design for Riak – July 8th at 2PM Eastern

June 30, 2010

Moving applications to Riak involves a number of changes from the status quo of relational databases, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create map-reduce queries? Where will Riak be a natural fit, and where will it be challenging?

We invite you to join us for a free webinar on Thursday, July 8 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:

  • Freeing yourself of the architectural constraints of the “relational” mindset
  • Gaining a fuller understanding of your existing schema and its queries
  • Strategies and patterns for structuring your data in Riak
  • Tradeoffs of various solutions

We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.

Registration has closed.

The Basho Team

Free Webinar – Load-Testing Riak – June 3rd @ 2PM Eastern

May 26, 2010

We frequently receive questions about how Riak will behave in specific production scenarios, and how to tune it for those workloads. The answer is, generally, that it depends — the only way to know for sure is to measure!

We invite you to join us for a free webinar on Thursday, June 3 at 2:00PM Eastern Time to discuss how to measure Riak performance and develop your own load-testing plan. We’ll discuss:

  • Understanding what to test, and how to test it
  • Classifying your application and its load profile
  • Configuring and using our load generation and measurement tool
  • Interpreting the results of your test and taking next steps

As part of the webinar, you will get access to and a demonstration of basho_bench, our benchmarking and load-testing tool. The presentation will last 30 to 45 minutes.
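
To give you a taste beforehand, here is a minimal basho_bench configuration sketch; treat the specific keys, driver name, and values as assumptions, since your workload will differ:

```erlang
%% basho_bench config: 4 workers hammering localhost for 10 minutes
%% with an 80/20 read/write mix. All values here are examples.
{mode, max}.
{duration, 10}.                 %% minutes
{concurrent, 4}.                %% worker processes
{driver, basho_bench_driver_riakc_pb}.
{riakc_pb_ips, [{127,0,0,1}]}.
{key_generator, {int_to_bin, {uniform_int, 10000}}}.
{value_generator, {fixed_bin, 1000}}.
{operations, [{get, 4}, {put, 1}]}.
```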

Registration will be limited to 20 parties, on a first-come, first-served basis. Fill in the form below if you’re serious about getting a scalable, predictable, cost-effective storage solution in Riak!

UPDATE: Due to overwhelming demand for this Webinar, we are expanding the number of seats available. So don’t shy away from registering if you think it’s full. We will have plenty of room for you!

The Basho Team

Registration has closed. We’ll hold another one soon!