Tag Archives: Riak

Webcast- Intro to Riak, Thursday January 17

**January 10, 2013**

Join us on Thursday, January 17 at 11 am PT / 2 pm ET for an intro to Riak webcast with Shanley Kane, director of product management and Mark Phillips, director of community management.

In 30 minutes, we’ll cover:

* Good and bad use cases for Riak
* User stores of note, including social, content, advertising and session storage
* Riak’s architecture and mechanisms for remaining highly available in failure conditions
* APIs, data model and client libraries
* Features for querying and searching data
* What’s new in the latest version of Riak and what’s next

[Sign up for the webcast here](http://info.basho.com/IntroToRiakJan17.html).


From Relational to Riak (Webcast)

**January 02, 2013**

New to Riak? Thinking about using Riak instead of a relational database? Join Basho chief architect Andy Gross and director of product management Shanley Kane for an intro this Thursday (11am PT/2pm ET). In about 30 minutes, we’ll cover the basics of:

* Scalability benefits of Riak, including an examination of limitations around master/slave architectures and sharding, and what Riak does differently
* A look at the operational aspects of Riak and where they differ from relational approaches
* Riak’s data model and benefits for developers, as well as the tradeoffs and limitations of a key/value approach
* Migration considerations, including where to start when migrating existing apps
* Riak’s eventually consistent design
* Multi-site replication options in Riak

Register for the webcast [here](http://info.basho.com/RelationalToRiakJan3.html).



Intro to Riak Webcast

October 1, 2012

New to Riak? Join us this Thursday for an intro to Riak webcast with Mark Phillips, Basho director of community, and Shanley Kane, director of product management. In this 30 minute talk we’ll cover the basics of:

Good and bad use cases for Riak

  • Some user stories of note
  • Riak’s architecture: consistent hashing, hinted handoff, replication, gossip protocol and more
  • APIs and client libraries
  • Features for searching and aggregating data: Riak Search, Secondary Indexes and Map Reduce
  • What’s new in the latest version of Riak

Register for the webcast here.



Riak at PyConZA

**October 03, 2012**

[PyConZA](http://za.pycon.org/) starts this Thursday in Cape Town, South Africa, and Basho is proud to be [a sponsor](http://za.pycon.org/sponsors_basho.html). As our blurb on the PyConZA site states:

Basho is proud to sponsor PyConZA because we believe in the power and future of the South African tech community. We look forward to the valuable, lasting technologies that will be coming out of Cape Town and the surrounding areas in the coming years.

You may remember that back in August we [put out the call](http://basho.com/blog/technical/2012/08/23/Riak-ambassadors-needed-for-PyCon-ZA/) for Riak Community Ambassadors to help us out with PyconZA. As hard as it is to miss out on a chance to go to The Mother City, it wasn’t feasible for us to make it with [RICON](http://ricon2012.com) happening next week. I’m happy to report that a few fans of durability have stepped forward as Ambassadors to make sure Riak is fully-represented. If you’re lucky enough to be going to PyconZA (it’s sold out), be on the lookout for the following two characters:

### Joshua Maserow

### Mike Jones

In addition to taking in the talks, Mike and Joshua will be on hand to answer your Riak questions (or tell you where to look for the answers if they don’t have them). There will also be some production Riak users in the crowd, so you won’t have to look too far if you want to get up to speed on why you should be running Riak.

Enjoy PyconZA. And thanks again to Mike and Joshua for volunteering to represent Riak in our absence.


Webinar Recap and Q&A – Schema Design for Riak

December 8, 2010

Thank you to all who attended the webinar yesterday. The turnout was great, and the questions at the end were also very thoughtful. Since I didn’t get to answer very many, I’ve reviewed all of the questions below, in no particular order.

Q: Can you touch on upcoming filtering of keys prior to map reduce? Will it essentially replace the need for one to explicitly name the bucket/key in a M/R job? Does it require a bucket list-keys operation?

Key filters, in the upcoming 0.14 release, will allow you to logically select a population of keys from a bucket before running them through MapReduce. This will be faster than a full-bucket map since it only loads the objects you’re really interested in (the ones that pass the filter). It’s a great way to make use of meaningful keys that have structure to them. So yes, it does require an list-keys operation, but doesn’t replace the need to be explicit about which keys to select; there are still many useful queries that can be done when the keys are known ahead of time.

For more information on key-filters, see Kevin’s presentation on the upcoming MapReduce enhancements.

Q: How can you validate that you’ve reached a good/valid KV model when migrating a relational model?

The best way is to try out some models. The thing about schema design for Riak that turns your process on its head is that you design for optimizing queries, not for optimizing the data model. If your queries are efficient (single-key lookup as much as possible), you’ve probably reached a good model, but also weigh things like payload size, cost of updating, and difficulty manipulating the data in your application. If your design makes it substantially harder to build your application than a relational design, Riak may not be the right fit.

Q: Are there any “gotchas” when thinking of a bucket as we are used to thinking of a table?

Like tables, buckets can be used to group similar data together. However, buckets don’t automatically enforce data structure (columns with specified types, referential integrity) like relational tables do; that part is still up to your application. You can, however, add precommit hooks to buckets to perform any data validation that your application shouldn’t handle.

Q: How would you create a ‘manual index’ in Riak? Doesn’t that need to always find unique keys?

One basic way to structure a manually-created index in Riak is to have a bucket specifically for the index. Keys in this bucket correspond to the exact value you are indexing (for fuzzy or incomplete values,
use Riak Search). The objects stored at those keys have links or lists of keys that refer to the original object(s). Then you can find the original simply by following the link or using MapReduce to extract and find the related keys.

The example I gave in the webinar Q&A was indexing users by email. To create the index, I would use a bucket named users_by_email. If I wanted to lookup my own user object by email, I’d try to fetch the object
at users_by_email/sean@basho.com, then follow the link in it (something like </riak/users/237438-28374384-128>; riaktag="indexed") to find the actual data.

Whether those index values need to be unique is up to your application to design and enforce. For example, the index could be storing links to blog posts that have specific tags, in which case the index need not be unique.

To create the index, you’ll either have to perform multiple writes from your application (one for the data, one for the index), or add a commit hook to create and modify it for you.

Q: Can you compare/contrast buckets w/ Cassandra column families?

Cassandra has a very different data model from Riak, and you’ll want to consult with their experts to get a second opinion, but here’s what I know. Column families are a way to group related columns together that you will always want to retrieve together, and is something that you design up-front (it requires restarting the cluster for changes to take effect). It’s the closest thing to a relational table that Cassandra has.

Although you do use buckets to group similar data items, in contrast, Riak’s buckets:

  1. Don’t understand or enforce any internal structure of the values,
  2. Don’t need to be created or designed ahead of time, but pop into existence when you first use them, and
  3. Don’t require a restart to be used.

Q: How would part sharing be achieved? (this is a reference to the example given in the webinar, Radiant CMS)

Radiant shares content parts only when specified by the template language, and always by inheritance from ancestor pages. So if the layout contained <r:content part="sidebar" inherit="true" />, then if the currently rendering page doesn’t have that content part, it will look up the hierarchy until it finds it. This is one example of why it’s so important to have an efficient way to traverse the site hierarchy, and why I presented so many options.

Q: What is the max number of links an object can have for Link Walking?

There’s no cut-and-dry answer for this. Theoretically, you are limited only by storage space (disk and RAM) and the ability to retrieve the object from the desired interface. In a practical sense this means that the default HTTP interface limits you to around 100,000 links on a single object (based on previous discussions of the limits of HTTP packets and header lengths). Still, this is not going to be reasonable to deal with in your application. In some applications we’ve seen links on the order of hundreds per object negatively impact link-walking performance. If you need to have that many, you’ll be better off exploring other designs.

Again, thanks for attending! Look for our next webinar coming in about month.

Sean, Developer Advocate

Introducing Riak Function Contrib

December 2, 2010

A short while ago I made it known on the Riak Mailing list that the Basho Dev team was working on getting “more resources out the door to help the community better use Riak.” Today we are pleased to announce that Riak Function Contrib, one of these resources, is now live and awaiting your usage and code contributions.

What is Riak Function Contrib?

Riak Function Contrib is a home for MapReduce, Pre- and Post-Commit, and “Other” Functions or pieces of code that developers are using or might have a need for in their applications or testing. Put another way, it’s a Riak-specific function repository and library. Riak developers are using a lot of functions in a ton of different ways. So, we built Function Contrib to promote efficient development of Riak apps and to encourage a deeper level of community interaction through the use of this code.

How do I use it?

There are two primary ways to make use of Riak Function Contrib:

  1. Find a function – If, for instance, you needed a Pre-Commit Hook to validate a JSON document before you store it in Riak, you could use or adapt this to your needs. No need to write it yourself!
  2. Contribute a Function – There are a lot of use cases for Riak. This leads to many different functions and code bits that are written to extend Riak’s functionality. It you have one (or 20) functions that you think might be of use to someone other than you, contribute it. You’ll be helping developers everywhere. Francisco Treacy and Widescript, for example, did their part when they contributed this JavaScript reduce function for Sorting by Fields.

What’s Next?

Riak Function Contrib is far from being a complete library. That’s where you come in. If you have a function, script, or some other piece of code that you think may be beneficial to someone using Riak (or Riak Search for that matter), we want it. Head over to the Riak Function Contrib Repo on GitHub and check out the README for details.

In the meantime, the Basho Dev team will continue polishing up the site and GitHub repo to make it easier to use and contribute to.

We are excited about this. We hope you are, too. As usual, thanks for being a part of Riak.

More to come…


Community Manager

Free Webinar – Schema Design for Riak – Dec 7 at 2PM Eastern

December 1, 2010

Moving applications to Riak involves a number of changes from the status quo of RDBMS systems, one of which is taking greater control over your schema design. You’ll have questions like: How do you structure data when you don’t have tables and foreign keys? When should you denormalize, add links, or create MapReduce queries? Where will Riak be a natural fit and where will it be challenging?

We invite you to join us for a free webinar on Tuesday, December 7 at 2:00PM Eastern Time to talk about Schema Design for Riak. We’ll discuss:

  • Freeing yourself of the architectural constraints of the “relational” mindset
  • Gaining a fuller understanding of your existing schema and its queries
  • Strategies and patterns for structuring your data in Riak
  • Tradeoffs of various solutions

We’ll address the above topics and more as we design a new Riak-powered schema for a web application currently powered by MySQL. The presentation will last 30 to 45 minutes, with time for questions at the end.

If you missed the previous version of this webinar in July, here’s your chance to see it! We’ll also use a different example this time, so even if you attended last time, you’ll probably learn something new.

Fill in the form below if you want to get started building applications on top of Riak!

Sorry, registration is closed! Video of the presentation will be posted on Vimeo after the webinar has ended.

The Basho Team

Two new Erlang/OTP applications added to Riak products – 'riak_err' and 'cluster_info'

November 17, 2010

The next release of Riak will include two new Erlang/OTP applications: riak_err and cluster_info. The riak_err application will improve Riak’s runtime robustness by strictly limiting the amount of RAM that is used while processing event log messages. The cluster_info application will assist troubleshooting by automatically gathering lots of environment, configuration, and runtime statistics data into a single file.

Wait a second, what are OTP applications?

The Erlang virtual machine provides most of the services that an operating system like Linux or Windows provides: memory management, file system management, TCP/IP services, event management, and the ability to run multiple applications. Most modern operating systems allow you to run a Web browser, word processor, spreadsheet, instant messaging app, and many others. And if your email GUI app crashes, your other applications continue running without interference.

Likewise, the Erlang virtual machine supports running multiple applications. Here’s the list of applications that are running within a single Riak node — we ask the Erlang CLI to list them for us.

(riak@> application:which_applications().
[{cluster_info,"Cluster info/postmortem app","0.01"},
{skerl,"Skein hash function NIF","0.1"},
{riak_kv,"Riak Key/Value Store","0.13.0"},
{riak_core,"Riak Core","0.13.0"},
{luke,"Map/Reduce Framework","0.2.2"},
{mochiweb,"MochiMedia Web Server","1.7.1"},
{erlang_js,"Interface between BEAM and JS","0.4.1"},
{runtime_tools,"RUNTIME_TOOLS version 1","1.8.3"},
{crypto,"CRYPTO version 1","1.6.4"},
{os_mon,"CPO CXC 138 46","2.2.5"},
{riak_err,"Custom error handler","0.1.0"},
{sasl,"SASL CXC 138 11","2.1.9"},
{stdlib,"ERTS CXC 138 10","1.16.5"},
{kernel,"ERTS CXC 138 10","2.13.5"}]

Yes, that’s 17 different applications running inside a single node. For each item in the list, we’re told the application’s name, a human-readable name, and that application’s version number. Some of the names like ERTS CXC 138 10 are names assigned by Ericsson.

Each application is, in turn, a collection of one or more processes that provide some kind of computation service. Most of these processes are arranged in a “supervisor tree”, which makes the task of managing faults (e.g., if a worker process crashes, what do you do?) extremely easy. Here is the process tree for the kernel application.

And here is the process tree for the riak_kv application.

The riak_err application

See the GitHub README for riak_err for more details.

The Erlang/OTP runtime provides a useful mechanism for managing all of the info, error, and warning events that an application might generate. However, the default handler uses some not-so-smart methods for making human-friendly message strings.

The big problem is that the representation used internally by the virtual machine is a linked list, one list element per character, to store the string. On a 64-bit machine, that’s 16 bytes of RAM per character. Furthermore, if the message contains non-printable data (i.e., not ASCII or Latin-1 characters), the data will be formatted into numeric representation. The string “Bummer” would be formatted just like that, Bummer. But if each character in that string had the constant 140 added to it, the 6-byte string would be formatted as the 23-byte string 206,257,249,249,241,254 instead (an increase of about 4x). And, in rare but annoying cases, there’s some additional expansion on top of all of that.

The default error handler can take a one megabyte error message and use over 1 megabyte * 16 * 4 = 32 megabytes of RAM. Why should error messages be so large? Depending on the nature of a user’s input (e.g. a large block of data from a Web client), the process’s internal state, and other factors, error messages can be much, much bigger than 1MB. And it’s really not helpful to consume half a gigabyte of RAM (or more) just to format one such message. When a system is under very heavy load and tries to format dozens of such messages, the entire virtual machine can run out of RAM and crash.

The riak_err OTP application replaces about 90% of the default Erlang/OTP info/error/warning event handling mechanism. The replacement handler places strict limits on the maximum size of a formatted message. So, if you want to limit the maximum length of an error message to 64 kilobytes, you can. The result is that it’s now much more difficult to get Riak to crash due to error message handling. It makes us happy, and we believe you’ll be happier, too.

Licensing for the riak_err application

The riak_err application was written by Basho Technologies, Inc. and is licensed under the Apache Public License version 2.0. We’re very interested in bug reports and fixes: our mailbox is open 24 hours per day for GitHub comments and pull requests.

The cluster_info application

The cluster_info application is included in the packaging for the Hibari key-value store, which is also written in Erlang. It provides a flexible and easily-extendible way to dump the state of a cluster of Erlang nodes.

Some of the information that the application gathers includes:

  • Date & time
  • Statistics on all Erlang processes on the node
  • Network connection details to all other Erlang nodes
  • Top CPU- and memory-hogging processes
  • Processes with large mailboxes
  • Internal memory allocator statistics
  • ETS table information
  • The names & versions of each code module loaded into the node

The app can also automatically gather all of this data from all nodes and write it into a single file. It’s about as easy as can be to take a snapshot of all nodes in a cluster. It will be a valuable tool for Basho’s support and development teams to diagnose problems in a cluster, as a tool to aid capacity planning, and merely to answer a curious question like, “What’s really going on in there?”

Over time, Basho will be adding more Riak-specific info-gathering functions. If you’ve got feature suggestions, (e.g., dump stats on how many times you woke up last night, click here to send all this data to Basho’s support team via HTTP or SMTP), we’d like to hear them. Or, if you’re writing the next game-changing app in Erlang, just go ahead and hack the code to fit your needs.

Licensing for the cluster_info application

The cluster_info application was written by Gemini Mobile Technologies, Inc. and is licensed under the Apache Public License version 2.0.


Free Software Can Not Be Taken Away

November 15, 2010

Oracle didn’t (and can’t) take away your open source software.

A few weeks ago Oracle caused a lot of confusion when they changed the makeup of the MySQL product line, including a “MySQL Classic Edition” version that does not cost money and does not include InnoDB. That combination in the product chart made many people wonder if InnoDB itself had ceased to be free in either the “free beer” or “free speech” sense. The people wondering and worrying included a few users of Innostore, the InnoDB-based storage engine that can be used with Riak.

Luckily, open source software doesn’t work that way.

Oracle didn’t really even try to do what some people thought; they just released a confusing product graph which they have since updated. The MySQL that most people think of first is MySQL Community Edition and it was not one of the editions mentioned in the chart that confused people. That version of MySQL, as well as all of the GPL components included in it such as InnoDB, remain free of cost and also available under the GPL.

This confusion eventually led to a public response from Oracle, so you can read it authoritatively if you like.

Even if someone wanted to, they couldn’t “take it back” in the way that some people feared. Existing software that has been legitimately available under an open source license such as GPL or Apache cannot retroactively be made unfree. The copyright owner might choose to not license future improvements as open source, but that which is already released in such a way cannot be undone. Oracle and Innobase aren’t currently putting new effort into Embedded InnoDB, but a new project has spun up to move it forward. If the HailDB project produces improvements of value, then future versions of Innostore may switch to using that engine instead of using the original Embedded InnoDB release.

InnoDB is available under the GPL. Innostore, as a derivative work of Embedded InnoDB, is also available under the GPL. Neither Oracle nor Basho can take that away from you.


Soap people have to grow up quick: Two Weeks in the Life of a NoSQL Company

November 11, 2010

Things are moving incredibly fast in the NoSQL space. I am used to internet-fast — helping bring on 300 customers in a year at Akamai; going from adult bulletin boards and leased lines to hosting sites for twenty percent of the Fortune 500 at Digex (Verizon Business) in eighteen months. I have never seen a space explode like the NoSQL space.

Two weeks ago, Justin Sheehy stood on stage delivering a rousing and thoughtful presentation to the NoSQL East Conference that was less about Riak and more about a definition of first principles that underpinned Riak: what it REALLY means when you claim such terms as scalability (it doesn’t mean buying a bigger machine for your master DB) and fault-tolerance (it has to apply to writes and reads and is binary; you either always accept writes and serve reads or you don’t). The conference was a bit of a coming out party for Basho, which co-sponsored the event with Rackspace, Georgia Tech, and a host of other companies. We had been working on Riak for 18 months or so in relative quiet and it was nice to finally see what people thought, first hand.

There were equally interesting presentations about Pig and MongoDB and a host of other NoSQL entrants, all of which will make for engrossing viewing when they finally get posted. We were told this wasn’t quite as exciting as the NoSQL conference out West but none of us seemed to mind. Home Depot, Turner Broadcasting, Weather.com, and Comcast had all sent folks down to evaluate the technology for real, live problems and the enthusiasm in the auditorium spilled out into the Atlanta bars. Business cards were exchanged, calls set up, even a little business discussed. Clearly, NoSQL databases were maturing fast.

No sooner had we returned to Cambridge than news of Flybridge’s investment in 10Gen came out. Hooray! Someone was willing to bet a $3.4 million dollars on a company in the space. Chip Hazard, ever affable, wrote a nice blog post explaining the investment. According to him, every developer they talked to had downloaded some NoSQL database to test. Brilliant news. He said Flybridge invested in 10Gen because they liked the space and knew the team from their investment in Doubleclick, from whose loins the management team at 10Gen issued. No more felicitous reason exists for a group of persons to invest $3.4 million than that previous investments in the same team were handsomely rewarded. I would wish Chip and 10Gen the best if I had time.

Because contemporaneous with the news of Flybridge’s investment, and almost as if the world had decided NoSQL’s time had come, we began to field emails and calls from interested parties. Trials, quotes, lengthy discussions about features and uses of Riak — the week was a blur. Everyone was conducting a bakeoff: “I have a 4TB database and customers in three continents. I am evaluating Riak and two other document datastores. Tell me about your OLAP features.”

Heady times and, frankly, of somewhat dubious promise, if you ask me. Potential clients that materialize so quickly always seem to disappear just as fast. Really embracing a new technology requires trials, tests, new features, and time. Time most off all. These “bluebirds” would fly away in no time, if my experience held true.

Except, this time it didn’t happen. Contracts were exchanged. Pen nibs were sharpened. It is as if the entire world decided to not wait for the everyone else to jump on the bandwagon and instead, decided to go NoSQL. Even using this last week as the sole example, I think the reason is plain — people have real pain and suddenly the word is out that they no longer have to suffer.

Devs are constrained by what they can build, rich features notwithstanding. Ask the company that had to choose between Riak and a $100K in-memory appliance to scale. And Ops is getting slaughtered — the cost of scaling poorly (and by poorly I mean pagers going off during dinner, bulk updates taking hours and failing all the time, fragmented and unmanageable indices consuming dozens of machines) is beginning to look like the cost of antiquated technology. Good Ops people are not fools. They look for ways to make life easier. Make no mistake — all the Devs and Ops folks came with a set of tough questions and a list of new features. They also came with an understanding that companies that release open source software still have a business to run. They are willing to spend on a real company. In fact, having a business behind Riak ended up mattering as much as any features.

So, I suspect, we are at the proverbial “end of the beginning.” Smart people in the NoSQL movement have succeeded in building convincingly good software and then explaining the virtues convincingly (all but one of the presentations at NoSQL East demonstrated the virtues of the respective approaches). Now these people are connecting to smart people responsible for building and running web apps, people who are decidedly unwilling to sit around hoping for Oracle or IBM to solve their problems.

In the new phase — which we will cleverly call the “beginning of the middle” — great tech will matter even more than it does now. It won’t be about selling or marketing or any of that. If our numbers are any indication of a larger trend, more people will download and install NoSQL databases in the next month than the combined total of the three months previous. More people in a buying frame of mind will evaluate NoSQL technology not in terms of its coolness but in terms of its ability to solve their real, often expensive problems. The next phase will be rigorous in a way this phase was not. People have created several entirely new ways to store and distribute data. That was the easy part.

Just as much as great tech, the people behind it will matter. That means more calls between us and Dev teams. That means more feature requests considered and, possibly, judiciously, agreed to.

That also means lots of questions answered. People care about support. They care about whether you answer their emails in a timely fashion and are polite. People want to do business with NoSQL. They want to spend money to solve problems. They need to know they are spending it with responsible, responsive, dedicated people.

Earl tweets about it all the time and I happen to agree: any NoSQL success helps all NoSQL players. I also happen to feel that any failure hurts all NoSQL players. As NoSQL rapidly ages into its adolescence, it will either be awkward and painful or exciting and characterized by incredible growth.

When I was a kid on the Navy base in Alameda, my babysitter watched soaps all afternoon, leaving me mostly to my own devices. If I stopped in, I always got roped in to hearing her explain her favorite stories. Most of all she loved how ridiculous they were, though she would never admit this exactly. Instead, adopting an attitude of gleeful incredulity, she would point out this or that attractive young actor and tell me how just a year ago, she was a little baby. “Soap people have to grow up quick, I guess,” was her single (and to her, completely satisfactory) explanation. “If they don’t, they get written out of the story.”



Tony Falco