August 20, 2010
Thank you to those who attended our webinar yesterday. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).
Q: How would you solve full-text search with the current versions of Riak? One could also take Wriaki as an example, as most wikis have some sort of full-text search functionality.
I recommend using existing fulltext solutions. Solr has matched up well with most of the web applications I have written, and would certainly work for Wriaki as well.
Q: Where in the course of the interaction (shown on slide 18) are you defining the client ID? Don’t you need the client ID and vclock to match between updates?
On slide 42, we talk about “actors” which are essentially client IDs. Using the logged-in user as the client ID can help prevent vclock explosion and is a sensible way of structuring your updates.
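To make the "vclock explosion" point concrete, here is a minimal Ruby sketch that treats a vector clock as a map from actor (client ID) to a counter. This is a toy model, not Riak's actual vclock encoding; the `increment` and `apply_updates` helpers and the actor names are hypothetical.

```ruby
# A toy vector clock: a Hash mapping each actor (client ID) to a counter.
# Illustration only; Riak's real vclocks are encoded differently.
def increment(vclock, actor)
  vclock.merge(actor => vclock.fetch(actor, 0) + 1)
end

# Apply one update per actor in the list, starting from an empty clock.
def apply_updates(actors)
  actors.reduce({}) { |vclock, actor| increment(vclock, actor) }
end

# The same logged-in user makes three updates: one entry whose counter grows.
per_user = apply_updates(%w[alice alice alice])

# A fresh client ID on every request: one new entry per update.
per_request = apply_updates(%w[req-1 req-2 req-3])

puts per_user.size      # 1
puts per_request.size   # 3
```

With a stable client ID the clock stays at one entry per real actor, while throwaway IDs add an entry on every write.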
August 13, 2010
Documentation is great, but playing with examples can also be a helpful way to tackle steep learning curves. To help you learn about ways of using Riak, we’d like to present “Wriaki”, an example implementation of a wiki that stores its data in Riak.
We invite you to join us for a free webinar on Thursday, August 19 at 2:00PM Eastern Time (UTC-4) about Riak in Action: Wriaki. During the presentation, Bryan Fink will cover:
- Modeling wiki data in the Riak key/value store
- Access patterns using both get/put and map/reduce
- Three strategies that Wriaki uses for dealing with eventual consistency
- How the user interface changes to accommodate Wriaki’s models
The code for Wriaki will be open-source at the time of the presentation. The presentation will last 30 to 45 minutes, with time for questions at the end.
Fill in the form below to reserve your seat!
Sorry, registration has closed!
If you cannot attend, the video and slides will be made available afterward in the recap post on the blog.
August 8, 2010
This is a huge day for Basho Technologies, Riak, and our growing community of users.
We are thrilled to announce Basho’s partnership with Joyent to bring our community hosted Riak on Joyent’s Smart platform. With both open source and enterprise versions available, anyone can quickly spin up a Riak cluster and start building applications.
When we first began talking to Jason and David and the rest of the Joyent team early this year, we realized we shared a common vision for the future of infrastructure. The past several months have been spent finalizing the details, and in just a few weeks you’ll be able to go to my.joyent.com and, with a few clicks, purchase and deploy as many nodes of Riak as you want, need, and can handle.
Making pre-configured Riak SmartMachines available in the Joyent cloud will enable developers to combine all the benefits of Riak with the proven, advanced hosting platform that businesses like LinkedIn, Gilt, and Backstage rely on every day.
Mark your calendar, because hosted Riak is here!
August 8, 2010
Thank you to those who attended our Rails-oriented webinar yesterday. Like before, we’re recapping the questions below for everyone’s sake (in no particular order).
Q: When you have multiple application servers and Riak nodes, how do you handle “replication lag”?
Most web applications have some element of eventual consistency (or potential inconsistency) in them by their nature. Object and view caches sacrifice immediate consistency for gains in throughput and latency, and hopefully provide a better user experience. With Riak, you can achieve acceptable data freshness by “reading your writes”. That is, use the same read quorum as your write quorum and make sure that R+W is greater than N, so that every read overlaps the most recent write on at least one replica. For example, using R=W=DW=2 when N=3 will give a strong assurance of consistency.
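The R+W > N rule can be checked by brute force. The sketch below is not Riak code; it enumerates every possible set of replicas that could answer a read and a write, and confirms they always share a replica exactly when R+W exceeds N. The method names are hypothetical.

```ruby
# The arithmetic rule from the answer above.
def read_your_writes?(n:, r:, w:)
  r + w > n
end

# Brute-force check: does every possible read quorum share at least one
# replica with every possible write quorum?
def quorums_always_overlap?(n:, r:, w:)
  replicas = (1..n).to_a
  replicas.combination(r).all? do |readers|
    replicas.combination(w).all? { |writers| (readers & writers).any? }
  end
end

puts quorums_always_overlap?(n: 3, r: 2, w: 2)  # true
puts quorums_always_overlap?(n: 3, r: 1, w: 1)  # false
```

With N=3 and R=W=2, any two replicas that answer a read must include one that acknowledged the write. With R=W=1, a read can land on a replica the write never reached.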
Q: I find myself doing def key; id; end. Is there any easier way to tell Ripple the key?
Currently there is not. However, I’ve found myself using this pattern frequently when I want a meaningful key that is also an attribute. There’s an issue on the tracker just for this feature. In the meantime, you could use two method aliases:
property :email, String, :presence => true
# This forces all attribute methods to be defined
alias_method :key, :email
alias_method :key=, :email=
As long as your property is a string, this should work just fine.
Q: Any tips on how to handle pagination over MapReduce queries?
The challenge with pagination in Riak is that reduce phases are not guaranteed to run only once, but instead are run in parallel as results from the previous phase come in asynchronously, and then followed by a final reduce. So in a sense, you have to treat all invocations of your reduce function as a “re-reduce”. We have plans to allow reduce phases to specify that they should be run only once, but for right now you can get around this limitation.
Reduce phases are always run on the coordinating node, so if you put a reduce phase before the one where you want to perform pagination, you are pretty much guaranteed that the whole result set is going to be available in a single application of the final reduce. A typical combination would be a “sorting” phase followed by a “pagination” phase.
Riak.reduceSort and Riak.reduceSlice are two built-in functions that could help accomplish this task.
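The "sort, then paginate" combination can be simulated outside Riak. The Ruby sketch below is only a model of the idea: `reduce_sort` and `reduce_slice` are hypothetical stand-ins for the built-ins, and `run_reduce` imitates a reduce phase that is applied to batches as they arrive and then once more to the combined partial results.

```ruby
# Hypothetical stand-in for a "sorting" reduce. It is safe to re-apply:
# sorting already-sorted partial results still gives a sorted list.
def reduce_sort(values)
  values.sort
end

# Hypothetical stand-in for a "pagination" reduce. It is NOT safe to
# re-apply, so it must see the whole result set in one call.
def reduce_slice(values, start, finish)
  values[start...finish] || []
end

# Imitates a reduce phase: run on each incoming batch, then run once
# more over the concatenated partial results (the "re-reduce").
def run_reduce(batches)
  partials = batches.flat_map { |batch| yield(batch) }
  yield(partials)
end

def paginate(batches, page:, per_page:)
  sorted = run_reduce(batches) { |values| reduce_sort(values) }
  start = (page - 1) * per_page
  reduce_slice(sorted, start, start + per_page)  # applied exactly once
end

batches = [[9, 2, 7], [4, 1], [8, 3, 6, 5]]
p paginate(batches, page: 2, per_page: 3)  # [4, 5, 6]
```

Slicing inside `run_reduce` instead would cut each batch separately and generally return the wrong page, which is why the slice has to follow a phase that has already gathered everything.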
CAMBRIDGE, MA – August 3, 2010 – Basho Technologies today announced Wikia, Inc. has selected Riak, Basho’s next-generation distributed data store, as the foundation for a new set of global services. Wikia is the 70th largest site on the Internet according to Quantcast and brings millions of people together daily to create and discover engaging content. Wikia selected Riak over traditional databases and other emerging data storage technologies to distribute its data around the world and bring it closer to its global audience.
“Riak has allowed us to do something that was impossible before,” said Artur Bergman, Wikia’s Vice President of Engineering and Operations. “With Riak we can break through the ceiling on performance imposed by traditional database technologies and continue to improve the experience of our users. We invest in technology that benefits Wikia’s growing user base, therefore Riak made perfect sense. Riak is fast, easy to run, and extremely resilient to the failure scenarios anyone with real operational experience knows are all too common.”
Founded in 2008 by former Akamai Technologies (NASDAQ: AKAM) executives and senior engineers, Basho designed Riak to provide the same high availability and rapid scaling properties provided by leading content delivery networks. Applications built with Riak can sustain catastrophic server, data center, and network failures without outages, while avoiding the complexity and expense that characterize applications built using traditional databases.
“Basho is excited to have a respected and forward-looking client like Wikia so readily embrace Riak,” said Earl Galleher, Basho’s Chairman and CEO. “More and more, we see companies reject the limitations of traditional databases like Oracle and MySQL in favor of Riak’s flexibility and ease of use. Riak doesn’t just solve problems for organizations running applications on old database architectures; it frees them to build entirely new classes of applications.”
Wikia intends to deploy a replicated user session service running simultaneously in three data centers in the U.S. and Europe, replacing its current solution which is restricted to a single data center. Mr. Bergman has already contributed a file system adapter to the Riak open source community which will be used in the Wikia production environment.
“We did not set out to build a disruptive technology. We simply wanted to solve a problem faced by anyone running old database technologies,” said Mr. Galleher. “We have only scratched the surface of what Riak can do.”
Wikia, founded by Wikipedia founder Jimmy Wales and Angela Beesley, is the place where millions of passionate people come to discover, create, and share an abundance of information on thousands of topics. Wikia sites are written by community members that are deeply excited and knowledgeable about subjects ranging from video games, television shows, and movies to food, fashion, and environmental sustainability. With over four million pages of content and 150,000 enthusiast communities, Wikia attracts more than 30 million unique global visitors per month and has been listed in the Quantcast top 100 sites on the Internet since early 2009.
About Basho Technologies
Basho Technologies, Inc., founded in January 2008 by a core group of software architects, engineers, and executive leadership from Akamai Technologies, Inc. (NASDAQ: AKAM), is headquartered in Cambridge, Massachusetts. Basho produces Riak, a distributed data store that combines extreme fault tolerance, rapid scalability, and ease of use. Designed from the ground up to work with applications that run on the Internet and mobile networks, Riak is particularly well-suited for users of cloud infrastructure such as Amazon’s AWS and Joyent’s Smart platform and is available in both an open source and a paid commercial version. Current customers of Riak include Comcast Corporation, MIG-CAN, and Mochi Media.
CEO, Basho Technologies, Inc.
July 30, 2010
What is riak_core?
riak_core is a single OTP application which provides all the services necessary to write a modern, well-behaved distributed application.
riak_core began as part of Riak. Since the code was generally useful in building all kinds of distributed applications we decided to refactor and separate the core bits into their own codebase to make it easier to use.
Distributed systems are complex and some of that complexity shows in the number of features available in riak_core. Rather than dive deeply into code, I’m going to separate the features into broad categories and give an overview of each.
Node Liveness & Membership
riak_core_node_watcher is the process responsible for tracking the status of nodes within a riak_core cluster. It uses net_kernel to efficiently monitor many nodes.
riak_core_node_watcher also has the capability to take a node out of the cluster programmatically. This is useful in situations where a brief node outage is necessary but you don’t want to stop the server software completely.
riak_core_node_watcher also provides an API for advertising and locating services around the cluster. This is useful in clusters where nodes provide a specialized service, like a CUDA compute node, which is used by other nodes in the cluster.
riak_core_node_watch_events cooperates with riak_core_node_watcher to generate events based on node activity, such as a node joining or leaving the cluster. Interested parties can register callback functions which will be called as events occur.
Partitioning & Distributing Work
riak_core uses a master/worker configuration on each node to manage the execution of work units. Consistent hashing is used to determine which target node(s) should receive a request, and the master process on each node farms out the request to the actual workers.
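As a rough illustration of how consistent hashing picks target nodes, the Ruby sketch below divides a SHA-1 hash space into equal partitions, assigns them to nodes round-robin, and returns the owners of the next few partitions for a key. It is a simplification under assumed names (`build_ring`, `partition_for`, `target_nodes`, and the node and key names), not riak_core's actual claim or preference-list logic.

```ruby
require "digest/sha1"

# Assign `partitions` equally sized partitions to nodes round-robin.
# (Assumption: real clusters use a more careful claim algorithm.)
def build_ring(nodes, partitions)
  (0...partitions).map { |i| nodes[i % nodes.size] }
end

# Hash a key into the 160-bit SHA-1 space and return the index of the
# partition that the hash falls into.
def partition_for(key, partitions)
  ring_size = 2**160
  Digest::SHA1.hexdigest(key).to_i(16) * partitions / ring_size
end

# Owners of n consecutive partitions, walking the ring from the key's
# partition and wrapping around at the end.
def target_nodes(ring, key, n)
  first = partition_for(key, ring.size)
  (0...n).map { |i| ring[(first + i) % ring.size] }
end

ring = build_ring(%w[node_a node_b node_c node_d], 64)
puts target_nodes(ring, "wiki/FrontPage", 3).inspect
```

Because the mapping depends only on the key and the ring, any node can compute the same targets without consulting a central directory.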
riak_core calls worker processes vnodes. The coordinating process is the vnode master.
The partitioning and distribution logic inside riak_core also handles hinted handoff when required. Hinted handoff occurs as a result of a node failure or outage. In order to assure availability, most clustered systems will use operational nodes in place of down nodes. When the down node comes back, the cluster needs to migrate the data from its temporary home on the substitute nodes to the data’s permanent home on the restored node. This process is called hinted handoff and is managed by components inside riak_core.
riak_core also handles migrating partitions to new nodes when they join the cluster such that all work continues to be evenly partitioned to all cluster members.
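The hinted handoff flow described above can be modeled in a few lines. The `ToyCluster` class below is purely hypothetical and far simpler than riak_core: it writes to a fallback node while the owner is down, records a hint, and moves the data home when the owner returns. It assumes at least one node is always up.

```ruby
# Toy model of hinted handoff; not riak_core's implementation.
class ToyCluster
  attr_reader :data

  def initialize(node_names)
    @nodes = node_names
    @data  = node_names.to_h { |name| [name, {}] }
    @hints = []   # each hint is [fallback_node, intended_owner, key]
    @down  = []
  end

  def mark_down(node)
    @down << node
  end

  # Write to the owner if it is up; otherwise write to the first live
  # node and remember who the data really belongs to.
  def put(owner, key, value)
    if @down.include?(owner)
      fallback = @nodes.find { |n| !@down.include?(n) }
      @data[fallback][key] = value
      @hints << [fallback, owner, key]
    else
      @data[owner][key] = value
    end
  end

  # When the owner returns, move hinted data to its permanent home.
  def mark_up(node)
    @down.delete(node)
    mine, @hints = @hints.partition { |_, owner, _| owner == node }
    mine.each do |fallback, owner, key|
      @data[owner][key] = @data[fallback].delete(key)
    end
  end
end

cluster = ToyCluster.new(%w[a b c])
cluster.mark_down("a")
cluster.put("a", "page", "v1")   # stored on "b" with a hint for "a"
cluster.mark_up("a")             # handoff moves it back to "a"
puts cluster.data["a"]["page"]   # v1
```

The write succeeds even though its owner is unavailable, and the temporary copy disappears from the substitute node once the handoff completes.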
riak_core_vnode_master starts all the worker vnodes on a given node and routes requests to the vnodes as the cluster runs.
riak_core_vnode is an OTP behavior wrapping all the boilerplate logic required to implement a vnode. Application-specific vnodes need to implement a handful of callback functions in order to participate in handoff sessions and receive work units from the master.
A riak_core cluster stores global state in a ring structure. The state information is transferred between nodes in the cluster in a controlled manner to keep all cluster members in sync. This process is referred to as “gossiping”.
riak_core_ring is the module used to create and manipulate the ring state data shared by all nodes in the cluster. Ring state data includes items like partition ownership and cluster-specific ring metadata. Riak KV stores bucket metadata in the ring metadata, for example.
riak_core_ring_manager manages the cluster ring for a node. It is the main entry point for application code accessing the ring, via riak_core_ring_manager:get_my_ring/0, and also keeps a persistent snapshot of the ring in sync with the current ring state.
riak_core_gossip manages the ring gossip process and ensures the ring is generally consistent across the cluster.
What’s the plan?
Over the next several months I’m going to cover the process of building a real application in a series of posts to this blog, where each post covers some aspect of system building with riak_core. All of the source to the application will be published under the Apache 2 license and shared via a public repo on GitHub.
And what type of application will we build? Since the goal of this series is to illustrate how to build distributed systems using riak_core and also satisfy my own technical curiosity, I’ve decided to build a distributed graph database. A graph database should provide enough use cases to really exercise riak_core while at the same time not obscuring the core learning experience in tons of complexity.
Thanks to Sean Cribbs and Andy Gross for providing helpful review and feedback.
July 29, 2010
Ruby on Rails is a powerful web framework that focuses on developer productivity. Riak is a friendly key value store that is simple, flexible and scalable. Put them together and you have lots of exciting possibilities!
We invite you to join us for a free webinar on Thursday, August 5 at 2:00PM Eastern Time (UTC-4) to talk about Riak with Rails. In this hands-on webinar, we’ll discuss:
- Setting up a new Rails 3 project for Riak
- Storing, retrieving, manipulating key-value data from Ruby
- Issuing map-reduce queries
- Creating rich document models with Ripple
- Using Riak as a distributed cache and session store
The presentation will last 30 to 45 minutes, with time for questions at the end.
Fill in the form below if you want to get started building Rails applications on top of Riak!
Sorry, registration is closed.
July 21, 2010
A few members of the Basho Team are at OSCON all week. We are here to take part in the amazing talks and tutorials, but also to talk to Riak users and community members.
Yesterday I had the opportunity to have a brief chat with Andrew Harvey, a developer who hails from Sydney, Australia and works for a startup called Lexer. They are building some awesome applications around brand monitoring and analytics, and Riak is helping in that effort.
In this short clip, Andrew gives me the scoop on Lexer and shares a few details around why and how they are using Riak (and MySQL) at Lexer.
(Deepest apologies for the shakiness. I forgot the Tripod.)
July 19, 2010
Basho is growing. Fast. We are adding customers and users at a frenetic pace, and with this growth comes expansion in both team and locations. As some of you may have noticed, the Basho Team is not only becoming larger but more distributed. We now have people in six states scattered across four time zones pushing code and interacting with clients every day.
First Order of Business
To bolster this growth and expansion, we did what any self-respecting tech startup would do: we opened an office in San Francisco. Several members of the Basho Team recently moved into a space at 795 Folsom, a cozy little spot a mere five floors below Twitter. (Proximity to the Nest was a requirement when evaluating office space.) We are calling it “Basho West.” There are four of us here, and we are settling in quite nicely.
If you are in the area and want to talk Riak, Basho, open source, coffee, etc., stop in and pay us a visit any time. Seriously. If you walk through the door of Suite 1028 with a MacBook in hand and have a question about how to model your data in Riak, we’ll get out the whiteboard and help you out.
Second Order of Business
To make an immediate impact in the Bay Area, we thought it would be a great idea to get the first regularly scheduled Riak Meetup off the ground. We heard a rumor that there were a lot of people using or interested in databases out here, so we feel obliged to join the conversation. Here is the link to the San Francisco Riak Meetup group. If you’re in the Bay Area and want to meet with other like-minded developers and technologists to discuss Riak (and other database technologies) in every possible capacity, please join us.
Third Order of Business
Pop quiz: When did Basho Technologies open source Riak? We asked ourselves this the other day. As far as we can tell, it was sometime during the first week and a half of August last year. “Huh,” we thought. “Wouldn’t it be great to have a little gathering to commemorate this event?” It sure would, so that’s what we are doing.
I mentioned above that we are starting a regularly scheduled Riak Meetup. To us, it made perfect sense to combine the inaugural Meetup with the event to celebrate Riak’s One Year Anniversary of being a completely open source technology.
The date of this gathering is Monday, August 9th. The exact time and location still need to be finalized. We’ll be announcing them within the next few days. But put it on your calendar now, as you will not want to miss this. In addition to food, drink, and exceptional overall technical discussion and fireworks, here is what you can expect:
- A talk from Dr. Eric Brewer, Basho Board Member and Father of the CAP Theorem
- A few words from the team at Mochi Media about their experiences running Riak in production
- A short talk from Basho’s VP of Engineering, Andy Gross, on the state of Riak and the near term road map
If you have any other suggestions about what you would like to see at this event, just leave us a message or an idea on the Meetup page linked above.
- Come visit the new Basho Office at 795 Folsom, Suite 1028
- Join the Riak Meetup Group
- Come be a part of the Riak One Year Anniversary Celebration
And stay tuned, because things are only going to get more exciting from here.
July 15, 2010
Map-Reduce is a flexible and powerful alternative to declarative query languages like SQL that takes advantage of Riak’s distributed architecture. However, it requires a whole new way of thinking about how to collect, process, and report your data, and is tightly coupled to how your data is stored in Riak.
We invite you to join us for a free webinar on Thursday, July 22 at 2:00PM Eastern Time (UTC-4) to talk about Map-Reduce Querying in Riak. We’ll discuss:
- How Riak’s Map-Reduce differs from other systems and query languages
- How to construct and submit Map-Reduce queries
- Filtering, extracting, transforming, aggregating, and sorting data
- Understanding the efficiency of various types of queries
- Building and deploying reusable Map-Reduce function libraries
We’ll cover the above topics in conjunction with practical examples from sample applications. The presentation will last 30 to 45 minutes, with time for questions at the end.
Fill in the form below if you want to get started building applications with Map/Reduce on top of Riak!
Sorry, registration has closed!