Learn the basics of Riak CS (Cloud Storage) in 15 minutes in this video from the Citrix CloudPlatform partner program. We discuss the building blocks of cloud services platforms and enterprise requirements for cloud storage. Then we walk through the properties, architecture, interfaces and operations of Riak CS.
In 2009, mobile marketing and advertising technology provider Velti had a good (but challenging) problem on their hands. Their technology, which allows people to interact with their TV by voting, giving feedback, participating in contests, etc., had taken off. It had been adopted by nearly all of the TV broadcasters in the UK and three of the UK’s five mobile operators. As more customers began using their technology, Velti saw rapid growth in (inherently spiky) traffic. Their 2003-era .NET and SQL Server platform was becoming a concern.
Because the team at Velti had been working with Erlang (the language Riak is written in), in 2010 they brought in Erlang Solutions to help them architect their next generation platform. Riak was chosen for the database, and an early version of Multi-Data Center replication in Riak Enterprise was used to build two geographically separated sites to minimize potential catastrophic outages.
Velti’s new mGage™ platform now runs on 18 servers across two data centers (nine nodes in each), with each server running both Erlang applications and Riak. We’re pleased to pass along reports that the platform is redundant, that queue behavior has significantly improved (especially for large queue populations), and that after moving to Riak 1.2, Velti saw a noticeable reduction in disk space usage thanks to improvements in merge management.
Markus Kern, VP Technology at Velti, summarizes: “We operate a 24/7 service for over 140 customers. We cannot afford a single minute of downtime. Riak gives us the ability to meet and exceed our requirements for scale, data durability, and availability.” Woot!
With a growing community understanding of distributed systems architectures, where is the field evolving? How are Riak and other Dynamo-inspired databases handling complex data structures and meeting demands for stronger consistency and richer querying? This blog highlights three talks from last month’s RICON that tackle these questions.
Advancing Distributed Systems – Eric Brewer
In this keynote talk, Dr. Eric Brewer, author of a theorem that helped kick off the NoSQL movement, talks about the challenges facing distributed systems today. Beginning with some historical context – “SQL vs. NoSQL is not really a new religious war, it’s actually the latest round of a very old religious war” – Dr. Brewer walks us through the advantages and disadvantages of top-down (relational) and bottom-up (NoSQL) worldviews, his work at Google, and his thoughts on where next generation databases are headed.
Bringing Consistency to Riak – Joseph Blomstedt
With regard to the CAP theorem, Riak is an eventually consistent database with AP semantics. But this may soon change. In this talk, Basho engineer Joseph Blomstedt presents, for the first time, ongoing R&D at Basho to add true strongly consistent/CP semantics to Riak.
Data Structures in Riak – Sean Cribbs and Russell Brown
Since the beginning, Riak has supported high write-availability using Dynamo-style multi-valued keys – also known as conflicts or siblings. The tradeoff for this type of availability is that the application must include logic to resolve conflicting updates. This ad hoc resolution strategy is error-prone and can result in surprising anomalies. In this talk, Basho engineers Sean Cribbs and Russell Brown present recent work done to address these issues by adding convergent data structures to Riak.
Legacy RDBMS systems offered mature monitoring capabilities that usually gave operators a clear view of how their databases were (or weren’t) performing. Emerging distributed systems introduce new levels of complexity, presenting new problems in monitoring and diagnosis. In this blog we highlight two talks given at last month’s RICON which shed light on this problem and offer some interesting solutions.
Next Generation Monitoring of Large Scale Riak Applications
In this talk, Theo Schlossnagle, founder of OmniTI, talks about moving beyond standard monitoring metrics (average, median, 95th percentile, 99th percentile, etc.) and advocates for more sophisticated methods, namely histograms and new visualization techniques. He illustrates this with some interesting real-world examples in which metrics such as average response time have little meaning in the face of real-world distributions, which are often multi-modal and rapidly evolving.
Modern Radiology for Distributed Systems
In this talk, Boundary engineer Dietrich Featherston uses radiological imaging as a metaphor to explore the challenges of monitoring distributed systems – Boundary uses Riak to store high-resolution network data for its analysis engine. In this metaphor, if we just look at metrics pulled from individual hosts (CPU usage, memory usage, etc.), we can see diseased “cells”, but ignore the whole organism. We react to problems, instead of preventing them. To illustrate, Dietrich walks through a series of case studies highlighting new, “context aware”, non-invasive monitoring techniques.
Today I’m happy to announce a new release of riak-js, the Riak client for Node.js. This release breathes new life back into riak-js. For various reasons, development on riak-js had been dormant for quite a while, but that has changed for the better and we’re committed to making it a viable client library for production applications.
During the transition, riak-js got a new home as well, and can now be found at mostlyserious/riak-js. Please make sure to send all issues and pull requests to this repository.
Version 0.9.0 of riak-js was shipped today and can be installed from npmjs.org:
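A minimal install via npm, pinning the 0.9.0 release mentioned above:

```shell
# Install riak-js 0.9.0 into the current project
npm install riak-js@0.9.0
```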
While the new release is mostly backwards-compatible, it does come with some changes that I deemed worthwhile to make before hitting the big 1.0.
Most notably, the functions for using MapReduce and Riak Search have moved into their own namespaces. This brings riak-js more in line with other client libraries, keeping both features a bit more separate from normal client operations.
To run MapReduce, you now use something like the following example:
The same goes for Riak Search, which is now fully supported, including adding, removing and querying documents directly from a search index:
There are other minor changes, e.g. accessing bucket properties:
Several Basho team members will be presenting on distributed systems topics at QCon San Francisco.
SAN FRANCISCO, CA – November 7, 2012 – Attending the QCon International Software Development Conference this week in San Francisco? We’d love to meet up and talk to you about Riak! You can catch us in the exhibitor’s hall all week, or at the welcome party taking place after the talks Wednesday, November 7 at Thirsty Bear. Additionally, several Basho team members will be presenting on distributed systems topics. Check out the talk synopses below; we hope to see you there.
October 2012 marks the five year anniversary of Amazon’s seminal Dynamo paper, which inspired most of the NoSQL databases that appeared shortly after its publication, including Riak. In this session, Andy will reflect on five years of involvement with Riak and distributed databases and discuss what went right, what went wrong, and what the next five years may hold for Riak as we outgrow our Dynamo roots.
A number of years ago, Eric Brewer, father of the CAP theorem, coined the term “BASE” (“Basically Available, Soft-state, Eventually-consistent”) for an architectural style of loosely coupled distributed systems. He clearly meant this as a counterpoint to the “ACID” properties of traditional database systems. BASE systems choose to remain available for operations, sacrificing strict synchronization. While developers are very comfortable with the convenience of ACID, eventual consistency can be frightening, unfamiliar territory.
This talk will dive into the design of eventually consistent systems, touching on theory and practice. We’ll see why EC doesn’t mean “inconsistent” but is actually a different kind of consistency, with different tradeoffs. These new skills should help developers know when to embrace eventually-consistent solutions instead of fearing them.
The Dynamo paper, released by Amazon five years ago, laid out a set of technical “themes” for highly available, fault-tolerant distributed systems. Since then, numerous NoSQL products have been built on its core principles. These “variations,” along with recent advances in research, represent both a fascinating study in technical evolution and the forefront of the non-relational world. In this talk, we’ll cover the foundations of Dynamo – consistent hashing, vector clocks, hinted handoff, gossip protocol – advances in each area, and how querying and application development has changed as a result of them.
Thanks to the urging of DeadZen, we now have a dedicated mailing list for Riak Core. You can subscribe here.
For those of you not familiar with Riak Core, it’s more-or-less the distributed systems infrastructure that makes up, well, the core of how Riak distributes data and scales. For some introductory reading (that’s not pure code), there’s an old but still valuable blog post on the Basho Blog that’s well worth your time.
Why a separate list? Because Core is a powerful library that can be (and is being) used to build applications distinct from the other OTP apps (kv, search, pipe, etc.) that make up Riak. I know of at least 10 companies that have Riak Core apps in production, and I’m sure there are many more just waiting to share their use cases with the world (hint hint…). Plenty of Riak issues are Core-related, and those should still be handled on the Riak Mailing List. However, as Core gets more use, there will be questions, comments, and concerns specific to Core, so a separate forum for them makes sense. There will be some overlap, too, and Basho will take responsibility for cross-posting when necessary.
We’ve long been convinced of the power of Core, but it has received less tooling (docs, tutorials, etc.) due to lack of engineering time. This is a great first step to helping put more community power and focus behind Core.
Erlang was created to run on a variety of systems. Riak (written in Erlang) was created as a fault-tolerant distributed datastore, able to run on commodity hardware. Raspberry Pi is the culmination of these two points, brought to an absurd level: an embedded(ish), very inexpensive ($35) commodity computer. I thought it might be fun to create a Riak cluster on a set of Pis I had lying around.
Here’s what you’ll need to build your own RiakPi cluster:
N Raspberry Pis
N SD cards, 4+ GB each
N CAT5 cables
N 5V, 700-1200 mA micro USB power supplies (the kind used by many cellphones)
1 really cheap, non-fancy USB keyboard
1 monitor/TV with HDMI input
1 HDMI cable
1 Hub, Switch, or Router
1 Laptop/Desktop with an SD card slot
If you only have 1 Pi, you can still install Riak, but clearly not create a cluster. For the rest of this post, I’ll assume N=3.
It may seem like a lot of parts. However, except for the RPis themselves, you can find most of this stuff fairly inexpensively at RadioShack or your local reclaimed-electronics store.
Plug your shiny new SD card into your RPi. In fact, why not just plug everything in? Plug the power in last, or you’ll be racing against the boot up process trying to connect devices.
Simply plugging in the Pi will turn it on.
It should look something like this super hip Instagram photo.
Step 3: Boot up and setup
If everything booted up, you should see a Blue Screen of Life.
The first task I recommend is configuring your keyboard. Since the Raspberry Pi is from the UK, Raspbian ships with UK defaults (which, as an American, came as quite a shock, since everything is usually set up for me all the time).
The next thing you’ll want to do is expand the root partition to take up the full SD space. If you really wanted to make a production-ready server, you’d probably want a separate partition to store the Riak data… but if you wanted a production system, you wouldn’t be installing on a Raspberry Pi anyway.
If you like, try upgrading raspi-config. This isn’t necessary, but I like to keep current. You’ll certainly need an internet connection to do so.
Then reboot. You can just unplug it; at this point I wouldn’t worry about file corruption. If the thought bothers you, run sudo shutdown -h now first, then unplug.
When the server restarts, just select “finish” if the config screen launches again.
Step 4: Install esl-erlang (R15B02, Raspbian)
At this point, get ready to start waiting. RPi isn’t exactly a big iron machine, and SD cards are not fast storage, so the combination can be maddeningly sluggish. You might want to have a book handy.
Now that all of that housekeeping is out of the way, it’s time to get cracking on some Riak. The first thing you’ll need, however, is Erlang. Thankfully, the awesome folks at Erlang Solutions have created a Raspbian build, which is far easier than the two weeks I spent trying to create my own. So unless you’re a masochist, I suggest using theirs.
First, add the following line to your /etc/apt/sources.list file. Luckily, vi is installed.
sudo vi /etc/apt/sources.list
deb http://binaries.erlang-solutions.com/debian wheezy contrib
Then add the Erlang Solutions public key for apt-secure.
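Fetching and adding the key, then installing the package, looks something like this (the key URL here is an assumption based on Erlang Solutions’ setup instructions; double-check it against their site):

```shell
# Fetch the Erlang Solutions signing key and trust it for apt
wget http://binaries.erlang-solutions.com/debian/erlang_solutions.asc
sudo apt-key add erlang_solutions.asc

# Refresh package lists and install esl-erlang
sudo apt-get update
sudo apt-get install esl-erlang
```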
Once Erlang is installed, type erl to test it. I found that I received a segfault unless I rebooted once after the install.
If you are prompted for a password, the login/password is pi/raspberry.
Step 5: Install Riak from Source
With Erlang installed, now we move onto Riak proper. First thing is to download and extract Riak 1.2.1.
curl -O http://downloads.basho.com.s3-website-us-east-1.amazonaws.com/riak/CURRENT/riak-1.2.1.tar.gz
tar zxvf riak-1.2.1.tar.gz
Before we can actually install Riak using Rebar, we’ll first need to install git. Rebar uses git to download Riak’s dependencies.
sudo apt-get install git
With git in place, run make rel to build a release and let Rebar do its stuff.
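Starting from the extracted source directory, the build steps look like this (expect them to take a while on the Pi):

```shell
# Build a self-contained Riak release; Rebar fetches dependencies via git
cd riak-1.2.1
make rel
```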
Once your Riak release is installed, start up riak:
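Assuming a default source build, the release lives under rel/riak inside the source tree; start the node from there:

```shell
# Start the Riak node built by "make rel"
cd rel/riak
bin/riak start
```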
You can test that it’s started and receiving requests by curling /ping.
$ curl http://localhost:8098/ping
$ curl -XPUT http://localhost:8098/riak/hello/fr -d 'Allo'
$ curl http://localhost:8098/riak/hello/fr
It’s worth noting that the Raspberry Pi has only 256MB of RAM, so feel free to tweak some of the default settings in etc/app.config.
Step 6: Rinse and Repeat
At this point you should have Riak installed on one Raspberry Pi. To install it on the remaining two, you can either clone the SD card image onto the other cards, or install Riak from scratch on those cards following the instructions above. Copying SD images can be slow, so I can’t quite recommend one approach over the other.
Build a Cluster
Once you have three working cards, let’s network the Pis into a Riak cluster.
Step 7: A Little Networking
Before tackling this part, I’d stop Riak. We’ll fire it up when our network is functional.
You can plug your Pis into a router. They’re set up for DHCP by default so you should be good to go. You can largely skip this section.
However, if you’re cheap like me, you can plug them into a switch or hub, and do a bit of configuration.
Change the /etc/network/interfaces file to have the following settings (I’m using 192.168.10.10 for the first Pi; a 255.255.255.0 netmask is assumed):
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 192.168.10.10
netmask 255.255.255.0
Then restart the network.
sudo /etc/init.d/networking restart
You should see your network lights flash off then on.
If you run ifconfig, your eth0 and lo values should be what you expect.
If for some reason that didn’t work, try:
sudo ifconfig eth0 down
sudo ifconfig eth0 up
Then plug your keyboard and monitor into the next Pi and repeat the steps above for the remaining two, but give each card its own successive IP address (192.168.10.11, 192.168.10.12).
You should now be able to ping the other cards.
The LNK lights on your cards should blink for each ping.
There are more creative ways to configure a network, obviously, but this was fine for my three little cards.
Step 8: Make Riak Nodes
Although each RPi has Riak installed, they are not yet configured with the new network settings that allow the Erlang VMs to communicate with each other.
Change to your Riak install directory ($RIAK_HOME/rel/riak).
In etc/vm.args, replace all 127.0.0.1 with 192.168.10.10, or whatever your Pi’s IP is.
In etc/app.config, likewise replace all 127.0.0.1 with the node’s own IP (192.168.10.10/11/12, depending on the node).
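Rather than hand-editing, a sed one-liner can make the substitutions; here it is demonstrated on a scratch copy of a vm.args line (on the Pi, point the same sed at etc/app.config and etc/vm.args, using .11/.12 on the other nodes):

```shell
# Make a scratch file standing in for etc/vm.args
echo '-name riak@127.0.0.1' > /tmp/vm.args

# Swap the loopback address for this node's IP
sed -i 's/127\.0\.0\.1/192.168.10.10/g' /tmp/vm.args

cat /tmp/vm.args
```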
Start Riak with bin/riak start and check that it’s running with bin/riak ping (it should reply pong). If you have trouble getting Riak to start, you may have better luck after deleting your data directory.
Step 9: Cluster Time
Now each of your RPis is an official Riak node. It’s time to build a cluster!
From whichever RPi node you happen to be connected to, join the two other nodes. Since I’m connected to 192.168.10.12, I typed the following:
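Riak 1.2 uses staged clustering, so (assuming the node names follow the riak@IP pattern set in vm.args) the join looks something like:

```shell
# Stage joins to the other two nodes, then review and commit the plan
bin/riak-admin cluster join riak@192.168.10.10
bin/riak-admin cluster join riak@192.168.10.11
bin/riak-admin cluster plan
bin/riak-admin cluster commit
```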
You should expect the new key to be added. If you’d like to see my results in bad home-movie form, the video is below.
Basho Names Gregory M. Collins, A Leading Enterprise Software Operating Executive, President and CEO to Lead Next Phase of Basho’s Growth.
CAMBRIDGE, MA – October 31, 2012 – Basho Technologies, the worldwide leader in distributed database and cloud storage software, today reported that it has named Gregory M. Collins, a highly experienced strategy and business development executive and current Basho Board member, as President and Chief Executive Officer, while the Board has accepted the resignation of Donald Rippert from those positions and the board.
Earl Galleher, Chairman of the Board of Directors, said, “We appreciate Don’s efforts in helping to transition the Company from its start-up status into a growing and established participant in the highly-attractive cloud storage and electronic data systems markets. Recognition for Basho’s market potential is demonstrated by the continued growth in its customer bookings, its successful attraction of investor capital, and its strategic expansion in the U.S., European, and Asian marketplaces, including through Basho’s strategic relationship with Yahoo! Japan and its subsidiary IDC Frontier. In addition, Basho has developed the necessary operating infrastructure to capitalize on the fast-growing global demand for its key Riak™ and Riak CS products, following the opening of its Tokyo and London offices earlier this year to support the growing adoption of its products around the world.
“We feel that now is the right time to move the Company to an executive leadership that has strong operating and strategic development expertise, which will help Basho build out its organizational capabilities to both finalize customer wins in the Company’s pipeline and to strategically enhance the value of the Basho enterprise as the Company expands its market potential. The Board is confident that Greg’s expertise in creating and implementing business development strategies for fast-growing organizations in software, data services and applied technology across markets will provide the Company with the strategic leadership Basho needs as it moves forward in this next stage of its growth,” added Mr. Galleher.
Collins brings deep development and operational experience in the enterprise software and data solutions markets. As a Principal at Cape Fear Advisors, Collins is engaged by CEOs to assess and resolve strategic, operational, finance, leadership and business development opportunities. Among other notable successes, he grew revenue 50% for a software and services client by enabling them to capture business from larger prospects and clients and sustained 30%+ topline growth for a data services client by identifying emerging segments and building new sales tools and approaches. Prior to joining Cape Fear, he was a senior executive at Fortune 500 software provider Reynolds and Reynolds, where he repositioned the Company in higher growth markets and sustained and accelerated the Company’s growth and value realization track record through a combination of service extensions, acquisitions, cost management programs and global expansion.
Collins stated, “I am excited to lead the Company forward in its development. Basho is uniquely positioned, in terms of its products, services, and its business relationships, to be one of the premier companies in distributed database and cloud data storage. Its products and services already are reshaping the marketplace to the benefit of customers and the interest in doing business with Basho is spreading rapidly and internationally. My focus initially will be on working with Basho’s talented management team and exceptional employees to facilitate Basho’s ability to execute on its customer bookings in a manner that achieves maximum results for both Basho and our customers, and in so doing advance Basho to be universally regarded as the best-in-class company in the cloud storage and electronic data systems markets. The Board, the broader management team, and I are excited about Basho’s business successes, the appeal of the marketplace we are helping to develop, and Basho’s future long term potential as we continue to build on our business momentum and enhance our products to capitalize on the tremendous market opportunity in cloud data storage.”
About Basho Technologies
Basho Technologies is the leader in highly-available, distributed database technologies used to power scalable, data-intensive Web, mobile, and e-commerce applications and large cloud computing platforms. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak™ to implement content delivery platforms and global session stores, to aggregate large amounts of data for logging, search, and analytics, to manage, store and stream unstructured data, and to build scalable cloud computing platforms.
Riak is available open source for download at http://wiki.basho.com/Riak.html. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. Riak CS enables multi-tenant object storage with advanced reporting and an Amazon S3 compatible API. For more information visit www.basho.com or follow us on Twitter at www.twitter.com/basho.
Robert Siegfried / Lyndsey Estin
Kekst and Company
Google’s LevelDB has proven very versatile within Riak, where it is implemented as eleveldb, an Erlang wrapper around LevelDB. But Google’s target environment was much different from the large-data environments of Riak’s users. Riak 1.2 contains the first wave of performance tuning for large data. These changes improve overall throughput and eliminate most instances where LevelDB would hang for a few seconds trying to catch up. The new release also contains a fix for an infinite-loop compaction condition, a bloom filter that greatly reduces the time spent searching for non-existent keys, and several bug fixes. This blog details these improvements and also gives some internal benchmark results obtained using basho_bench, Basho’s open source benchmarking tool.
Stalls: In Riak 1.1, individual vnodes in Riak (each backed by one LevelDB database) could have long pauses before responding to individual get/put calls. Several stall sources were identified and corrected. On a test server, LevelDB in 1.1 saw stalls of 10 to 90 seconds every 3 to 5 minutes. In Riak 1.2, LevelDB sometimes sees one 10-to-30-second stall every 2 hours.
Throughput: While impacted by stalls, throughput is an independent code and tuning issue. The fundamental change was to increase all on-disk file sizes, minimizing the number of file opens and reducing the number of compactions. LevelDB in Riak 1.1 had a throughput of ~400 operations per second on a given server. These changes raised throughput to ~2,000 operations per second.
Infinite loop during compaction: In 1.1, the background compaction would get caught in an infinite loop if it encountered a file with a corrupt data block. The previous solution was to stop the node, manually execute “recovery”, then restart the node. During recovery, the entire file (and all its data) was removed from the data store and copied to the “lost” directory. Riak 1.2 instead creates one file, BLOCKS.bad, in the “lost” directory. The LevelDB code automatically removes the corrupted block from compaction processing and copies it to this file, then continues processing the remaining data in the file (without going into an infinite loop).
Merge of LevelDB bloom filter code: Google has created a bloom filter to help LevelDB more quickly identify keys that do not exist in the data store. The bloom filter code is merged into this release. While incrementally beneficial in its own right, the bloom filter enables changes to the file/level structure that dramatically improve overall throughput.
app.config eleveldb options: In Riak 1.1, most parameters set in app.config for the LevelDB layer were never passed along. This is corrected in 1.2. Users should consider previous parameter tests/experiments invalid.
The graphs below illustrate LevelDB’s improvements in throughput and maximum latency. Test data was obtained using basho_bench, Basho’s open source benchmarking tool. Raw data and configuration files can be downloaded here. In the benchmark presented, LevelDB preloads a database with 10 million sequentially ordered keys.
As can be seen, LevelDB 1.1 stalls regularly, whereas 1.2 seldom stalls, thanks to the stall-management improvements. We can also see that LevelDB in 1.2 has a higher ingest rate: we were able to input 10 million records in 44 minutes, compared to 106 minutes in 1.1.
Throughput in levelDB 1.1
Throughput in levelDB 1.2
Maximum latency in levelDB 1.1
Maximum latency in levelDB 1.2
We have already identified further performance tuning for future work, including bloom filter modification and removing redundancy (bloat) during memory to level-0 file creation. Expect another wave of performance tuning in subsequent point releases and major releases.
Data backup: In theory there is no need to perform data backup on LevelDB, since Riak duplicates all data across several nodes. But many users have suggested they would still sleep better if there were a means to perform a direct backup by node/vnode anyway. Backups during live operation are a planned next feature.
Infinite loops: Riak 1.2 contains fixes for a couple of the most common cases where compactions could enter infinite loops when the state of files on disk does not match LevelDB’s internal state. However, there are other, less frequent cases that can still cause infinite loops; these are high on the future work list.
Error correction: LevelDB has methods to repair and restore damaged vnodes, but the time cost of executing a repair can be huge. Repair time is already better in release 1.2 (in one case it was reduced from six weeks … really … to eleven hours). We already have a design, waiting on programming resources, that will further speed repair time.