December 2, 2013
A few weeks ago, at AWS re:Invent 2013, The Weather Company announced their new IT platform, which focuses on next generation forecasting using big data. To build this platform, they required an architecture that was both flexible and reliable, and selected Riak to achieve that. Riak underpins the new IT platform and is used to store a variety of data from satellites, radars, forecast models, users, and weather stations worldwide.
At re:Invent, the Vice President of Enterprise Data at the Weather Company, Sathish Gaddipati, spoke about The Weather Company’s overall IT transformation in a talk titled “How the Weather Company Monetizes Weather, the Original Big Data Problem.” If you missed Sathish’s session at re:Invent, the entire talk is now available on the AWS YouTube Channel and can be watched below.
This talk provides more details on The Weather Company’s architecture, technology choices, performance results, and business benefits realized from this IT transformation. It also discusses how the application of these technologies can help keep people safe and help businesses plan and become more profitable, thanks to the latest intersection of consumer behavior and weather forecasting and reporting.
For additional details on why The Weather Company selected Riak over Cassandra, MongoDB, and Hadoop, check out Doug Henschen’s InformationWeek article “Big Data Reshapes Weather Channel Predictions.”
November 7, 2013
AWS re:Invent 2013 is the second annual Amazon Web Services’ global customer and partner conference. It is the largest gathering of developers and technical leaders from the AWS community and provides four days of sessions, training bootcamps, hands-on labs, and a Hackathon.
Sathish Gaddipati, The Weather Company VP of Data Enterprise, will be presenting at re:Invent on Thursday, November 14th at 5:30pm. His talk, “How The Weather Company Monetizes Weather, the Original Big Data Challenge,” will look at the transformation of the most widely distributed cable TV network in the United States (building on one of the world’s most visited digital properties) to create a world class Big Data platform. As part of this transformation, The Weather Company has selected Basho’s Riak to underpin their new IT platform. Gaddipati will also discuss the architecture, technology choices (including how and why they selected Riak), performance results, and business benefits realized as part of their use of AWS services to host an exciting set of weather.com solutions and generate new revenue streams.
Basho is also a gold sponsor of re:Invent and will have a booth set up to answer any questions about Riak or Riak CS. Be sure to stop by to chat and grab some great Basho swag.
November 6, 2013
RICON West may be over, but we still have a few major conferences and events to wrap up before the year is up. Here are some of the highlights of where we’ll be in November.
OpenStack Summit: The OpenStack Summit will bring together key technical minds to discuss the future of cloud computing. This Summit will feature case studies, workshops, and technical sessions for both cloud operators and developers. Basho engineer, Eric Redmond, will be on-site to answer any questions about Riak CS and OpenStack. The OpenStack Summit will take place in Hong Kong from November 5-8th.
Build a Cloud Days: Build a Cloud Days helps attendees deploy a cloud computing environment using CloudStack and other cloud infrastructure tools. Basho consulting engineer, John Burwell, will present on private cloud design principles and best practices. It will take place on November 8th in Washington DC.
QCon: Basho is a proud sponsor of QCon San Francisco, which will take place November 11-13th. Basho technical evangelist, Tom Santero, will discuss the benefits and challenges of working on distributed teams in his talk, “Making Virtual and Remote Teams Shine.” Basho will also have a booth set up to answer any questions about Riak or Riak CS.
AWS re:Invent: Basho is a gold sponsor of AWS re:Invent, the largest gathering of developers and technical leaders from the AWS community. Bryson Koehler, CIO of The Weather Company, will be speaking about how The Weather Company uses Riak in his session, “How the Weather Company Monetizes Weather, the Original Big Data Challenge.” Basho will also be available at the booth to discuss Riak or Riak CS. AWS re:Invent will take place in Las Vegas from November 12-15th.
For a full list of where we’ll be for the rest of the year, check out the Events Page.
June 27, 2013
Today, we are excited to share a recent whitepaper released by the Amazon team entitled, “NoSQL Database in the Cloud: Riak on AWS.” This paper provides technical guidance on running Riak on the Amazon platform, including an overview of:
- Basic Installation
- Riak Architecture and Scale
- Operational Considerations (including sizing and configuration)
- AWS specific security configuration
- A discussion of Replication (as enabled by Riak Enterprise)
Given the number of Riak users (both open source and enterprise) who leverage public cloud environments, either as a part of their infrastructure or as the foundation of it, Basho will continue to invest in partnerships that provide deployment choice and deployment ease. Whether it’s for a hybrid cloud model – used to address burst capacity, tenancy/data locality, and proof of concept needs – or for an investment solely in public cloud, Riak will provide the operational simplicity and scalability required for your critical data.
For more information about deploying Riak on AWS, check out our posts about the Riak AMI and our other deployment options, including automated scripts and manual installation. You can also find more information about what to consider when installing Riak on AWS in our documentation.
June 5, 2013
To help make Riak even more accessible, we have partnered with a number of different hosting providers, consulting services, system integrators, and OEMs to help you better use Riak. You can check out all of our partners at the Partnerships Page, which highlights how we are collaborating. Below is a look at some of our wonderful partners.
For those of you in need of a hosting provider for Riak, Basho is partnered with a number of great companies to help get you deployed quickly. Partners include: Amazon Web Services, Windows Azure, Joyent, SoftLayer (which was recently acquired by IBM and also offers hosting for Riak Enterprise), and Engine Yard – which is offering 500 free hours when you sign up.
Some companies that are using Riak and Riak CS to power applications or other product offerings include Datapipe and Yahoo! Japan subsidiary, IDC Frontier. ePlus and Trifork act as resellers of Riak to help expand our global reach.
Finally, Gazzang helps to ensure that your Riak environment is secure and meets all regulations related to sensitive information.
May 6, 2013
The free Riak AMI available on the AWS Marketplace has been updated to the latest version, Riak 1.3.1.
In Riak 1.3, we introduced:
- Active Anti-Entropy
- Updates to Riak Control
- Expanded IPv6 support
- Improved MapReduce
- Simplified Log Management
Riak 1.3.1 includes all these features with some additional changes enumerated in the release notes.
For those of you currently using Riak on AWS, or interested in testing Riak on AWS, the AMI makes installation and configuration much easier. We see open source and Riak Enterprise users leverage AWS both as their primary infrastructure and to support hybrid implementations.
Installation instructions for the AMI are available in our docs.
April 17, 2013
This post looks at five commonly asked questions about Riak. For more questions and answers, check out our Riak FAQ.
What hardware should I use with Riak?
Riak is designed to be run on commodity hardware and is run in production on a variety of different server types on both private and public infrastructure. However, there are several key considerations when choosing the right infrastructure for your Riak deployment.
RAM is one of the most important factors – RAM availability directly affects which Riak backend you should use (see the backend question below), and is also required for complex MapReduce queries. In terms of disk space, Riak automatically replicates data according to a configurable n_val. A bucket-level property that defaults to 3, n_val determines how many copies of each object will be stored, and provides the inherent redundancy underlying Riak’s fault-tolerance and high availability. Your hardware choice should take into consideration how many objects you plan to store and the replication factor. That said, Riak is designed for horizontal scale and lets you easily add capacity by joining additional nodes to your cluster. Additional factors that might affect choice of hardware include IO capacity, especially for heavy write loads, and intra-cluster bandwidth. For more on capacity planning, check out our documentation on cluster capacity planning.
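As a rough illustration of how n_val feeds into disk sizing, the arithmetic can be sketched as a small shell function. The function name and the example numbers are hypothetical, and the estimate covers raw object data only: it ignores backend overhead and the headroom needed for merges or compaction, so treat it as a starting point rather than a sizing rule.

```shell
# Rough raw-disk estimate per node (illustrative only).
# Arguments: object count, average object size in bytes, n_val, node count.
riak_disk_per_node_bytes() {
  local objects=$1 avg_bytes=$2 n_val=$3 nodes=$4
  # Every object is stored n_val times across the cluster.
  local total=$(( objects * avg_bytes * n_val ))
  # Integer division, rounded up so the estimate never undershoots.
  echo $(( (total + nodes - 1) / nodes ))
}

# Example: 50 million 2 KB objects, the default n_val of 3, five nodes.
riak_disk_per_node_bytes 50000000 2048 3 5
```

Adding nodes lowers the per-node figure, which is the horizontal scaling described above.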
Riak is explicitly supported on several cloud infrastructure providers. Basho provides free Riak AMIs for use on AWS. We recommend using large, extra large, and cluster compute instance types on Amazon EC2 for optimal performance. Learn more in our documentation on performance tuning for AWS. Engine Yard provides hosted Riak solutions, and we also offer virtual machine images for the Microsoft VM Depot.
What backend is best for my application?
Riak offers several different storage backends to support use cases with different operational profiles. Bitcask and LevelDB are the most commonly used backends.
Bitcask was developed in-house at Basho to offer extremely fast read/write performance and high throughput. Bitcask is the default storage engine for Riak and ships with it. Bitcask uses an in-memory hash-table of all keys you write to Riak, which points directly to the on-disk location of the value. The direct lookup from memory means Bitcask never uses more than one disk seek to read data. Writes are also very fast with Bitcask’s write-once, append-only design. Bitcask also offers benefits like easier backups and fast crash recovery. The inherent limitation is that your system must have enough memory to contain your entire keyspace, with room for a few other operational components. However, unless you have an extremely large number of keys, Bitcask fits many datasets. Visit our documentation for more details on Bitcask, and use the Bitcask Capacity Calculator to assist you with sizing your cluster.
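To get a feel for the memory constraint before reaching for the Bitcask Capacity Calculator, a back-of-the-envelope version might look like the sketch below. The function name is hypothetical, and the per-key overhead is passed in as a parameter because the real figure depends on the Riak version; the value 40 in the example is a placeholder assumption, not a documented constant.

```shell
# Rough estimate of RAM needed per node for Bitcask's in-memory key table.
# Arguments: key count, average bucket+key length in bytes,
#            assumed per-key overhead in bytes, n_val, node count.
bitcask_keydir_bytes_per_node() {
  local keys=$1 avg_key_bytes=$2 overhead_bytes=$3 n_val=$4 nodes=$5
  # Each replica of a key takes an entry in the key table of the node holding it.
  local total=$(( keys * (avg_key_bytes + overhead_bytes) * n_val ))
  # Round up so the estimate never undershoots.
  echo $(( (total + nodes - 1) / nodes ))
}

# Example: 100 million keys of about 36 bytes, a placeholder overhead of 40 bytes,
# n_val of 3, six nodes.
bitcask_keydir_bytes_per_node 100000000 36 40 3 6
```

If the result approaches the RAM available on each node, that is a hint to add nodes or to look at LevelDB instead.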
LevelDB is an open-source, on-disk key-value store from Google. Basho maintains a version of LevelDB tuned specifically for Riak. LevelDB doesn’t have Bitcask’s memory constraints around keyspace size, and thus is ideal for deployments with a very large number of keys. In addition to this advantage, LevelDB uses Google Snappy data compression, which provides particular efficiency for text data like raw text, Base64, JSON, HTML, etc. To use LevelDB with Riak, you must change the storage backend variable in the app.config file. You can find more details on LevelDB here.
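The app.config change can be sketched as a one-line edit. This is a minimal example under a few assumptions: the helper name is hypothetical, the file is assumed to already contain a storage_backend entry in the riak_kv section, and /etc/riak/app.config is the usual location for package installs but may differ on your system.

```shell
# Point the storage_backend setting in a Riak app.config at another backend.
# Arguments: path to app.config, backend module name.
set_riak_backend() {
  local config=$1 backend=$2
  # Keep a .bak copy, then rewrite the existing storage_backend tuple in place.
  sed -i.bak "s/{storage_backend, *[a-z_]*}/{storage_backend, ${backend}}/" "$config"
}

# Typical use (needs write access to the file, and a node restart afterwards):
#   set_riak_backend /etc/riak/app.config riak_kv_eleveldb_backend
```

Switching the setting does not move data that was already written with the previous backend, so it is best done on a fresh node.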
Riak also offers a Memory storage backend that does not persist data and is used simply for testing or small amounts of transient state. You can also run multiple backends within a single Riak instance, which is useful if you want to use different backends for different Riak buckets or use a different storage configuration for some buckets. For in-depth information on Riak’s storage backends, see our documentation on choosing a backend.
How do I model data using Riak’s key/value design?
Riak uses a key/value design to store data. Key/value pairs comprise objects, which are stored in buckets. Buckets are flat namespaces with some configurable properties, such as the replication factor. One frequent question we get is how to build applications using the key/value scheme. The unique needs of your application should be taken into account when structuring it, but here are some common approaches to typical use cases. Note that Riak is content-agnostic, so values can be any content type.
|Application Type|Key|Value|
|Session|User/Session ID|Session Data|
|Content|Title, Integer|Document, Image, Post, Video, Text, JSON/HTML, etc.|
|Advertising|Campaign ID|Ad Content|
|Sensor|Date, Date/Time|Sensor Updates|
|User Data|Login, Email, UUID|User Attributes|
For more comprehensive information on building applications with Riak’s key/value design, view the use cases section of our documentation.
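To make the session row of the table concrete, here is a small sketch that stores and fetches session data over Riak’s HTTP interface with curl. The helper names, the bucket name sessions, and the JSON payload are assumptions for illustration; 8098 is Riak’s default HTTP port, so adjust it if your nodes are configured differently.

```shell
# Build the HTTP URL for an object in a bucket.
riak_object_url() {
  local host=$1 bucket=$2 key=$3
  echo "http://${host}:8098/buckets/${bucket}/keys/${key}"
}

# Store session data (a JSON string) under the session ID.
store_session() {
  local host=$1 session_id=$2 json=$3
  curl -s -X PUT -H 'Content-Type: application/json' \
    -d "$json" "$(riak_object_url "$host" sessions "$session_id")"
}

# Read the session data back by its ID.
fetch_session() {
  local host=$1 session_id=$2
  curl -s "$(riak_object_url "$host" sessions "$session_id")"
}

# Example against a local node:
#   store_session 127.0.0.1 abc123 '{"user":"jane","cart":[42]}'
#   fetch_session 127.0.0.1 abc123
```

The same pattern applies to the other rows: pick the bucket for the application type and use the natural identifier as the key.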
What other options, besides strict key/value access, are there for querying Riak?
Most operations done with Riak will be reading and writing key/value pairs to Riak. However, Riak exposes several other features for searching and accessing data: MapReduce, full-text search, and secondary indexing.
Riak also provides Riak Search, a full-text search engine that indexes documents on write and provides an easy, robust query language and SOLR-like API. Riak Search is ideal for indexing content like posts, user bios, articles, and other documents, as well as indexing JSON data. For more information, see the documentation on Riak Search.
Secondary indexing allows you to tag objects in Riak with one or more queryable values. These “tags” can then be queried by exact or range value for integers and strings. Secondary indexing is great for simple tagging and searching Riak objects for additional attributes. Check out more details here.
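As a sketch of what exact and range queries look like over HTTP, the helpers below build the index query URLs. The function names, bucket, and index names are hypothetical; index names end in _bin for strings or _int for integers, and objects are tagged at write time with an x-riak-index-<name> header. Note that secondary indexes need a backend that supports them, such as LevelDB, rather than Bitcask.

```shell
# URL for an exact-match secondary index query.
riak_2i_exact_url() {
  local host=$1 bucket=$2 index=$3 value=$4
  echo "http://${host}:8098/buckets/${bucket}/index/${index}/${value}"
}

# URL for a range secondary index query (start and end are inclusive bounds).
riak_2i_range_url() {
  local host=$1 bucket=$2 index=$3 start=$4 end=$5
  echo "http://${host}:8098/buckets/${bucket}/index/${index}/${start}/${end}"
}

# Example: tag a user object with an integer index, then query by range.
#   curl -s -X PUT -H 'Content-Type: application/json' \
#     -H 'x-riak-index-age_int: 34' -d '{"name":"jane"}' \
#     http://127.0.0.1:8098/buckets/users/keys/jane
#   curl -s "$(riak_2i_range_url 127.0.0.1 users age_int 30 40)"
```

The response to a query is a JSON list of matching keys, which you then fetch individually.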
How does Riak differ from other databases?
We often get asked how Riak is different from other databases and other technologies. While an in-depth analysis is outside the scope of this post, the below should point you in the right direction.
Riak is often used by applications and companies with a primary background in relational databases, such as MySQL. Most people who move from a relational database to Riak cite a few reasons. For one, Riak’s masterless, fault-tolerant, read/write available design makes it a better fit for data that must be highly available and resilient to failure scenarios. Second, Riak’s operational profile and use of consistent hashing means data is automatically redistributed as you add machines, avoiding hot spots in the database and manual resharding efforts. Riak is also chosen over relational databases for the multi-datacenter capabilities provided in Riak Enterprise. A more detailed look at the difference between Riak and traditional databases and how to make the switch can be found in this whitepaper, From Relational to Riak.
A more detailed look at the technical differences between Riak and other NoSQL databases can be found in the comparisons section of our documentation, which covers databases such as MongoDB, Couchbase, Neo4j, Cassandra, and others.
This is a cross post from compositecode.com written by Adron Hall, one of the Basho Technical Evangelists. In it he walks through one of the methods of setting up and configuring a cluster on AWS. Other options are enumerated in a post entitled Riak on AWS – Deployment Options.
March 14, 2013
I wanted to write up an intro to getting Riak installed on AWS. Even though the steps are absurdly simple and already available on the Basho Docs site, there are a few extra notes that can be very helpful at specific points during the process.
Start off by logging into AWS. At this point you can take two different paths that are almost identical. You can follow the path of using the pre-built AWS Marketplace image of Riak, or just start from scratch. The difference is a total of about 2 steps: installing Riak and opening some ports in a security group. I’m going to step through without using the prebuilt image in these instructions.
First, you’ll need a security group set up with the correct permissions, so start by creating one.
NOTE: No, I didn’t mean to misspell Riak, but it’s in there now.
Before adding the ports, go to the security group details tab and copy the security group id. I’ve pointed it out in the image above.
Now add three rules for the following ports, using the security group itself as the source: 4369, 8099 & 6000-7999. For the source, enter the security group id you copied. Once you get all three added the list should look like this (below). For each rule click the Add Rule button, and remember to click Apply Rule Changes when you’re done. I often forget this because the screen on some of the machines I use only shows down to the Add Rule button, so you’ll have to scroll down to find the Apply Rule Changes button.
Now add the standard port 22 for SSH. Next get the final two, 8087 and 8098, set up and we’re ready to move on to creating the virtual machines.
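If you prefer the command line to the console, the same rules could be sketched with the AWS CLI. This is not part of the original walkthrough: the function name, the example group id, and the choice of a CIDR block as the source for ports 22, 8087, and 8098 are all assumptions. The function only prints the commands so you can review them before running anything.

```shell
# Print the AWS CLI calls for the Riak security group rules (dry run).
# Arguments: security group id, CIDR block allowed to reach SSH and client ports.
riak_sg_commands() {
  local sg_id=$1 client_cidr=$2 port
  # Intra-cluster ports: only members of the same security group may connect.
  for port in 4369 8099 6000-7999; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --source-group $sg_id"
  done
  # SSH plus the two client-facing Riak ports, opened to the given CIDR (an assumption).
  for port in 22 8087 8098; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --cidr $client_cidr"
  done
}

# Review the output first, then pipe it to sh if it looks right:
#   riak_sg_commands sg-0123abcd 203.0.113.0/24
```

Keeping the intra-cluster ports restricted to the group itself matches the console steps above, where the group id is used as the source.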
Server Virtual Machines
For creating virtual machines I just clicked on Launch Instance and used the classic wizard. From there you get a selection of items. I’ve used the AWS image to do this, but would actually suggest using a CentOS image of your choice or Red Hat Enterprise Linux (RHEL). Another great option is Ubuntu 12.04 LTS. Really though, use whatever Linux version or distro you like; there are 1-2 step instructions for installing Riak on almost every distro out there.
Next just launch a single instance. We’ll be able to launch duplicates of these further along in the process. I’ve selected a “Micro” here but I’m not intending to do anything with a remotely heavy load right now. At some point, I’ll upgrade this cluster to larger instances when I start putting it under a real load. I’ll have another blog entry to describe exactly how I do this too.
Continue again until you can select the security group that we created above.
Now keep hitting that continue button until you get to launch, and launch this thing. Once the instance is launched, launch your preferred SSH connection tooling. The easiest way I’ve found for getting the most current private IP to connect to with the appropriate command is to right click on the instance in the AWS Console and click on Connect. There you’ll find the command to connect via SSH.
Paste that into your SSH app and hit enter, and you’ll see something akin to this.
$ cd Codez/working-content/
$ ssh -i riaktionz.pem email@example.com
The authenticity of host 'ec2-54-245-201-97.us-west-2.compute.amazonaws.com (18.104.22.168)' can't be established.
RSA key fingerprint is 31:18:ac:1a:ac:fc:6e:6d:55:e8:8a:83:9a:8f:c7:5f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'ec2-54-245-201-97.us-west-2.compute.amazonaws.com,22.214.171.124' (RSA) to the list of known hosts.
Please login as the user "ubuntu" rather than the user "root".
Enter yes to continue connecting. For some instance types, like Ubuntu, you’ll have to do some tweaks to log in as “ubuntu” vs. “root”, and the same goes for the AWS image or others. I’ll leave it to you, dear reader, to get connected via ole’ SSH.
One of the other things that may take some tweaking and googling is figuring out the firewall setup on the various virtual machine images. For RHEL you’ll want to turn off the firewall or open up the specific connection ports. Since the AWS firewall already handles this, it isn’t particularly important for the OS to continue running its own firewall service. In this case, I’ve turned off the OS firewall and just rely on the AWS firewall. To turn off the RHEL firewall, execute the following commands.
$ service iptables save
$ service iptables stop
$ chkconfig iptables off
Now is a perfect time to start those other instances. Navigate into the AWS Console again and right click on the virtual machine instance you’ve created. On that menu select Launch More Like This.
Go through and check the configuration on each of these, make sure the firewall is turned off, etc. Then move on to the next step and install Riak and cluster them. So it’s time to get to the distributed, massively complex, extensive list of steps to install & cluster Riak. Ok, so that’s sarcasm.
Step 1: Install Riak
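Install Riak on each of the instances. The commands below expect the shell variable package to hold the filename of the Basho release RPM for your OS version.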
wget http://yum.basho.com/gpg/$package -O /tmp/$package &&
sudo rpm -ivh /tmp/$package
sudo yum install riak
NOTE: For other installation methods, such as directly downloading the RPM, or for other Linux OSes, check out the installation docs at http://docs.basho.com/riak/latest/tutorials/installation/Installing-on-RHEL-and-CentOS/.
Step 2: Setup the Cluster
On the first instance, get the IP. You won’t need to do anything to this instance, just keep the IP handy. Then move on to the second instance and run the cluster join command, putting the first instance’s IP after riak@.
sudo riak-admin cluster join riak@
Do this on each of the instances you’ve added, always joining to that first node. When you’ve added them all, run the plan on that last instance (or really any of them). This displays what will take place when the cluster changes are committed.
sudo riak-admin cluster plan
If that all looks good, commit the plan.
sudo riak-admin cluster commit
Get a check of the cluster.
sudo riak-admin member_status
That’s it; all done. You now have a Riak Cluster. For more operations to try out on your cluster, check out this list of basic API Operations.
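For a cluster with more than a couple of nodes, it can help to write the sequence down before running it. The sketch below just prints which command goes on which instance; the function name and the example IPs are hypothetical, and it assumes each node’s Erlang node name has the form riak@<its IP>, as the join command above implies.

```shell
# Print the clustering steps, one line per command (dry run).
# First argument: IP of the first node. Remaining arguments: IPs of the joining nodes.
riak_cluster_steps() {
  local first=$1 node
  shift
  # Every other node joins the first one.
  for node in "$@"; do
    echo "on ${node}: sudo riak-admin cluster join riak@${first}"
  done
  # Plan, commit, and check can run on any member; the first node is used here.
  echo "on ${first}: sudo riak-admin cluster plan"
  echo "on ${first}: sudo riak-admin cluster commit"
  echo "on ${first}: sudo riak-admin member_status"
}

# Example for three instances:
riak_cluster_steps 10.0.0.1 10.0.0.2 10.0.0.3
```

The output is a checklist rather than something to pipe into a shell, since each line has to be run over SSH on the instance it names.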
January 15, 2013
Today we’re introducing an easier way to build Riak clusters on AWS using CloudFormation.
The project, cloudformation-riak, comes with three CloudFormation templates. These templates range from building a simple Riak cluster to building a VPC-based stack that includes: a front-end load balancer; a cluster of application servers with a Riak powered demo application; a backend load balancer; and a riak-cluster.
Head over to the cloudformation-riak repo to get started. We also put together a screencast (below) that shows things in action.
Running Riak on AWS just got easier. Announcing Riak AMI, a ready-built virtual machine and configuration of Riak for Amazon EC2.
CAMBRIDGE, MA – December 14, 2012 – A number of our community members and customers already use Riak on AWS, and with the Riak AMI, getting up and running should be much easier. The Riak AMI helps support a growing number of hybrid implementations where businesses use both private infrastructure and public cloud services. This hybrid model can be leveraged to address burst capacity issues, tenancy/locality concerns, and simple proof-of-concept deployments, in addition to a myriad of other business challenges.
For more information, read our post here.