Tag Archives: Basho

Choosing the Right Tool

August 10, 2012

We have a poorly defined term in our industry: “NoSQL.” [Does your toaster run SQL? No? Then you own a NoSQL toaster.] Be that as it may, Riak falls under the umbrella of software that carries this label. In our attempt to own the label, we reinterpret it to mean that we now have more choices as developers. For too long, our only meaningful options for data storage were SQL relational databases and the file system.

In the past few years, that has changed. We now have many production-ready tools available for storing and retrieving data, and many of those fall within the sphere of NoSQL. With all of these new options, how do we as developers choose which database to use?

On the Professional Services Team, this is the first question we ask ourselves: What is the best storage option for this application? At Basho, Professional Services goes on-site to assist clients with training, application development, operational planning – anything to help get the most out of Riak. In order to know how to do that, we also have to know quite a bit about other NoSQL databases and storage options, and when it might be a better option to go with something other than Riak. Below we outline some of our reasoning when we evaluate Riak for our clients and their applications.

A Simple Key-Value Store

When our clients simply need a key-value store, our job as consultants couldn’t get any easier. Riak is a great key-value database with an excellent performance profile, fantastic high availability and scaling properties, and the best deployment/operations story that we know. We are very proud of our place in the industry when it comes to these features.

But when the business logic for the application requires an access pattern more sophisticated than a simple key lookup, we have to dig deeper to figure out whether Riak is the right tool for the job. We have evolved the following distinguishing criteria:

If there is a usage scenario requiring ad-hoc, dynamic querying, then we might consider alternative solutions.

  • Ad-hoc: by this we mean that queries run at unpredictable times, possibly triggered by end-users of the application.
  • Dynamic: by this we mean that queries are constructed at the time they are being run.

If the usage scenario requires neither ad-hoc nor dynamic queries, then we can usually construct the application in such a way that even complex analysis works well with Riak’s key-value nature. If the scenario requires ad-hoc but not dynamic queries, then we have to look at options to tune the performance of the known access patterns. If the scenario requires dynamic queries run on a regular basis, then we might investigate running the dynamic queries on an ‘offline’ cluster replica so that we don’t interfere with the availability of the ‘online’ production clusters.

These criteria began to take form in our evaluations of Riak for data analytics. We often see Riak deployed as a Big Data solution because of its exceptional fault-tolerance and scaling properties, and running analytics on Big Data is a common use case. MapReduce gives us the ability to run sophisticated analytics on Riak, but other solutions exist that are optimized for analytics in ways that Riak is not. It is generally not a good idea to run MapReduce on a production Riak cluster for data analysis purposes. MapReduce exists in Riak primarily for data maintenance, data migrations, or offline analysis of a cluster replica. All three of these are good use cases for Riak’s MapReduce implementation.[1]

Key-Value State of Mind

Does that mean that data analysis applications are off the table? Absolutely not! In our training sessions and workshops, we emphasize that key-value databases require a different mindset than relational databases when you are planning your application.

In traditional SQL applications, we as engineers start by defining the data model, normalizing the data, and structuring models in such a way that relations can be fetched efficiently with appropriate indexing. If we do a good job modeling the data, then we can proceed with reasonable certainty that the application built on top of it will unfold naturally. The developers of the application layer will take advantage of well-known patterns and practices to construct their queries and get what they want out of the data model. It’s no surprise that SQL is pretty good for this kind of thing.

In a key-value store, we approach the software architecture from the opposite direction. Instead of asking what the data model should look like and working up to the application view, we begin by asking what the resulting view should look like and then work ‘backwards’ to define the data model. We start with the question: what do you want the data to look like when you fetch it from the database?

If we can answer the above question, and if we can define the structure of the result that we want in advance, then we probably have a good case for pre-processing the results. We pre-process the data in the application layer before it enters Riak, and then we simply save the answer that we want as the value of a new key-value pair. In these cases, we can often get better read performance than with a relational approach because we don’t have to compile and execute a SQL query at fetch time.

A rolling average is a simple example. Imagine that we want the average of some value within data objects that get added to the system throughout the day. In a SQL database, we can just call AVG() on that column, and it will compute the answer at query time. In a key-value store, we instead add logic in the application layer to catch each object before it enters Riak, fetch the current average and the count of included elements from Riak, compute the new rolling average, and save that answer back to Riak. The logic in the application layer is now slightly more complicated, but we weigh this trade-off against the simplicity of administering a key-value database instead of a relational one. Now, when you fetch the average, the database doesn’t have to compute it for you. It just returns the answer.
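
The write-time bookkeeping described above can be sketched in a few lines. This is an illustrative sketch only: a plain Python dict stands in for a Riak bucket, and the key and function names are invented for the example; a real application would use a Riak client’s get and put operations in their place.

```python
# Maintain a rolling average at write time so that reads are a single
# key lookup. A dict stands in for a Riak bucket in this sketch.

store = {}  # stand-in for a Riak bucket: key -> (average, count)

def record_measurement(key, value):
    """Update the stored rolling average instead of recomputing at read time."""
    avg, n = store.get(key, (0.0, 0))
    # incremental mean: new_avg = (avg * n + value) / (n + 1)
    new_avg = (avg * n + value) / (n + 1)
    store[key] = (new_avg, n + 1)

def fetch_average(key):
    """Reads involve no query-time computation; just return the answer."""
    return store[key][0]

record_measurement("sensor-1", 10.0)
record_measurement("sensor-1", 20.0)
record_measurement("sensor-1", 30.0)
print(fetch_average("sensor-1"))  # 20.0
```

The trade-off is exactly the one described in the paragraph above: slightly more application logic on the write path in exchange for constant-time reads and a database that stays a simple key-value store.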

With the right approach, we can build applications in such a way that they work well with a key-value database and preserve the highly available, horizontally scaling, fault-tolerant, easy-as-pie administration that we have worked so hard to provide in Riak. We look forward to continuing to help you get the most out of Riak, and choosing the best tool for the job.

Casey

Related

See Sean’s excellent post on Schema Design in Riak.

Footnotes

[1]: In some situations, using MapReduce to facilitate a bulk fetch provides better performance than requesting each object individually because of the connection overhead. If you go that route, be sure to use the native Erlang MapReduce functions like ‘reduce_identity’ already available in Riak. As always, test your solution before putting it into production.
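
For the curious, a bulk-fetch job of the sort described in this footnote can be expressed as a MapReduce job specification. The sketch below builds that specification as a Python dict, which a client would serialize to JSON and POST to Riak’s /mapred HTTP endpoint; the bucket and key names are invented for illustration, and only the ‘reduce_identity’ function from the built-in riak_kv_mapreduce module is taken from the text above.

```python
import json

# Illustrative MapReduce job for a bulk operation over a known set of
# keys, using the built-in Erlang reduce_identity function mentioned in
# the footnote. Bucket and key names are made up for this example.
job = {
    "inputs": [["users", "u1"], ["users", "u2"], ["users", "u3"]],
    "query": [
        {
            "reduce": {
                "language": "erlang",
                "module": "riak_kv_mapreduce",
                "function": "reduce_identity",
            }
        }
    ],
}

# A client library would POST this JSON body to the cluster's /mapred
# endpoint rather than printing it.
print(json.dumps(job, indent=2))
```

Because the reduce phase runs as native Erlang inside the cluster, it avoids the per-object connection overhead of fetching each key individually; as the footnote says, test this against your own workload before relying on it in production.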

Riak 1.2 Is Official!

August 7, 2012

Nearly three years ago to the day, from a set of green, worn couches in a modest office in Cambridge, Massachusetts, the Basho team announced Riak to the world. To say we’ve come a long way from that first release would be an understatement, and today we’re pleased to announce the release and general availability of Riak 1.2.

Here’s the tl;dr on what’s new and improved since the Riak 1.1 release:

  • More efficiently add multiple Riak nodes to your cluster
  • Stage and review, then commit or abort cluster changes for easier operations; plus smoother handling of rolling upgrades
  • Better visibility into active handoffs
  • Repair Riak KV and Search partitions by attaching to the Riak Console and using a one-line command to recover from data corruption/loss
  • More performant stats for Riak; the addition of stats to Riak Search
  • 2i and Search usage through the Protocol Buffers API
  • Official Support for Riak on FreeBSD
  • In Riak Enterprise: SSL encryption, better balancing and more granular control of replication across multiple data centers, NAT support

If that’s all you need to know, download the new release or read the official release notes.

More on What’s in Riak 1.2

New Approaches to Cluster Management

Stage and Review Cluster Changes: Stage a set of cluster changes, review how they will affect the cluster, and then commit or abort the changes. The new process means you can perform multiple changes at once – useful for adding/removing nodes – and “preview” operational changes to your cluster. Delay changes for off-peak hours or cancel changes you don’t like.
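
As a sketch of that workflow (node names are invented here, and the commands assume a running Riak 1.2 cluster):

```shell
riak-admin cluster join riak@node2.example.com
riak-admin cluster join riak@node3.example.com
riak-admin cluster plan      # review the staged changes and their effect
riak-admin cluster commit    # apply them, or: riak-admin cluster clear
```

Nothing changes in the cluster until the commit step, which is what makes it safe to stage several joins or leaves together and review the combined plan first.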

Rolling Upgrades, Made Easier: Previously, users would have to manually ensure newer features that might not be supported by all nodes were disabled before starting a rolling upgrade, then enable them after. Riak 1.2 has built-in capability negotiation that automatically determines features that are supported by all nodes. We’ll be sharing more details on the blog soon. For now, check out the cluster management docs.

Repair Search or KV Partitions Through Riak Console

We’re introducing a new anti-entropy mechanism to let you rebuild a partition for KV or Search. Prior to Riak 1.2, users could repair KV and Search data by listing keys and using read repair. Now, you can attach to the Riak Console and use the new repair feature to rebuild partition data lost or corrupted in a failure scenario – a simpler and faster solution.

Improved Reporting of Handoff

Continuing the theme of easier cluster management, we’ve beefed up the ‘riak-admin transfers’ command to give you much more information on the status of active transfers. (More information on changes to the transfers command here.)

Stats, Stats, Stats: Performance Improvements; Riak Search Has Stats, Too

Riak now uses the open source Folsom library by our friends at Boundary. Folsom is an Erlang-based system that collects and reports real-time metrics. The main difference you’ll notice? Much better performance of stats. (In-depth blog post on 1.2 stats improvements here.)

Protobuf API Adds Support for Search and 2i

So you want to use Riak’s Search and Secondary Indexes features through the protobuf API? Wish granted. Check out Sean Cribbs’ post on 1.2 changes to the protobufs API here.

LevelDB Performance Improvements

LevelDB, one of the storage backends that Basho supports for Riak, now sees much more stable performance and better throughput under load. We’ll be releasing a blog post about this soon…

Incremental Improvements to MapReduce

Accumulation of results for MapReduce queries is now faster and more efficient. Bryan Fink shared some details about this work back in June.

FreeBSD Packaging, SmartOS Binaries, and More

We now have packaging for FreeBSD. Plus, you can now grab a binary package for SmartOS. Additionally, Ubuntu packages for 10.04 (Lucid), 11.04 (Natty), and 12.04 (Precise) are now provided separately. Jared Morrow wrote up details on this last month and a complete list of packages can be found here.

New Stuff for Multi Data-Center Replication in Riak Enterprise

Riak Enterprise includes technical support and multi-data center capabilities. In the 1.2 release, Riak Enterprise balances multi-datacenter replication workload across the cluster for better stability and performance. It also offers more granular replication options, so you can run full-sync or real-time sync on particular buckets, and adds SSL encryption for all network activity, plus NAT support.

Thank You To The Community

There’s a lot in this release. Both new and existing users should be excited about these enhancements and fixes. As usual, the community is due a massive thanks from all of us at Basho. This code would be nowhere near as stable and robust were it not for your relentless usage, abuse, inspection of, and contributions to Riak at every level. Thank you. Please don’t stop. Things like this would not be possible were it not for you proving Riak’s worth.

So…

After you download Riak 1.2, join the mailing list and let us know what you think. Then go register for RICON. It’s a two-day event happening in San Francisco this October, and it’s dedicated to all things Riak and distributed systems. You won’t be disappointed.

Thanks for being a part of Riak.

Basho Eng

Basho Comments on OpenStack Foundation Board Nomination

August 02, 2012

Recently, Shanley Kane, our director of product management, was nominated as an individual member candidate to the OpenStack Foundation Board of Directors. Following Shanley’s nomination, she decided to withdraw. There has been significant speculation regarding the events that led up to Shanley’s decision.

Shanley is currently working with OpenStack in its effort to review and resolve the situation. We have been encouraged by the way OpenStack has handled the matter, including their responsiveness from the very beginning. Basho supports OpenStack’s review process and cannot provide any additional details at this time.

We’re honored to have Shanley considered as an individual member of the OpenStack Board. We encourage our employees to be actively and personally involved in important industry-defining organizations. OpenStack and Basho share many mutual users. The community members of both OpenStack and Basho expect us to build great software, to interoperate as needed, and to advocate open source principles.

The Basho Team

Analyzing Customer Support at Basho

July 30, 2012

Basho’s vision is to “be the best organization in the world.” This vision applies to every aspect of Basho as a company, including the quality of our products, our culture and of course our customer support. We strive to provide the best customer support possible.

We have chosen Zendesk as our help desk platform for its ease of use, customization, integration with other tools, and API. To dive into our deluge of nearly two years of accumulated Zendesk data, I first ventured over to Zendesk’s API page to check out the existing clients. Currently, only a Ruby client is listed, but a Python library also exists.

As a data scientist eager to do some complex analytics on our customer support data, I was stunned that no one had yet developed an R wrapper for Zendesk! Having used R since 2005 (mainly to analyze genetic and genomic data), along with many of its libraries and its mailing list, I realized this was finally my chance to give back to the open source community, which is also very much in alignment with Basho’s commitment to open-source software.

So I wrote a Zendesk API wrapper in R called zendeskR (code on github).

As of this blog posting, zendeskR v0.2 supports six different API calls.

These calls access the data types that I was analyzing most often, but I will add more features, as well as update and maintain the package regularly. If there is an API call you would like to see supported, please feel free to shoot me an e-mail at the address noted in the package description.

With all of this data in R, the analytic possibilities are nearly endless. Some example questions that an organization could begin to answer with this data are:

  • What is the average time to close a ticket for each customer?
  • How many comments were posted to a ticket before it was finally resolved?
  • How much does it cost to support customer X based on support and engineering time?
  • Using sentiment analysis and a text corpus, what are the most frequently used words for tickets that receive a Good Satisfaction rating versus a Poor Satisfaction rating?
  • Based on past trends, how many tickets will open this month?
  • What is a developer advocate’s ticket closing rate and satisfaction rating?

Here’s a simple charting example that displays the number of tickets and users created by month.


To install zendeskR from CRAN, open an R console and type:

    install.packages("zendeskR")

The analytical opportunities described above are almost endless, and at Basho we have only begun to scratch the surface. Ultimately, we aim to provide the highest quality products possible coupled with the best customer support system in the world, achieved in part through data-driven optimization of our support processes. Aim high.

Tanya

Congratulations to Casey Rosenthal

July 26, 2012

We have some more good news to pass along: Casey Rosenthal, an engineer on Basho’s Professional Services team, has had a paper accepted for publication at the 2012 IEEE Conference on Computational Intelligence and Games.

The paper is called “Personality Profiles for Generating Believable Bot Behaviors” and the work behind it focuses on personality profiles and how they are used to develop parameterized bot behaviors. From the abstract:

While personality profiling models were originally designed as a descriptive tool for human behavior, here we use them as a generative tool, allowing a plurality of different behaviors to result from a single rule set. This paper describes our use of the Five-Factor Model of personality to develop a bot that plays Unreal Tournament 2004.

Casey spent several years developing the related algorithms for a business management simulation called SMRTS, which is used in the pharmaceutical industry and higher education. Professor Clare Bates Congdon of the Department of Computer Science at the University of Southern Maine was a co-author.

He’ll be attending the conference in Granada, Spain this September to present his research, and the paper will be available online after the conference.

Congratulations to Casey!

The Basho Team

Basho Technologies and PalominoDB Partner to Offer Enhanced Support and Monitoring Services for Riak Installation and Management

PalominoDB announced a strategic partnership with Basho Technologies, creators of Riak, to provide enhanced operational support for customers managing Riak clusters.

LAS VEGAS, NV and CAMBRIDGE, MA – July 25, 2012 – PalominoDB, the boutique database operations and engineering consultancy, announced a strategic partnership with Basho Technologies, creators of Riak, the widely-used open-source distributed database, and Riak CS, multi-tenant cloud storage software, in order to provide enhanced operational support for customers managing Riak clusters.

“Our partnership with Palomino extends Basho’s own support and services with 24×7 monitoring and access to an additional pool of highly-skilled, multi-faceted support resources that are designed to complement webops and devops organizations,” said Mark Phillips, Director of Community and Developer Evangelism at Basho Technologies. “We are pleased to be able to endorse the operational excellence Palomino offers customers in all areas of database support and management, and to extend that to our broader Riak community. Palomino is a great addition to the Riak ecosystem.”

“We’re excited about this partnership for a number of reasons,” explained Palomino owner and CEO Laine Campbell. “Our commitment to best-in-class technologies and championing open-source database solutions has led us to Riak again and again. Formalizing the partnership allows our team to access the extensive knowledge and resources of the Basho team to bolster our own expertise, and to collaborate closely with Basho as they continue to innovate and grow.”

About PalominoDB

For startups and established companies of all sizes, Palomino provides ongoing operational support and professional expertise in database architecture, performance and scale. With a focus on open-source and other best-in-class software components, and extensive experience in all major and emerging database technologies, Palomino engages with customers to develop custom, cost-effective projects and long-term support contracts in areas from system design to automation to business intelligence and more. Palomino is renowned for an emphasis on transparency, communication and responsiveness, as well as providing operational excellence for leading companies including Zappos, Chegg, Technorati, Slideshare, SendGrid and Zendesk. For more information, please visit www.palominodb.com.

About Basho Technologies

Basho Technologies is the leader in highly-available, distributed database technologies used to power scalable, data-intensive Web, mobile, and e-commerce applications and large cloud computing platforms. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement content delivery platforms and global session stores, to aggregate large amounts of data for logging, search, and analytics, to manage, store and stream unstructured data, and to build scalable cloud computing platforms.

Riak is available open source for download at http://wiki.basho.com/Riak.html. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. Riak CS enables multi-tenant object storage with advanced reporting and an Amazon S3 compatible API. For more information visit www.basho.com or follow us on Twitter at www.twitter.com/basho.

Media Contact
Basho
Jena Rossi
(617) 779-1878
basho@shiftcomm.com

Co-Author of "Seven Databases" Book Joins Basho Team

July 18, 2012

We’re thrilled to announce that Eric Redmond, co-author of Seven Databases in Seven Weeks, has just joined the engineering team here at Basho. We’ve been big fans of Eric’s work for a while, and we couldn’t be happier that he’s decided to come work full-time on getting Riak into the hearts and minds of developers everywhere.

In case you haven’t read it, “Seven Databases” provides an overview of a variety of open source databases available today. In addition to covering Riak, it also touches on Redis, Neo4J, CouchDB, MongoDB, HBase, and PostgreSQL. It’s available on Amazon.com. The latest version was updated as recently as this past May.

Eric goes by coderoshi on both Twitter and GitHub. Join us in welcoming him to the team and look for his commits flying into Basho’s repositories.

The Basho Team

Yahoo! JAPAN Subsidiary IDC Frontier Makes Strategic Financial Investment in Basho Technologies and Commits to Deploy Basho’s Riak Software in Its Cloud Platform


IDC Frontier Invests $6.1 Million of Equity Capital in Basho

(Click here to view this press release in Japanese)

CAMBRIDGE, MA – July 17, 2012 – Basho Technologies announced today that it has entered into a strategic relationship with Yahoo Japan Corporation (“Yahoo! JAPAN”) subsidiary IDC Frontier, under which IDC Frontier is making a $6.1 million strategic equity investment in Basho and has committed to deploy Basho’s Riak distributed database and cloud storage technology within its cloud computing platform. The strategic relationship with IDC Frontier will accelerate the availability of Basho’s Riak technology throughout the Asia Pacific region.

Basho is a leading provider of distributed database and cloud storage technology. Yahoo! JAPAN subsidiary IDC Frontier is a leading data center provider in Japan.

Riak is a highly-available, distributed database that has witnessed rapidly growing adoption worldwide since the company released v1.0 in September 2011. In March 2012, the company released Riak CS, providing a multi-tenant capability and an Amazon S3 compatible API, which offers businesses a comprehensive public and private cloud storage platform. As one of Japan’s largest operators of data center infrastructure, IDC Frontier will utilize Riak CS to provide its customers a distributed cloud storage offering that takes advantage of the company’s large footprint of data center locations and enhances its cloud services platform.

“This strategic relationship with IDC Frontier and its investment in Basho underscores the growth of the distributed data market and Basho’s differentiated ability to capitalize on this market. Basho is excited to have IDC Frontier both as a customer and as a strategic partner to accelerate our expansion throughout Asia,” said Donald J. Rippert, Basho’s chief executive officer. “IDC Frontier’s market leadership position and high-quality infrastructure will be complemented perfectly by Basho’s technology, which emphasizes availability, scalability and high performance at its core. The market for cloud computing is rapidly accelerating. We are exclusively focused on being the distributed systems leader that can best unlock the potential of the cloud for our partners and our customers.”

“Through this strategic relationship with Basho, IDC Frontier will greatly benefit from the addition of advanced storage capabilities within our cloud infrastructure,” said Kiichi Yamato, Vice-Division Director of Business Development Division at IDC Frontier. “We have made a significant evaluation of Riak, and we are highly impressed with its inherent distributed design, built-in redundancy and linear scalability. We plan to embed Riak in our cloud platform. Furthermore, we are excited about working with Basho to accelerate the adoption of Riak throughout Japan. We look forward to a long-term partnership with Basho.”

“This strategic investment by IDC Frontier follows $5 million of additional equity capital raised last month as part of the company’s Series F round led by Georgetown Partners,” noted Rippert. “Through these investments in Basho totaling $11.5 million and the strength of our business already realized to date in 2012, I am confident in Basho’s market position and its ability to lead the rapidly emerging market for distributed data technologies.”

About IDC Frontier
IDC Frontier Inc. provides data center solutions. It offers IaaS (Infrastructure as a Service); co-location services, which include hosting and other data center solutions; hosting services, which include data center solutions to support e-business; security services; IP address/domain name registration services; network services; and service level agreement services. IDC Frontier Inc. is based in Tokyo, Japan, and as of February 2nd, 2009, operates as a subsidiary of Yahoo! Japan Corporation.
 http://www.idcf.jp/

About Basho Technologies
Basho Technologies is the leader in highly-available, distributed database technologies used to power scalable, data-intensive Web, mobile, and e-commerce applications and large cloud computing platforms. Basho customers, including fast-growing Web businesses and large Fortune 500 enterprises, use Riak to implement content delivery platforms and global session stores, to aggregate large amounts of data for logging, search, and analytics, to manage, store and stream unstructured data, and to build scalable cloud computing platforms.

Riak is available open source for download at http://wiki.basho.com/Riak.html. Riak EnterpriseDS is available with advanced replication, services and 24/7 support. Riak CS enables multi-tenant object storage with advanced reporting and an Amazon S3 compatible API. For more information visit www.basho.com or follow us on Twitter at www.twitter.com/basho.

Contacts
Robert Siegfried / Lyndsey Estin
Kekst and Company
(212) 521-4800

Additional Questions?
For additional questions regarding Basho Technologies, IDC Frontier, and Basho products and services, please read the Basho Technologies, IDC Frontier and Yahoo Japan Strategic Partnership FAQ.

Neil Conway on Bloom and CALM

July 16, 2012

At long last, here’s the second of two talks from last month’s BashoChats meetup. (The first talk, “Escape Hatches in Go” from Jeff Hodges, is already available.)

Neil Conway is a CS graduate student from Berkeley focusing on data management and distributed systems. He and his colleagues have been hard at work on some powerful research that’s already having some impact on those building and running systems in production.

Bloom is a language purpose-built for distributed systems. CALM, which stands for “Consistency As Logical Monotonicity,” is a technique for proving out code to run in an eventually-consistent environment. Neil gives an overview of these, and then discusses recent work on extending Bloom to support a broader range of programs, with a focus on something that’s become increasingly interesting to us at Basho: CRDTs.

This should be required watching for anyone who cares about the future of efficient distributed programming: aside from the fascinating work he and his colleagues are doing, Neil is an excellent presenter and does a great job of distilling abstract computing concepts into something developers might actually be able to apply in production.

Enjoy.

Mark