Disk-less databases?

As a company with an alternative database (one of the first “NOSQL” databases) and a long history of solving problems that relational databases cannot, we often see system and application architects resorting to memory as a last-ditch attempt to squeeze just a bit more performance out of RDBMSs that simply were not designed to perform at scale. So it was with some interest, and a little bias, that I read a new blog post on ODBMS.org this week titled “The future of data management: ‘Disk-less’ databases? An interview with Goetz Graefe”, and I decided to share some additional thoughts.

Goetz believes that “with no disks and thus no seek delays, assembly of complex objects will have different performance tradeoffs”. He thinks “a lot of options in physical database design will change, from indexing to compression and clustering and replication.”

It’s a valid assessment, but one that raises a larger discussion.

As it turns out, this discussion has been going on for 20 years, starting with TimesTen. Perst and eXtremeDB use RAM to speed things up as well. We’ve shown that running with SSDs can give up to an 80x speed increase on reads and 4x on writes. Developers can also configure Objectivity/DB, our flagship data management product, for purely cached applications, e.g. in the telecom and process control worlds, where there’s a lot of lookup data. Much of this is covered in one of our older white papers, “Flexible Deployment”, available in both HTML and PDF formats on our site.

In today’s world of big data, it’s easy to build something that becomes I/O bound, that is, something that hits the speed limit every disk has for reading and writing data. Goetz is one of the most brilliant data management experts on the planet, but I think this interview neglects the business end of the equation: memory is expensive. It would be better to make clear that if you’re worried about becoming I/O bound as your system grows, you actually have *two* choices:

  1. You can move part or all of your application data into memory via memcached or RAM disk components, or buy a super beefy machine with terabytes of memory.

    This could improve your application performance anywhere from a modest percentage up to several times over. It will, however, add complexity to your application, require you to manage potentially numerous new component layers, and force you to choose between hot, warm and cold data (because most companies can only afford to move hot data into RAM while leaving everything else on cheaper disks). And complexity adds cost. At $500K or more for each super beefy multi-terabyte RAM machine, plus the added cost of engineering, maintenance and IT management, the bill adds up pretty quickly. You might get your web-facing system to run 30% faster. Is it worth the price you paid? Did you get a return on that investment?

    But there is another option…

  2. Distribute your data and processing.

    Depending on the data store you use, each machine in your cluster (including significantly cheaper commodity hardware) can be used to break the problem into small pieces that are processed much more quickly, or in some cases the database can leverage the processing power of every machine to give you a near-linear performance increase as machines are added. I know… many of you dealing with sharded databases are seeing significantly reduced performance as your joins increase. But what if (and this is kind of the whole point of “Not Only SQL” or “No SQL” data technologies) you could eliminate joins? What if, just by switching your data model and programming paradigms a bit, you could access your data wherever it lived, in milliseconds or less?

    You most likely already use an object-oriented programming language (C# or Java), but you also most likely find yourselves needing to normalize and map objects into a relational schema of rows and columns. The nice thing about today’s technological landscape is that if your data really doesn’t need to live in rows and columns, you don’t have to force it.
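The second option above, distributing data and processing, can be sketched in a few lines. This is a minimal, generic illustration of hash partitioning with scatter-gather queries, assuming each partition would live on its own commodity machine; it is not Objectivity/DB's placement or query mechanism, and the class `PartitionedStore` and its methods are hypothetical names chosen for the example. Point lookups route to a single partition, while scans fan out to every partition and merge the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from hashlib import sha1
from typing import Callable, Optional


class PartitionedStore:
    """Hypothetical key-value store that spreads keys across N partitions.

    Each partition is an in-process dict standing in for a separate machine.
    """

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[dict] = [{} for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> dict:
        # Stable hash: Python's built-in hash() of a str changes between runs,
        # so use a digest to keep each key on the same partition.
        digest = sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, value: dict) -> None:
        self._partition_for(key)[key] = value

    def get(self, key: str) -> Optional[dict]:
        # A point lookup touches exactly one partition, with no cross-partition join.
        return self._partition_for(key).get(key)

    def scan(self, predicate: Callable[[dict], bool]) -> list[dict]:
        # Scatter: run the filter on every partition concurrently
        # (threads stand in for remote nodes). Gather: merge the partial results.
        def scan_one(part: dict) -> list[dict]:
            return [v for v in part.values() if predicate(v)]

        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            return [v for partial in pool.map(scan_one, self.partitions) for v in partial]


store = PartitionedStore(num_partitions=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i, "age": i % 90})

print(store.get("user:42"))  # {'id': 42, 'age': 42}
print(len(store.scan(lambda v: v["age"] >= 60)))  # 330
```

Because keys are routed by a stable hash, single-key access never needs a join across partitions. A real cluster would also have to rebalance data when machines are added, which this sketch deliberately leaves out.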

Welcome to the New World.

The general consensus is: memory is expensive; disks are cheap. Developers are resorting to memory to overcome performance and other bottlenecks inherent in older and/or relational technologies. We often see memory used as a short-term band-aid, a treatment of symptoms that doesn’t address the underlying disease.

No, relational databases aren’t a disease. Please don’t flame me. They do many things better than any other data technology. But they don’t do everything. The “one size fits all” approach is dead. If you need performance at scale, don’t need to force all your data into rows and columns, and don’t see any return on investment from expensive memory solutions, then perhaps it is time to look at one of these new “NOSQL” products. It doesn’t take much effort to build a proof of concept and see which problems you can solve with one of these products. Sure, you might need to take a polyglot application approach (which can bring some complexity of its own), but in most cases I believe you’ll find you can achieve results that give you more freedom, fewer sleepless nights, and a system that just works.

If you need fast lookups of values, you can use a key-value store like Citrusleaf or Riak. If you’re dealing with related collections of objects that resemble a “document”, you can download MongoDB, BigCouch or OrientDB. Need to walk a complex graph, where objects and connections (nodes and edges) can answer deep social network analysis questions? Then get a graph database (we recommend InfiniteGraph, of course) that treats edges as first-class citizens and can traverse those connections thousands of times faster than a recursive SQL join query.
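To make the contrast with a recursive join concrete, here is a minimal sketch of a “friends of friends” style traversal, assuming the graph is held as adjacency lists so that following an edge is a direct lookup. In SQL, each additional hop typically means another self-join of an edges table (or another iteration of a recursive query); here each hop is just one level of a breadth-first search. The function `within_k_hops` and the sample data are hypothetical, and this is plain Python, not InfiniteGraph's API.

```python
from collections import deque


def within_k_hops(adjacency: dict[str, list[str]], start: str, k: int) -> dict[str, int]:
    """Return each vertex reachable from `start` in at most `k` hops, with its hop count."""
    distance = {start: 0}
    frontier = deque([start])
    while frontier:
        vertex = frontier.popleft()
        if distance[vertex] == k:
            continue  # reached the hop limit: do not expand further
        for neighbor in adjacency.get(vertex, []):
            if neighbor not in distance:  # visit each vertex once, even in cyclic graphs
                distance[neighbor] = distance[vertex] + 1
                frontier.append(neighbor)
    return distance


# Hypothetical directed "follows" graph stored as adjacency lists.
social = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
}

print(within_k_hops(social, "alice", 2))
# {'alice': 0, 'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

Each hop follows only the edges attached to vertices already reached, rather than joining against the whole edges table; that locality is the general idea behind treating edges as first-class citizens.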

This is the space where we play.

<– START shameless self-promotion –>

Objectivity/DB is the original distributed, massively scalable data management and object persistence product that we have sold into leading government and enterprise systems for roughly the past 20 years (we’re on version 10 of the product now).

Last year, we developed InfiniteGraph, an API on top of our distributed core that lets developers easily handle their graph data problems without having to learn thousands of methods and all the bells and whistles of our core database. Just download InfiniteGraph (we offer a completely free version), install it, grab some sample code from our Developer Wiki to help start your project, and voilà!

We think InfiniteGraph can solve your relationship analytics, intelligence and social network analysis problems better than anything else out there. InfiniteGraph uses memory and cache to help you get the best performance on your live data, while also persisting relationship information to disk so you never have to worry about losing it all in a *flash* (pun intended).

If you need more of Objectivity/DB’s data management functionality, we can help there too. We’ll consult and train you, and ensure you can make use of all the best practices we have learned and applied to the mission-critical government, security and intelligence, commercial, telecom, science and enterprise applications we have supported over the years.

<– END shameless self-promotion –>

On a related note: we’re seeing complete applications built on InfiniteGraph, fully tested and deployed, in a few weeks on average. It wasn’t too long ago that it took many months, or a year or more, just to build anything interesting. Even then, the slightest breeze or sudden spike in traffic (nowhere near today’s “Digg Effect”) could send the whole thing crashing, sending every IT person, in the building or remote, scrambling from their dimly lit offices and Quake games to see what was the matter. And after each crisis was resolved (often temporarily), organizations found themselves wondering (again) how they could economically exploit RAM and other off-disk schemes to speed responses from over-burdened relational databases.

It’s good to see this discussion continue. Using memory as a band-aid, given the cost and added complexity, is not a real solution. But as alternative data technologies continue to mature and become more mainstream, and as new, cost-efficient memory is produced, I believe the band-aids will give way to truly amazing and blazingly fast systems that solve all our problems… until, that is, the continued exponential growth in data once again overloads those solutions, giving us a whole new set of problems to solve (again).

Thomas Krafft

Thomas Krafft is the Director of Marketing at Objectivity, Inc. (the company behind Objectivity/DB and InfiniteGraph). He oversees all marketing efforts, including communications and PR, demand generation and content development. Having joined the company in 2008, Thomas brings diverse experience from more than 15 years of working with Fortune companies including Intuit and Veritas, successful startup ventures (including one acquired by Barnes & Noble), and hundreds of clients for whom he provided marketing and internet consulting over several years. Thomas holds a B.A. in Political Science, International Relations, from California Polytechnic State University, San Luis Obispo.

 

NOSQL Now! Presentation, August 24, 2011: Graph Databases: Connecting the Dots in Big Data

If your data contains a lot of many-to-many relationships, if recursive self-joins are too costly or limiting to your application and scaling needs, and/or your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data, you will find graph databases superior to all other technologies – including relational databases, key-value, column or document databases.
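Finding how two objects are connected through many-to-many relationships is, at its simplest, a shortest-path search. The sketch below assumes undirected, unweighted relationships and uses breadth-first search with parent pointers to return the chain of entities linking a source to a target. The helper names (`build_graph`, `shortest_connection`) and the account/phone/IP sample data are hypothetical; a distributed graph database would run this kind of search across partitions rather than in a single process.

```python
from collections import defaultdict, deque
from typing import Iterable, Optional


def build_graph(edges: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Store each undirected many-to-many link on both endpoints."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_connection(graph: dict[str, set[str]], source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search; return a shortest chain from source to target, or None."""
    if source == target:
        return [source]
    parent: dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor in parent:
                continue  # already discovered via an equal or shorter route
            parent[neighbor] = current
            if neighbor == target:
                # Walk parent pointers back to the source, then reverse.
                path = [neighbor]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # no chain of relationships links the two entities


# Hypothetical accounts linked through shared phone numbers and IP addresses.
links = build_graph([
    ("acct_1", "phone_A"), ("phone_A", "acct_2"),
    ("acct_2", "ip_X"), ("ip_X", "acct_3"),
    ("acct_4", "phone_B"),
])
print(shortest_connection(links, "acct_1", "acct_3"))
# ['acct_1', 'phone_A', 'acct_2', 'ip_X', 'acct_3']
print(shortest_connection(links, "acct_1", "acct_4"))  # None
```

Searching from both endpoints at once (bidirectional BFS) is a common refinement for large graphs, omitted here for brevity.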

Darren Wood is the Architect and Lead Developer of InfiniteGraph, the distributed graph database, produced by Objectivity, Inc. Darren has spent the majority of his career architecting and building distributed systems with an emphasis on elastic scalability and data management. Prior to joining Objectivity, Inc. in 2007, Darren held positions as a Senior Consultant with IONA Technologies and a Development Team Lead for Citect Australia. Darren holds a First Class Honors Degree in Computer Systems Engineering from the University of Technology in Sydney, Australia.

Video:

[slideshare id=10609287&doc=infinitegraphh264-111216015456-phpapp01-video]

Slides:

[slideshare id=9074323&doc=nosqlnowfulltalkdwood-110830134717-phpapp01]

Click here to access the slides on SlideShare directly.

 

Slides from our last Meetup: Introducing InfiniteGraph. Connecting the Dots in Big Data.

On August 17, 2011, the InfiniteGraph team hosted a local Meetup attended by dozens of senior developers working on large-scale enterprise and startup projects. Big Data problems are quickly presenting themselves in almost every area of computing, from social network analysis to file processing.

Many technologies, such as those in the NoSQL space, were developed in response to the limitations of existing storage systems in dealing effectively with these mountains of data. Much of that data is interconnected in ways that, when organized properly, yield interesting and often valuable information. InfiniteGraph was designed specifically to traverse complex relationships in big data and to serve as the framework for products that provide real-time network analysis, business decision support and relationship analytics.

Presenters: Darren Wood, Chief Architect, InfiniteGraph. Mark Maagdenberg, Senior Field Engineer, InfiniteGraph.

Here are the slides (click here to access the slides on SlideShare directly)

[slideshare 9074310]

InfiniteGraph to Present at the NoSQL NOW! Conference, San Jose, CA. August 23-25, 2011

Visit InfiniteGraph Booth #6 at NoSQL NOW! For Information About Us and Our Latest Developer Contest Offering a Grand Prize of $12,000 in Apple Products!

For Immediate Release:

Sunnyvale, CA – August 18, 2011 – InfiniteGraph, the distributed and scalable graph database, presents an educational overview of graph technology at the NoSQL NOW! conference in San Jose, CA, August 23-25, 2011. Join us to learn about the latest trends in graph database technology. InfiniteGraph will be exhibiting in booth #6, and Darren Wood, Architect and Lead Developer of InfiniteGraph, will be presenting in two separate sessions:

NOSQL Now! is a vendor-neutral forum celebrating the diversity of NoSQL technologies and helping businesses develop objective evaluation processes to match the right NoSQL solutions with the right business challenge.
http://nosql2011.wilshireconferences.com

InfiniteGraph is also challenging developers to create next-gen social and information network analysis applications. Take a shot at developing something unique, cool, new and exciting on InfiniteGraph, and you could win $12,000 in Apple computer and entertainment products! Visit the InfiniteGraph booth #6 for more details or register online to enter the contest and to access the full version of InfiniteGraph.

About InfiniteGraph
InfiniteGraph, a product of Objectivity, is a unique distributed, scalable NoSQL graph database enabling large-scale, fast graph processing, data analytics and discovery of information around mission-critical enterprise requirements. Organizations use InfiniteGraph to discover complex relationships in data, to develop applications with significant time-to-market advantages and technical cost savings, and to achieve greater return on data-related investments by connecting the dots on a global scale.

About Objectivity, Inc.
Objectivity, Inc. simplifies complex data, enabling organizations to discover hidden relationships and develop applications with significant time-to-market advantages and technical cost savings, achieving greater return on data-related investments. Objectivity/DB provides distributed and scalable data and object management.

Objectivity, Inc. is committed to its customers’ success. The company has offices and representatives worldwide, and works directly with organizations, integrators and technical teams to recommend solutions and support options specifically tailored to each customer’s project and technical requirements. Please contact Objectivity, Inc. online or call (408) 992-7100 for more information.

# # #

Note to editors: Objectivity, Objectivity, Inc., Objectivity/DB and InfiniteGraph are trademarks of Objectivity, Inc. All other company, organization, product or alliance names mentioned herein remain the property of their respective owners.