Overview

As graph databases become more widely adopted it is inevitable that other databases add some kind of graph capability to their APIs. In this article I explain why using that approach is never going to produce a system that performs as well as a true graph database, such as the one within Objectivity’s ThingSpan. I’ll explain the main requirements then look at the number of logical and physical operations needed to perform a simple navigation query using relational, NoSQL and graph database technologies.

 

Graph Database Requirements

Almost every graph query starts with a single node (a Vertex) and then navigates through relationship objects called Edges to a connected Vertex. This process is repeated until the tree or graph of objects has been traversed. There is also a more complex kind of query termed pathfinding, which finds the shortest or all paths between two or more objects.

All current databases use combinations of three basic mechanisms:

  • Scanning
  • Link traversal
  • Lookups speeded up by hash keys or indices [The indices consist of linked entries.]

 

The graph queries described above start by using a key or index to find the origin vertex(es) then use link traversal to navigate the graph. Any DBMS can perform these operations, but as the majority of the query is taken up with link traversal, the inefficiencies of this underlying mechanism dominates performance numbers.

 

Building a Graph Layer on a RDBMS

Any RDBMS is going to be reasonably fast at performing the initial lookup(s) to find the origin Vertex(es), or, more likely, the correct row of the join table. Traversing to the N connected vertices starts with N Join Table B-Tree Index entry lookups and N logical row read operations. Each Join Table read is followed by a Customer B-Tree Index lookup and a Customer Table row read.  So, each traversal operation requires 2 * N B-Tree search operations and 2 * N logical row reads.

 

 

TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS
RDBMS 2 * N 2 * N

 

 

Building a Graph Layer on a Key-Value Store

As the name indicates, a key-value store is built to access objects using a key. So, a simple navigational query will start quickly with a single hash-based lookup, which will outperform an index lookup. However, from that point on, each traversal needs to use the key of each connected object to look it up. So, if there are N directly connected objects, each navigational operation requires N key-based lookups and N logical reads to access the connected object.

 

So far, the scores are:

TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS
RDBMS 2 * N 2 * N
Key-Value Store N N

 

 

Building a Graph Layer on a Column-Store Database

A column-store is a relational database that stores each field in a row in a compact, indexed or temporal structure. This makes it very fast at finding all rows that have a field with a value in a particular range, but it has to do extra logical read operations to find the other fields when the qualified objects are found. So, its performance is very similar to that of a row-based relational database, i.e. if there are N directly connected objects, each navigational operation requires 2*N row lookups, N B-Tree search operations and F row lookups to acquire the other fields.

 

TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS
RDBMS 2 * N 2 * N
Key-Value Store N N
Column Store N  2 * N + F

 

 

Building a Graph Layer on a Document Database

Document databases are optimized to store text and multimedia, such as images, videos and audio clips. They also have a mechanism to handle references to other documents, though this may be as primitive as looking up the other document by its title. Although it is technically handling graph structures, it isn’t optimized for that capability. At best, its navigational performance will equal that of a Key-Value Store.

 

TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS
RDBMS 2 * N 2 * N
Key-Value Store N N
Column Store N 2 * N + F
Document Store N N

 

 

Graph Database Mechanisms

Graph databases are optimized to store and traverse relationships. There are two leading mechanisms: linked list traversal and Object Identifier (OID) based traversal. Linked list traversal involves the creation of a list that connects objects using a logical or physical key. The list may be embedded in the object, which was the mechanism used by hierarchical and network databases, or be a separate compact structure.

 

If the list uses logical keys and there are N directly connected objects, each navigational operation will require N key lookups and N logical reads. If it uses physical keys, it will require N physical reads. A logical read generally takes longer than a physical read because it has to perform some other operation to find the physical “key” before performing the physical read.

 

TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS PHYSICAL READS
RDBMS 2 * N 2 * N
Key-Value Store N N
Column Store N 2 * N + F
Document Store N N
Graph Database (Logical keys) N N
Graph Database (Physical keys) N

 

OID-based traversal is very similar to list-based traversal, but the ID is either a physical key or something very close to one. In the case of ThingSpan the OID is a hierarchical structure that can be regarded as a key-based lookup at some levels and a physical lookup at others. Each object has an associated variable length array containing the OIDs of the objects that it is connected to. Accessing each object may involve a key lookup before the physical read, so if there are N directly connected objects, each navigational operation will require between zero and N key lookups and N physical reads, typically about N/50. Let’s order the table based on the overall performance, fastest first:

 

RANK TECHNOLOGY KEY LOOKUPS INDEX LOOKUPS LOGICAL READS PHYSICAL READS
1 Graph Database (Physical keys) N
2 Graph Database (OID-based) 0 to N (approx.

N/50)

N
3 Graph Database (Logical keys) N N
3 Key-Value Store N N
3 Document Store[1] N N
4 RDBMS 2 * N 2 * N
5 Column Store N 2 * N + F

 

The clear winner is the Graph Database using physical keys. However, constructing such a key that will operate at scale is impracticable as it will be limited by the mechanics of the storage devices. The storage infrastructure has to be able to supply a unique address within a linear storage space that needs to span multiple petabytes or exabytes. Also, if the object has to be moved after being resized, its physical key changes.

 

The closest to it in terms of performance is the OID-based Graph Database and the only commercially available one today is ThingSpan. Its OID-based structures allow vertex and edge objects to be distributed across multiple storage devices on a network, but still appear to be a part of a single logical structure. Clients have a Single Logical View of the whole graph. The distributed nature of the graph allows parallel ingest and query operations to run with minimal crossplay, enabling ThingSpan to perform predictably as the numbers of nodes, connections and users grow.

 

Summary

I hope that I’ve shown that gluing an extra layer on top of the wrong technology doesn’t produce good results. Strapping a jet engine on a Ford 150 truck won’t make it competitive with an F-35 fighter or Concorde. Just as the hierarchical and network databases struggled to cope with SQL and were eclipsed by true RDBMSs, those earlier technologies and NoSQL variants can’t be used to build an efficient graph database.

 

With ThingSpan we’ve leveraged a proven, scalable, highly performant, distributed object database to make graph navigation and pathfinding queries perform at scale. Object databases weren’t designed to compete with RDBMSs, but to handle highly interconnected data and navigational queries efficiently, so they are a natural match for graph analytics.

 

Whereas a relational database will take 2 * N index lookups and 2 * N logical reads to find the N objects connected to starting point, ThingSpan takes 0 to N (typically about N/50) key lookups and N physical reads. You can prove it yourself by downloading ThingSpan from our website or the Amazon AWS Marketplace.

 

 

[1] For navigational operations only.

SHARE THIS POST
Share on FacebookTweet about this on TwitterShare on Google+Share on LinkedIn