So you’ve just finished the conceptual design for the next big Web 3.0 product, and you’ve decided to use a graph database to help solve your big challenge: “How do I effectively manage all the known (and often unknown) relationships in my data?” Your data model maps rather nicely to the graph’s model of nodes and edges: people, places and things are the vertices, while the relationships are the edges between them. So far, so good. But you also want the ability to take user-defined entities and insert them into the graph as well. After all, you don’t want to be tied down to a fixed, rigid schema. The flexibility to define or modify your model at runtime is critical to your product’s success, and your user base will expect nothing less than a fast, seamless experience.
Schema-less gives greater flexibility, but at a cost.
Your first temptation might be to use a schema-less graph database: one that allows vertices and edges to be created as objects whose attributes are represented as simple property maps or buckets (see Figure 1 below).
This seems like the perfect solution. Using this structure, virtually any entity can be modeled in the graph, and it works equally well for both your known entities and your users’ entities. You quickly code up a prototype, load some data, and run some queries. But something isn’t quite right: performance seems rather slow, especially considering the size of the data set you’re using. Alas, you’ve discovered that, as in life, nothing is truly free, and what you gain in schema flexibility you lose in data-access performance. This model leads to sub-optimal query processing because the code that decides which paths to explore in the graph must first ask each object it encounters what type it is, an expensive operation when it may have to examine millions, if not billions, of elements. In aggregate, this becomes a database bottleneck and ultimately affects overall product performance.
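To make that cost concrete, here is a minimal Python sketch of a schema-less model. It is not InfiniteGraph code: `PropertyElement`, `Handle`, `make_handle` and `neighbors_of_type` are hypothetical names, and `open()` merely counts how often an object would have to be read from storage. Because the type is just another entry in the property map, filtering neighbors by type forces every candidate to be opened.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PropertyElement:
    """Hypothetical schema-less vertex: every attribute, including 'type', is a map entry."""
    properties: Dict[str, Any] = field(default_factory=dict)


class Handle:
    """Hypothetical reference to a stored element; open() simulates reading it from disk."""
    opens = 0  # counts how many objects had to be opened

    def __init__(self, loader: Callable[[], PropertyElement]):
        self._loader = loader

    def open(self) -> PropertyElement:
        Handle.opens += 1
        return self._loader()


def make_handle(type_name: str, name: str) -> Handle:
    return Handle(lambda: PropertyElement({"type": type_name, "name": name}))


def neighbors_of_type(neighbors: List[Handle], type_name: str) -> List[PropertyElement]:
    # No type information lives outside the object, so every candidate must be
    # opened and asked for its "type" property before it can be kept or discarded.
    result = []
    for handle in neighbors:
        element = handle.open()
        if element.properties.get("type") == type_name:
            result.append(element)
    return result


neighbors = [make_handle("Person", "Alice"),
             make_handle("Place", "Santa Clara"),
             make_handle("Person", "Bob")]
people = neighbors_of_type(neighbors, "Person")
print(len(people), "people found,", Handle.opens, "objects opened")
```

With three neighbors the sketch reports 2 people found and 3 objects opened: the rejected `Place` was read just to learn that it should be skipped.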
Full schema models yield greater performance, but aren’t as flexible.
Switching to a full schema model, like the one currently used in InfiniteGraph, you easily gain back that lost performance. The navigation engine in InfiniteGraph allows coders to take advantage of the strong type information on graph elements to efficiently traverse the graph, qualifying paths without examining (opening) the objects themselves.
But again, what you gain in performance you lose in flexibility, in this case the ability to provide dynamic schema capabilities. InfiniteGraph’s performance relies on having a predefined set of class definitions representing the graph elements. While this works well in environments where that information is known before the product is deployed, it poses challenges for any application whose schema and data model need to evolve frequently over time.
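For contrast, here is a minimal sketch of the full-schema approach, again with hypothetical names (`Vertex`, `Person`, `Place`, `TypedHandle`) rather than InfiniteGraph’s actual API. The assumption is that each reference records its element’s class, so a traversal can qualify neighbors by type and open only the ones it keeps. The price is visible too: `Person` and `Place` must be defined in code before any data of those types can exist.

```python
from dataclasses import dataclass
from typing import Callable, List, Type


class Vertex:
    """Hypothetical base class that every schema-defined vertex type inherits from."""


@dataclass
class Person(Vertex):
    name: str


@dataclass
class Place(Vertex):
    name: str


class TypedHandle:
    """Hypothetical reference that stores the element's class next to its loader,
    so the type can be checked without loading the object."""
    opens = 0  # counts how many objects had to be opened

    def __init__(self, cls: Type[Vertex], loader: Callable[[], Vertex]):
        self.cls = cls
        self._loader = loader

    def open(self) -> Vertex:
        TypedHandle.opens += 1
        return self._loader()


def neighbors_of_type(neighbors: List[TypedHandle], cls: Type[Vertex]) -> List[TypedHandle]:
    # Qualification uses only the class recorded in each handle; nothing is opened here.
    return [h for h in neighbors if issubclass(h.cls, cls)]


neighbors = [TypedHandle(Person, lambda: Person("Alice")),
             TypedHandle(Place, lambda: Place("Santa Clara")),
             TypedHandle(Person, lambda: Person("Bob"))]
qualified = neighbors_of_type(neighbors, Person)
print(len(qualified), "people qualified,", TypedHandle.opens, "objects opened")
```

Here qualification opens no objects at all; only when the two qualified people are actually read does the counter rise to 2.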
So what’s the answer then? Do we have to live with these tradeoffs or can something be done to bridge this feature gap?
The best answer might involve both schema and schema-less support.
The answer, we believe, is to provide users with a hybrid schema model: strongly typed objects for performance alongside loosely typed objects for flexibility, the latter implemented using document representations such as JSON strings. The figures below illustrate both representations in InfiniteGraph. Figure 2 below shows how user graph elements are modeled as strongly typed objects that inherit structure and behavior from a set of base classes.
Figure 3 below shows how schema-less capabilities can be implemented using document-type graph elements. Together, these provide a good solution for those looking for performance AND schema model flexibility.
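A hybrid model can be sketched by letting a document-type element sit under the same base class as the strongly typed ones. In this illustrative Python version, `DocumentVertex` (a hypothetical name, not an InfiniteGraph class) stores a user-defined kind plus its attributes as a JSON string using the standard `json` module, while `Person` keeps fixed, typed fields.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


class Vertex:
    """Hypothetical common base class for every graph element."""


@dataclass
class Person(Vertex):
    # Strongly typed: the fields are fixed by the class definition.
    name: str
    age: int


@dataclass
class DocumentVertex(Vertex):
    # Loosely typed: a user-defined entity kind plus its attributes as a JSON string.
    kind: str
    body: str

    @classmethod
    def from_dict(cls, kind: str, attributes: Dict[str, Any]) -> "DocumentVertex":
        return cls(kind, json.dumps(attributes))

    def get(self, key: str, default: Any = None) -> Any:
        # Parse the document on demand and read a single attribute.
        return json.loads(self.body).get(key, default)


graph = [
    Person("Alice", 34),
    DocumentVertex.from_dict("Gadget", {"serial": "X-100", "tags": ["beta", "blue"]}),
]

for vertex in graph:
    if isinstance(vertex, Person):
        print("typed:", vertex.name, vertex.age)
    elif isinstance(vertex, DocumentVertex):
        print("document:", vertex.kind, vertex.get("serial"), vertex.get("tags"))
```

Code that only cares about `Person` can still dispatch on the class, while a user-defined entity such as the `Gadget` above needs no new class definition.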
String-based document storage and access can be done today with a rather trivial amount of code. We already provide this internal code to customers who need the functionality now, and the next release of InfiniteGraph will integrate document-type elements with the indexing and Visualizer components.
In summary, using a hybrid schema model with InfiniteGraph can provide the following:
- A mixed-model data persistence strategy
- Fixed fields for data constraints and fast query
- Dynamic or document-wrapped fields for flexibility
- The ability to store non-scalar (non-primitive) data types, such as maps and arrays, on InfiniteGraph (IG) elements
- Better data exchange with polyglot environments (e.g. document databases, key/value stores)
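The list above can be illustrated with a single hybrid element. In this hedged sketch, the hypothetical `Customer` class validates its fixed fields, keeps arbitrary extra attributes (including maps and lists) in one JSON-string field, and can flatten both parts into a single JSON document of the kind a document database or key/value store would accept. The field names and validation rules are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Customer:
    """Hypothetical hybrid vertex: fixed, validated fields plus a document-wrapped field."""
    customer_id: int
    email: str
    extras_json: str = "{}"  # dynamic attributes, stored as a single JSON string

    def __post_init__(self):
        # Fixed fields can carry constraints that the document part does not.
        if not isinstance(self.customer_id, int) or self.customer_id <= 0:
            raise ValueError("customer_id must be a positive int")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")

    def set_extra(self, key: str, value: Any) -> None:
        # Maps and lists are allowed here even though the fixed fields are scalars.
        extras = json.loads(self.extras_json)
        extras[key] = value
        self.extras_json = json.dumps(extras)

    def to_document(self) -> str:
        # Merge dynamic and fixed fields into one JSON document for export;
        # fixed fields take precedence over dynamic keys with the same name.
        doc = json.loads(self.extras_json)
        doc.update({"customer_id": self.customer_id, "email": self.email})
        return json.dumps(doc, sort_keys=True)


c = Customer(42, "ann@example.com")
c.set_extra("preferences", {"newsletter": True})
c.set_extra("recent_orders", [1001, 1002])
print(c.to_document())
```

Letting fixed fields override same-named dynamic keys in `to_document` is just one possible design choice; a production version might reject such collisions instead.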
If you are working on a project that requires these capabilities right now, please contact us. We can provide you with internal field-engineering code that is compatible with the latest public version of InfiniteGraph (v.2.0). The same code will also be packaged in our next release.
Mark Maagdenberg is a Senior Field Engineer for Objectivity, Inc. (the company behind InfiniteGraph). Mark has over 20 years of experience as a software professional, working at several prominent software companies in Silicon Valley, including Ashton-Tate, Intuit and Vantive, as well as several successful start-ups. His career includes positions as User Interface Architect, Solutions Consultant and Sales Engineer, and he has worked on a variety of successful B2C and enterprise software projects. Currently a member of the Sales Engineering team at InfiniteGraph and Objectivity, he provides product solutions for customers in both traditional on-premise and cloud environments. Mark holds a degree in Computer Engineering from Santa Clara University.