Learning the various NoSQL data technologies can be confusing, particularly given the overlapping capabilities and claims out there.
RDF triples are a data format, or data structure, used to represent entities and relationships. They are generally expressed as a subject, predicate and object (“Todd calls Jay”). A collection of triples is a labeled, directed multigraph. For us, all of the people Todd calls would basically be a subgraph of our “Link Hunter” example (which you can download here).
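To make the "collection of triples is a multigraph" idea concrete, here is a minimal sketch in plain Python. The sample triples and the `build_multigraph` helper are made up for illustration; this is not InfiniteGraph or triple store code.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("Todd", "calls", "Jay"),
    ("Todd", "calls", "Ann"),
    ("Todd", "emails", "Jay"),  # a second, differently labeled edge Todd -> Jay
    ("Jay", "calls", "Ann"),
]

def build_multigraph(triples):
    """Map each subject to its list of outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

graph = build_multigraph(triples)

# The "people Todd calls" subgraph: follow only the edges labeled "calls".
todd_calls = [obj for predicate, obj in graph["Todd"] if predicate == "calls"]
print(todd_calls)
```

Because two vertices can be joined by more than one labeled edge (Todd both calls and emails Jay), the structure is a multigraph rather than a simple graph.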
Triple stores are good at letting you query a subgraph's worth of data with some SQL-like qualifiers (all the people Todd called from Cincinnati on Tuesday). In other words, they are good at isolating a subgraph based on arbitrary, ad-hoc criteria. A triple store works much like a search engine: give it some search terms, and you get back a set of results.
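The search-engine style of query can be sketched as simple pattern matching over triples. This is only an illustration in plain Python: a real triple store would use a query language such as SPARQL, and the call records, predicate names and helper functions below are all assumptions made up for the example.

```python
# Hypothetical call records, each modeled as a node with its own properties.
triples = [
    ("call1", "caller", "Todd"), ("call1", "callee", "Jay"),
    ("call1", "city", "Cincinnati"), ("call1", "day", "Tuesday"),
    ("call2", "caller", "Todd"), ("call2", "callee", "Ann"),
    ("call2", "city", "Dayton"), ("call2", "day", "Tuesday"),
    ("call3", "caller", "Todd"), ("call3", "callee", "Bob"),
    ("call3", "city", "Cincinnati"), ("call3", "day", "Monday"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def subjects(triples, p, o):
    """Subjects that have the given predicate/object pair."""
    return {t[0] for t in match(triples, p=p, o=o)}

# "All the people Todd called from Cincinnati on Tuesday":
# intersect the calls that satisfy each qualifier, then read off the callees.
calls = (
    subjects(triples, "caller", "Todd")
    & subjects(triples, "city", "Cincinnati")
    & subjects(triples, "day", "Tuesday")
)
callees = sorted(t[2] for c in calls for t in match(triples, s=c, p="callee"))
print(callees)
```

Each qualifier narrows the result set, which is the "give it search terms, get back results" behavior described above.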
This is an RDF tuple, and a graph. You don’t necessarily need a graph database to find this connection. On the other hand, most connections in most data aren’t this simple.
Given this overlap of related functionality, one of the most common questions we hear is whether InfiniteGraph supports RDF (Resource Description Framework) tuples (triples), whether it works like a triple store, and/or whether it can easily work alongside a triple store.
The short answer to all these questions is: Yes.
One of our latest large customers in the government space is using RDF as the means of integration, the same way someone would use XML or CSV. They have a number of underlying medical records datastores that they extract to RDF, build a graph from, and then query. These queries are navigational across symptom paths and are used to predict disease, suggest future treatment, or determine the level of benefits for disability. The level of disability is found by traversing all disease edges and adding the weights. Some diseases make you more disabled than others. It’s very quantitative. This project involves a lot of data, is mission-critical, and serves the U.S. government.
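As a rough sketch of the "traverse the disease edges and add the weights" calculation, assume each patient vertex has weighted edges to disease vertices. The patients, diseases, weights and the `disability_level` function are all invented for illustration and do not reflect the customer's actual data model or scoring rules.

```python
# Hypothetical weighted edges: patient -> [(disease, weight), ...]
disease_edges = {
    "patient_a": [("disease_x", 30), ("disease_y", 10)],
    "patient_b": [("disease_y", 10)],
    "patient_c": [],
}

def disability_level(edges, patient):
    """Sum the weights on every disease edge leaving the patient vertex."""
    return sum(weight for _disease, weight in edges.get(patient, []))

for patient in sorted(disease_edges):
    print(patient, disability_level(disease_edges, patient))
```

A heavier edge weight stands for a disease that contributes more to the overall disability level.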
InfiniteGraph can import RDF as easily as a triple store. You simply write parsing code that is basically a loop that reads RDF and creates a graph. What InfiniteGraph is best at, though, is navigational, multi-hop, multi-path analysis (using arbitrary criteria on the vertex and edge properties, as well as filtering by type, degree of separation, etc.). For example, “Show me all of the people Todd called, AND all of the people that they called,” or “Show me all of the ways that Todd might have sent money to Jay.”
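The import loop and the two example questions can be sketched in plain Python. This is not the InfiniteGraph API: the input is a simplified, N-Triples-like text (not a full RDF parser), and the function names `load_graph`, `within_hops` and `all_paths`, along with the sample data, are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical input: one "<subject> <predicate> <object> ." triple per line.
RDF_LINES = """\
<Todd> <calls> <Jay> .
<Todd> <calls> <Ann> .
<Jay> <calls> <Bob> .
<Todd> <pays> <Ann> .
<Ann> <pays> <Jay> .
<Todd> <pays> <Jay> .
"""

def load_graph(text):
    """The import loop: read each triple and add a labeled edge to the graph."""
    graph = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        terms = line.rstrip(" .").split()
        subject, predicate, obj = (term.strip("<>") for term in terms)
        graph[subject].append((predicate, obj))
    return graph

def within_hops(graph, start, predicate, max_hops):
    """Vertices reachable from start over `predicate` edges in <= max_hops."""
    seen, frontier = {start}, {start}
    for _ in range(max_hops):
        frontier = {
            obj
            for vertex in frontier
            for label, obj in graph.get(vertex, [])
            if label == predicate and obj not in seen
        }
        seen |= frontier
    return seen - {start}

def all_paths(graph, start, goal, predicate, path=None):
    """Every cycle-free path from start to goal that uses only `predicate` edges."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for label, obj in graph.get(start, []):
        if label == predicate and obj not in path:
            paths.extend(all_paths(graph, obj, goal, predicate, path))
    return paths

graph = load_graph(RDF_LINES)
# "All of the people Todd called, AND all of the people that they called."
print(sorted(within_hops(graph, "Todd", "calls", 2)))
# "All of the ways that Todd might have sent money to Jay."
print(all_paths(graph, "Todd", "Jay", "pays"))
```

The first query is a bounded breadth-first expansion and the second is a depth-first enumeration of simple paths. Both get expensive quickly as the graph grows, which is the workload a graph database is designed for.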
Yes, InfiniteGraph can be used to analyze triples and RDF. But if that’s all you want to do, then you really should just use a triple store.
Our graph database trades some of the runtime flexibility (but not a lot) for well-defined types and performance. RDF is fine for all the examples that have been circulated. If I just want to list all my friends, or all the people I know who are married, it’s no big deal because the fanout of a single degree is extremely small. In fact, you could probably even do it in MySQL. When we talk about scalability, however, it’s not really about how much data we can store, but how quickly we can run across it. Storing RDF makes this effort slower. It’s hard to make RDF perform, because the whole graph is self-describing and therefore computationally expensive to parse. Think of it like representing data in XML versus a defined binary format. XML is lovely to work with and basically human readable, but it is very verbose and inefficient.
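A back-of-the-envelope sketch shows why a single degree is cheap while deeper traversals are not. It assumes, purely for illustration, that every vertex has the same fanout and that no vertex is visited twice, so the result is an upper bound; the fanout of 150 and the `vertices_touched` function are made-up values, not measurements.

```python
def vertices_touched(fanout, hops):
    """Upper bound on vertices visited when each vertex has `fanout` neighbors."""
    return sum(fanout ** k for k in range(1, hops + 1))

# Assumed fanout of 150 contacts per person.
for hops in (1, 2, 3):
    print(hops, vertices_touched(150, hops))
```

One hop touches 150 vertices, but three hops can touch more than three million, which is why traversal speed matters more than raw storage capacity.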
So, InfiniteGraph supports and reads RDF tuples, and also can work alongside your dedicated triple store. In many cases however, your requirements might be such that you actually don’t need an RDF triple store, and could use the graph database directly. Alternately, you might also find you can use one of the RDF products out there that includes some simple graph methods, and you won’t even need InfiniteGraph. The key is whether you need to analyze triples more or less than you need some deeper graph analytics.
Our company has a long history of helping customers determine the optimal architecture and design for their systems (and we don’t try to sell people things they don’t need).
Todd Stavish is a Senior Systems Engineer for Objectivity, Inc. (the company behind InfiniteGraph), and is focused on our federal and government business. He has expertise in a range of distributed computing applications and has worked in telecommunications, process control, automation and scientific computing. Todd specializes in advising customers about complex modeling, performance optimization and building fault-tolerant applications.