InfiniteGraph Video: Making the Connection.

Here’s another video story from Objectivity, Inc. (the company behind InfiniteGraph) which shows another example of how people are helped by technology (or not, as parts of this video demonstrate!). InfiniteGraph powers systems that help people and companies make connections – whether it’s around online dating, advertising, social networking, fraud detection, business or other intelligence. We help connect the dots in big data.

Follow us on Twitter @infinitegraph, and download InfiniteGraph’s free community version to kick-start your next project. And maybe you can build something that helps others find “The One.”

[youtube=http://www.youtube.com/watch?v=YWP74mY-8ls]

A reminder to our visitors: The InfiniteGraph Developer Contest is still underway. Download the full version of InfiniteGraph after registering. Build something awesome. You could win up to $12,000 in Apple computer products, gear and tech!

Error starting AMS (oostartams) on Ubuntu

We have a few users reporting an error when running oostartams (the AMS server) on Ubuntu. If you see “Exit code=13”, or you run oocheckls from the command line and get the response “The Lock Server is not running”, you may be hitting a known issue with IPv6, which can be worked around by disabling IPv6:

First, to see if IPv6 is enabled run:

cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If it returns “0” then it is enabled. To disable, edit “/etc/sysctl.conf” and add:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
And then reboot your machine.
Note: This issue will be resolved in the next build or public release.
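If you would rather script the check than run it by hand, here is a minimal Python sketch that reads the same kernel flag as the cat command above. The function name ipv6_disabled is our own invention for illustration, and the script only reports the current state; it does not change any settings.

```python
from pathlib import Path

# Kernel flag read by the "cat" command above: "0" = IPv6 enabled, "1" = disabled.
FLAG_PATH = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(flag_path: Path = FLAG_PATH) -> bool:
    """Return True when the flag file contains "1" (IPv6 disabled)."""
    return flag_path.read_text().strip() == "1"


if __name__ == "__main__":
    if not FLAG_PATH.exists():
        print("No IPv6 flag file found on this system.")
    elif ipv6_disabled():
        print("IPv6 is disabled.")
    else:
        print("IPv6 is still enabled; check /etc/sysctl.conf and reboot.")
```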
InfiniteGraph and RDF tuples


Learning the various NOSQL data technologies can be a bit confusing, particularly given some overlapping capabilities and claims out there.

RDF triples are a data format or data structure that can be used to represent entities and relationships, and are generally expressed using a subject, predicate and object (“Todd calls Jay”). A collection of triples is a labeled, directed multigraph. For us, all of the people Todd calls would basically be a subgraph of our “Link Hunter” example (which you can download here).
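To make the "collection of triples is a multigraph" idea concrete, here is a small Python sketch. The names and call records are made up for illustration, and a plain dictionary stands in for a real graph store.

```python
from collections import defaultdict

# Hypothetical records as (subject, predicate, object) triples.
triples = [
    ("Todd", "calls", "Jay"),
    ("Todd", "calls", "Ann"),
    ("Todd", "emails", "Jay"),
    ("Jay", "calls", "Ann"),
]

# Adjacency list: subject -> list of (predicate, object) edges.
# A list keeps parallel edges (Todd calls Jay, Todd emails Jay),
# which is what makes this a multigraph rather than a simple graph.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

# Everyone Todd calls: the subgraph described in the text.
todd_calls = [obj for predicate, obj in graph["Todd"] if predicate == "calls"]
print(todd_calls)  # ['Jay', 'Ann']
```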

Triple stores are good at letting you query a subgraph’s worth of data with some SQL-like qualifiers (all the people Todd called from Cincinnati on Tuesday). In other words, they are good at isolating a subgraph based on arbitrary, ad hoc criteria. A triple store works much like a search engine: give it some search terms, and you get back a set of results.
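As a rough illustration of that search-engine style of query, the sketch below matches triple patterns in plain Python, with None acting as a wildcard. The data, the call1/call2 identifiers and the match helper are all hypothetical; a real triple store would expose this through its own query language.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Each hypothetical call is its own subject, so it can carry qualifiers.
triples = [
    ("call1", "caller", "Todd"), ("call1", "callee", "Jay"),
    ("call1", "city", "Cincinnati"), ("call1", "day", "Tuesday"),
    ("call2", "caller", "Todd"), ("call2", "callee", "Ann"),
    ("call2", "city", "Denver"), ("call2", "day", "Tuesday"),
    ("call3", "caller", "Jay"), ("call3", "callee", "Ann"),
    ("call3", "city", "Cincinnati"), ("call3", "day", "Tuesday"),
]


def subjects(p, o):
    """Set of subjects that have predicate p with object o."""
    return {s for s, _, _ in match(triples, p=p, o=o)}


# All the people Todd called from Cincinnati on Tuesday.
calls = subjects("caller", "Todd") & subjects("city", "Cincinnati") & subjects("day", "Tuesday")
people = sorted(o for c in calls for _, _, o in match(triples, s=c, p="callee"))
print(people)  # ['Jay']
```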

Todd calls Jay

This is an RDF tuple, and a graph. You don’t necessarily need a graph database to find this connection. On the other hand, most connections in most data aren’t this simple.

Given this overlap of related functionality, one of the most common questions we hear is whether InfiniteGraph supports RDF (Resource Description Framework) tuples (triples), whether it works like a triple store, and/or whether it can easily work alongside a triple store.

The short answer to all these questions is: Yes.

One of our latest large customers in the government space is using RDF as the means of integration, just the same way someone would use XML or CSV. They extract a number of underlying medical-records datastores to RDF, build a graph from it, and then run queries against that graph. These queries are navigational across symptom paths and are used to predict disease, suggest future treatment, or determine the level of benefits for disability. The level of disability is found by traversing all disease edges and adding the weights. Some diseases make you more disabled than others. It’s very quantitative. This project involves a lot of data, is mission-critical, and serves the U.S. government.
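The "traverse all disease edges and add the weights" step is simple enough to sketch in a few lines of Python. The patients, diseases and weights below are invented purely for illustration and have nothing to do with the customer's actual data or scoring rules.

```python
# Hypothetical patient graph: patient -> list of (disease, weight) edges.
disease_edges = {
    "patient-1": [("diabetes", 20), ("hearing loss", 10)],
    "patient-2": [("asthma", 15)],
}


def disability_level(patient):
    """Sum the weights on every disease edge leaving the patient vertex."""
    return sum(weight for _disease, weight in disease_edges.get(patient, []))


print(disability_level("patient-1"))  # 30
```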

InfiniteGraph can import RDF as easily as a triple store. You simply write parsing code that is basically a loop that reads RDF and creates a graph. What InfiniteGraph is best at, though, is navigational, multi-hop, multi-path analysis (using arbitrary criteria on vertex and edge properties, as well as filtering by type, degree of separation, etc.). For example, “Show me all of the people Todd called, AND all of the people that they called,” or “Show me all of the ways that Todd might have sent money to Jay.”
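Here is a minimal Python sketch of that loader loop, followed by the two-hop question about Todd's calls. It deliberately does not use the InfiniteGraph API: the input is a simplified "subject predicate object" line format rather than a real RDF serialization (which would need a proper parser), and load_graph and within_hops are hypothetical helper names.

```python
from collections import defaultdict

# Simplified stand-in for RDF input: one whitespace-separated triple per line.
RDF_LINES = """\
Todd calls Jay
Todd calls Ann
Jay calls Bob
Ann calls Cho
Bob calls Dee
"""


def load_graph(lines):
    """The import loop: read each triple and add an edge to the graph."""
    graph = defaultdict(set)
    for line in lines.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        subject, predicate, obj = parts
        graph[(subject, predicate)].add(obj)
    return graph


def within_hops(graph, start, predicate, max_hops):
    """Everyone reachable from start in at most max_hops edges of one type."""
    seen = set()
    frontier = {start}
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            next_frontier |= graph.get((node, predicate), set())
        frontier = next_frontier - seen - {start}
        seen |= frontier
    return seen


graph = load_graph(RDF_LINES)
# All of the people Todd called, AND all of the people that they called.
print(sorted(within_hops(graph, "Todd", "calls", 2)))  # ['Ann', 'Bob', 'Cho', 'Jay']
```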

Yes, InfiniteGraph can be used to analyze triples and RDF. But if that’s all you want to do, then you really should just use a triple store.

Our graph database trades some runtime flexibility (but not a lot) for well-defined types and performance. RDF is fine for all the examples that have been circulated. If I just want to list all my friends, or all the people I know who are married, it’s no big deal because the fanout of a single degree is extremely small. In fact, you could probably just do it in MySQL. When we talk about scalability, however, it’s not really about how much data we can store, but how quickly we can run across it. Storing RDF makes this effort slower. It’s hard to make RDF perform, because the whole graph is self-describing and therefore computationally expensive to parse. Think of it like representing data in XML versus a defined binary format: XML is lovely to work with and basically human-readable, but it is very verbose and inefficient.
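To put a rough number on the self-describing versus defined-format tradeoff, the Python sketch below encodes one hypothetical edge record both ways. JSON stands in for XML here simply because it ships with Python, and the record layout is an assumption made for the example, not InfiniteGraph's storage format.

```python
import json
import struct

# Hypothetical edge record: source id, target id, weight.
edge = {"source": 1001, "target": 2002, "weight": 0.75}

# Self-describing: the field names travel with every record and must be parsed each time.
as_text = json.dumps(edge).encode("utf-8")

# Predefined layout: two unsigned 32-bit ints and one 64-bit float, little-endian.
EDGE_FORMAT = "<IId"
as_binary = struct.pack(EDGE_FORMAT, edge["source"], edge["target"], edge["weight"])

# The binary record is 16 bytes; the text record is several times larger.
print(len(as_text), len(as_binary))

# Reading it back needs no parsing of names, only the agreed layout.
source, target, weight = struct.unpack(EDGE_FORMAT, as_binary)
```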

So, InfiniteGraph supports and reads RDF tuples, and can also work alongside your dedicated triple store. In many cases, however, your requirements might be such that you actually don’t need an RDF triple store, and could use the graph database directly. Alternatively, you might find you can use one of the RDF products out there that includes some simple graph methods, and you won’t even need InfiniteGraph. The key is whether you need to analyze triples more or less than you need some deeper graph analytics.

Our company has a long history in helping customers determine the most optimal architecture and designs for their systems (and we don’t try selling things people don’t need).

So, take a look at InfiniteGraph, our documentation and developer Wiki, and let us know if you have any questions. We’d love to find out more about your project! You can also contact me @toddstavish.

Todd Stavish

Todd Stavish is a Senior Systems Engineer for Objectivity, Inc. (the company behind InfiniteGraph), and is focused on our federal and government business. Todd Stavish has expertise in a range of distributed computing applications. He has worked in telecommunications, process control, automation and scientific computing. Todd specializes in advising customers about complex modeling, performance optimization and building fault-tolerant applications.

Real-time relationship analytics from large-scale graph processing


Cassandra excels at storing large, active, decentralized datasets. Additionally, Cassandra’s rich data model allows efficient use for many applications beyond simple associative arrays. One interesting application is the processing of large-scale graph structures.

I have devised a graph application layer to extract and process social network analysis data from Cassandra, using InfiniteGraph (which you can download and use for free). I have written more about the technical benefits of the social-graph-extract application layer and its use of graph-oriented processing on blog.stavi.sh.

Social network analysis is one application of a more general category, relationship analytics, as defined by Curt Monash. The relationship analytics problem domain maps well to the unique features of the Cassandra-InfiniteGraph hybrid system:

  • dedicated vertex/edge API
  • data can be clustered according to vertex/edge proximity
  • disk-based/memory-centric access
  • peer-to-peer communication from InfiniteGraph node to Cassandra node
  • bidirectional updates between raw Cassandra data and InfiniteGraph analytics
  • parallel streaming and caching from InfiniteGraph
  • modeling flexibility to support a variety of sources
  • redundancy and high-availability
  • precision and speed for graph analytics
  • finding extremely long paths, all paths, unknown paths, or paths of nontrivial or indeterminate length
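The last item in the list above, finding all paths between two vertices, can be sketched in plain Python as a depth-limited search. This is only an illustration on a tiny in-memory dictionary with made-up names; it is not the code from the Cassandra / InfiniteGraph project and it does not use the InfiniteGraph navigation API.

```python
def all_simple_paths(graph, start, goal, max_depth):
    """Yield every cycle-free path from start to goal with at most max_depth edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_depth:
            continue  # path already uses max_depth edges; do not extend it
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # never revisit a vertex on the same path
                stack.append((neighbor, path + [neighbor]))


# Hypothetical "sent money to" edges.
transfers = {
    "Todd": ["Ann", "Bob", "Jay"],
    "Ann": ["Jay"],
    "Bob": ["Ann"],
}

paths = sorted(all_simple_paths(transfers, "Todd", "Jay", max_depth=3))
for path in paths:
    print(" -> ".join(path))
```

Raising or lowering max_depth controls how far the search is allowed to wander, which matters quickly on graphs with high fanout.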

Current business problems that can utilize these features:

  • analyzing high-frequency trading
  • discovering high degrees of mutual interconnection in social networks
  • data mining subtle retail correlations
  • product recommendation engines
  • determining terrorist or criminal behavior inferred from known relationships
  • finding a pattern of relationships for fraud detection
  • investigating the directed relationships between proteins and genes
  • checking which entity has the shortest average connection to a group of others for cyber security (botnet controller)
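As one example from the list above, "which entity has the shortest average connection to a group of others" can be sketched with a breadth-first search from each candidate. The network, the node names and the closest_to_group helper are hypothetical; on a real dataset this kind of question is where a dedicated graph engine earns its keep.

```python
from collections import deque


def bfs_distances(graph, start):
    """Hop count from start to every reachable vertex."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def closest_to_group(graph, candidates, group):
    """Return (candidate, average hops) for the candidate nearest to the whole group."""
    best, best_avg = None, float("inf")
    for candidate in candidates:
        distances = bfs_distances(graph, candidate)
        if not all(member in distances for member in group):
            continue  # candidate cannot reach every group member
        avg = sum(distances[member] for member in group) / len(group)
        if avg < best_avg:
            best, best_avg = candidate, avg
    return best, best_avg


# Hypothetical undirected network, stored as adjacency lists in both directions.
network = {
    "c2": ["bot1", "bot2", "bot3"],
    "bot1": ["c2", "relay"],
    "bot2": ["c2"],
    "bot3": ["c2"],
    "relay": ["bot1"],
}
bots = ["bot1", "bot2", "bot3"]
print(closest_to_group(network, ["c2", "relay"], bots))  # ('c2', 1.0)
```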

The working codebase for this Cassandra / InfiniteGraph integration can be retrieved from GitHub. This project was originally coded using an early beta of InfiniteGraph, but I haven’t seen any issues with the latest version of InfiniteGraph. Forking of the main project is welcome (including downstream updates). If you have any questions or suggestions, please contact @toddstavish.

Todd Stavish

Todd Stavish is a Senior Systems Engineer for Objectivity, Inc. (the company behind InfiniteGraph), and is focused on our federal and government business. Todd Stavish has expertise in a range of distributed computing applications. He has worked in telecommunications, process control, automation and scientific computing. Todd specializes in advising customers about complex modeling, performance optimization and building fault-tolerant applications.

Adding document-style schema flexibility to your InfiniteGraph application


So you’ve just finished the conceptual design for the next big Web 3.0 product, and you’ve decided to use a graph database to help solve your big challenge: “How do I effectively manage all the known (and often unknown) relationships in my data?” Your data model maps rather nicely to the graph’s node-and-edge model: people, places and things are vertices, while the relationships between them are edges. So far, so good. But then you also want the ability to take user-defined entities and insert them into the graph as well. After all, you don’t want to be tied down to a fixed, rigid schema model. Flexibility to define or modify your model at runtime is critical to your product’s success, and your user base will expect nothing less than a fast, seamless experience.

Schema-less gives greater flexibility, but at a cost.

Your first temptation might be to use a schema-less graph database; one that allows vertices and edges to be created as objects whose attributes are represented as simple property maps or buckets. (See Figure 1 below)

Figure 1: Schema-less representation of graph elements

This seems like the perfect solution. Using this structure, virtually any entity can be modeled in the graph, and it works equally well for both your known entities and your users’ entities. You quickly code up a prototype, load some data, and run some queries. But something isn’t quite right. The performance seems rather slow, especially considering the size of the data set you’re using. Alas, you’ve discovered that, akin to life, nothing is truly free: what you gain in schema flexibility, you lose in data access performance. This model translates into sub-optimal query processing because the code that decides which paths to explore in the graph first has to ask each object it encounters what type it is. That is an expensive operation, considering it potentially has to look at millions, if not billions, of elements. In aggregate, this becomes a database bottleneck and ultimately affects overall product performance.
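A tiny Python sketch shows where the cost comes from. Every edge here is a property map whose type is just another property, so the traversal has to open each one to decide whether to follow it. The data and the follow helper are made up for illustration and do not represent any particular schema-less product.

```python
# Schema-less: every element is a property map; its type is just another property.
edges = [
    {"type": "calls", "from": "Todd", "to": "Jay"},
    {"type": "emails", "from": "Todd", "to": "Ann"},
    {"type": "calls", "from": "Todd", "to": "Bob"},
    {"type": "pays", "from": "Todd", "to": "Jay"},
]


def follow(edges, source, edge_type):
    """Return (targets, number of property maps that had to be opened)."""
    opened = 0
    targets = []
    for edge in edges:
        opened += 1  # the map must be read just to learn the edge's type
        if edge["from"] == source and edge["type"] == edge_type:
            targets.append(edge["to"])
    return targets, opened


print(follow(edges, "Todd", "calls"))  # (['Jay', 'Bob'], 4)
```

Two useful edges cost four lookups here; at millions or billions of elements that ratio is what shows up as slow queries.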

Full schema models yield greater performance, but aren’t as flexible.

Switching to a full schema model, like the one currently used in InfiniteGraph, you easily gain back that lost performance. The navigation engine in InfiniteGraph allows coders to take advantage of the strong type information on graph elements to efficiently traverse the graph, qualifying paths without examining (opening) the objects themselves.
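For contrast with the property-map approach, here is a rough Python sketch of the general idea behind type-qualified traversal: edges are grouped by their class when they are added, so a query for one edge type never touches edges of another type. This is our own simplified illustration using hypothetical Calls, Emails and TypedGraph classes; it is not how InfiniteGraph's navigation engine is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Calls:
    source: str
    target: str


@dataclass(frozen=True)
class Emails:
    source: str
    target: str


class TypedGraph:
    def __init__(self):
        # (source vertex, edge class) -> edges of exactly that type
        self._edges = defaultdict(list)

    def add(self, edge):
        self._edges[(edge.source, type(edge))].append(edge)

    def follow(self, source, edge_type):
        """Follow only edges of edge_type; other types are never examined."""
        return [edge.target for edge in self._edges.get((source, edge_type), [])]


g = TypedGraph()
g.add(Calls("Todd", "Jay"))
g.add(Emails("Todd", "Ann"))
g.add(Calls("Todd", "Bob"))
print(g.follow("Todd", Calls))  # ['Jay', 'Bob']
```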

But again, what you make up for in performance, you end up losing in flexibility; and in this case, at the expense of providing dynamic schema capabilities. InfiniteGraph’s strength in performance relies on having a pre-defined set of class definitions representing the graph elements. While this works great for environments where this information is known prior to product deployment, there are challenges for any application where schema and data models may need to evolve more often over time.

So what’s the answer then? Do we have to live with these tradeoffs or can something be done to bridge this feature gap?

The best answer might involve both schema and schema-less support.

The answer, we believe, is to provide users with a schema-hybrid model involving strongly typed objects for performance alongside loosely typed objects for flexibility, the latter implemented using document object model representations such as JSON strings. The figures below illustrate both model representations in InfiniteGraph. Figure 2 below shows how user graph elements are modeled using strongly typed objects inheriting structure and behavior from a set of base classes.

Figure 2: InfiniteGraph Graph Elements Class Diagram

Figure 3 below shows how schema-less capabilities can be implemented using Document-type graph elements. Together these provide a good solution for those looking for performance AND schema model flexibility.

Figure 3: InfiniteGraph Document-type Class Diagram

String-based document storage and access can be done today with a rather trivial amount of code. We provide this internal code to our customers who need this functionality now. The next release of InfiniteGraph will include integration with the indexing and Visualizer components.
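To give a feel for how little code string-based document storage can take, here is a minimal Python sketch of a hybrid element: a couple of fixed, typed fields plus a JSON string for everything else. The Person class and its get and put methods are hypothetical names for illustration only. This is not the internal field-engineering code mentioned above, and a real InfiniteGraph element would extend the product's own base classes.

```python
import json
from dataclasses import dataclass


@dataclass
class Person:
    # Fixed, strongly typed fields: known up front and cheap to query.
    name: str
    city: str
    # Loosely typed part: a JSON document stored as a plain string.
    document: str = "{}"

    def get(self, key, default=None):
        """Read one user-defined field out of the document."""
        return json.loads(self.document).get(key, default)

    def put(self, key, value):
        """Add or replace one user-defined field in the document."""
        data = json.loads(self.document)
        data[key] = value
        self.document = json.dumps(data)


todd = Person(name="Todd", city="Cincinnati")
# Arrays and maps fit in the document without changing the class definition.
todd.put("favorite_teams", ["Reds", "Bengals"])
todd.put("phones", {"work": "555-0100"})
print(todd.city, todd.get("favorite_teams"))  # Cincinnati ['Reds', 'Bengals']
```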

In summary, using a hybrid schema model with InfiniteGraph can provide the following:

  • A mixed-model data persistence strategy
  • Fixed fields for data constraints and fast query
  • Dynamic or document-wrapped fields for flexibility
  • The ability to store non-scalar/primitive data types such as maps and arrays on IG elements
  • Better data exchange with polyglot environments (e.g. document databases, key/value stores)

If you are working on a project that requires these capabilities right now, please contact us. We can provide you internal field-engineering code compatible with the latest public version of InfiniteGraph (v.2.0). This same code will be packaged in our next release as well.

Mark Maagdenberg is a Senior Field Engineer for Objectivity, Inc. (the company behind InfiniteGraph). Mark has over 20 years of experience as a software professional working at several prominent software companies in Silicon Valley, including Ashton Tate, Intuit and Vantive, as well as several successful start-ups. His career includes positions as a User Interface Architect, Solutions Consultant and Sales Engineer, and he has worked on a variety of successful B2C and Enterprise software projects. Currently a member of the Sales Engineering team at InfiniteGraph and Objectivity, he provides product solutions for customers in both traditional on-premise and cloud environments. Mark holds a degree in Computer Engineering from Santa Clara University.