Cassandra excels at storing large, active, decentralized datasets. Cassandra’s rich data model also makes it efficient for many applications beyond simple associative arrays. One interesting application is the processing of large-scale graph structures.
I have devised a graph application layer to extract and process social network analysis data from Cassandra, using InfiniteGraph (which you can download and use for free). I have written more about the technical benefits of the social-graph-extract application layer and its use of graph-oriented processing on blog.stavi.sh.
Social network analysis is one application of a more general category, relationship analytics, as defined by Curt Monash. The relationship analytics problem domain maps well to the unique features of the Cassandra-InfiniteGraph hybrid system:
- dedicated vertex/edge API
- data can be clustered according to vertex/edge proximity
- disk-based/memory-centric access
- peer-to-peer communication from InfiniteGraph node to Cassandra node
- bidirectional updates between raw Cassandra data and InfiniteGraph analytics
- parallel streaming and caching from InfiniteGraph
- modeling flexibility to support a variety of sources
- redundancy and high-availability
- precision and speed for graph analytics
- finding extremely long paths, all paths, unknown paths, or paths of nontrivial or indeterminate length
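To make the path-finding item concrete, here is a minimal sketch of enumerating all simple paths between two vertices. It is plain Python over an in-memory adjacency dictionary, not the InfiniteGraph API or the code from this project; the function name `all_simple_paths`, the `max_depth` parameter, and the sample graph are assumptions made purely for illustration.

```python
def all_simple_paths(graph, start, goal, max_depth=None):
    """Yield every simple (cycle-free) path from start to goal.

    graph is a dict mapping a vertex to an iterable of its neighbors.
    max_depth, if given, bounds the number of edges in a path.
    """
    # Each stack entry holds the current vertex and the path taken to reach it.
    stack = [(start, [start])]
    while stack:
        vertex, path = stack.pop()
        if vertex == goal:
            yield path
            continue
        # Stop extending once the path already uses max_depth edges.
        if max_depth is not None and len(path) - 1 >= max_depth:
            continue
        for neighbor in graph.get(vertex, ()):
            # Skip vertices already on the path so each path stays simple.
            if neighbor not in path:
                stack.append((neighbor, path + [neighbor]))


# Hypothetical sample data: two routes from "a" to "d".
sample_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
paths = sorted(all_simple_paths(sample_graph, "a", "d"))
```

The number of simple paths can grow exponentially with graph size, which is one reason a depth bound (and, at scale, a dedicated graph engine) matters for this kind of query.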
Current business problems that can utilize these features:
- analyzing high-frequency trading
- discovering high degrees of mutual interconnection in social networks
- data mining subtle retail correlations
- product recommendation engines
- determining terrorist or criminal behavior inferred from known relationships
- finding a pattern of relationships for fraud detection
- investigating the directed relationships between proteins and genes
- checking which entity has the shortest average path to a group of others, for cyber security (for example, identifying a botnet controller)
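The last item in the list can be sketched as a breadth-first search from each candidate, followed by averaging the hop counts to the group members. This is a rough illustration in plain Python over an in-memory adjacency dictionary, not the InfiniteGraph API; the names `bfs_distances` and `closest_to_group` and the sample network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every reachable vertex."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in graph.get(vertex, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


def closest_to_group(graph, candidates, group):
    """Return (candidate, average distance) for the candidate with the
    smallest average hop count to all members of group.

    Candidates that cannot reach every group member are skipped.
    Returns (None, inf) if no candidate qualifies.
    """
    best, best_average = None, float("inf")
    for candidate in candidates:
        distances = bfs_distances(graph, candidate)
        if any(member not in distances for member in group):
            continue
        average = sum(distances[member] for member in group) / len(group)
        if average < best_average:
            best, best_average = candidate, average
    return best, best_average


# Hypothetical sample network: "hub" talks directly to every infected host.
network = {
    "hub": ["x", "y", "z"],
    "x": ["hub", "w"],
    "y": ["hub"],
    "z": ["hub"],
    "w": ["x"],
}
suspect, average = closest_to_group(network, ["hub", "w"], ["x", "y", "z"])
```

In this sample, `hub` reaches every group member in one hop, while `w` needs one hop to `x` and three hops to each of `y` and `z`, so `hub` is the likelier controller.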
The working codebase for this Cassandra / InfiniteGraph integration can be retrieved from GitHub. This project was originally coded using an early beta of InfiniteGraph, but I haven’t seen any issues with the latest version of InfiniteGraph. Forking of the main project is welcome (including downstream updates). If you have any questions or suggestions, please contact @toddstavish.
Todd Stavish is a Senior Systems Engineer for Objectivity, Inc. (the company behind InfiniteGraph), and is focused on our federal and government business. Todd has expertise in a range of distributed computing applications. He has worked in telecommunications, process control, automation, and scientific computing. Todd specializes in advising customers about complex modeling, performance optimization, and building fault-tolerant applications.