Graph Analytics with Billions of Daily Financial Transaction Events
In this third blog of the series we updated the performance results to include the numbers from running on a 64 node Amazon EC2 cluster. Also, we continue to look for different ways to optimize the performance of ingest and query. In this update, we further optimized the way ThingSpan stores relationships for large degree fan out connectivity.
As a reminder, the requirement for this use case was to ingest one billion financial transaction events in under 24 hours (about 12,000 transactions a second) while being able to perform complex query and graph operations simultaneously. ThingSpan has proven to surpass this requirement to ingest while performing these complex queries at the same time at speed and scale. It is important to note that the graph in this case is incrementally added to as new events are ingested. In this way the graph is always up to date with the latest data (nodes) and connections (edges) and is always available for query in a transactional consistent state, what we call a living graph.
We used Amazon EC2 nodes for the tests and found that “m4x2large” gave the best resources for scalability, performance, and throughput.
We initially surpassed the performance requirements using a configuration consisting of 16 “m4x2large” nodes. For this test we used the native Posix file system.
Full details of the proof of concept can be downloaded from:
Below are the updated results when running on a 64 node Amazon EC2 cluster, with the performance optimization described above.
We can use these results to show near linear scale up as the volume of data increases and scale out to achieve better throughput performance.
The blue bar represents the time to ingest 1 billion events prior to the performance optimization. The red bar represents the time to ingest 1 billion events with the performance optimization. Therefore, by doubling the number of compute nodes, the ingest time almost halves.
The results show that ThingSpan continues to scale near linearly as the volume of data increases and as the number of compute nodes is increased.
ThingSpan exceeded the requirement of ingesting one billion transaction events in under 24 hours while simultaneously querying the graph.
We also show that doubling the number of nodes from 16 to 32 and from 32 to 64 almost halved the ingest time.
Stay tuned for future blogs that will highlight ThingSpan’s performance as we scale out the number of compute nodes and the number of transaction events.