


ThingSpan Performance Blog Series – Part III
Graph Analytics with Billions of Daily Financial Transaction Events

Introduction

In this third blog of the series we update the performance results to include the numbers from running on a 64-node Amazon EC2 cluster. We also continue to look for ways to optimize the performance of ingest and query. In this update, we further optimized the way ThingSpan stores relationships with large-degree fan-out connectivity.

As a reminder, the requirement for this use case was to ingest one billion financial transaction events in under 24 hours (about 12,000 transactions a second) while simultaneously performing complex query and graph operations. ThingSpan has proven to surpass this ingest requirement while running these complex queries at the same time, at speed and scale. It is important to note that the graph in this case grows incrementally as new events are ingested. In this way the graph is always up to date with the latest data (nodes) and connections (edges) and is always available for query in a transactionally consistent state, what we call a living graph.

We used Amazon EC2 nodes for the tests and found that "m4.2xlarge" instances gave the best resources for scalability, performance, and throughput. We initially surpassed the performance requirements using a configuration consisting of 16 "m4.2xlarge" nodes. For this test we used the native POSIX file system. Full details of the proof of concept can be downloaded from: ThingSpan in Financial Services White Paper

Updated Results

Below are the updated results when running on a 64-node Amazon EC2 cluster, with the performance optimization described above. We...
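The ingest requirement quoted throughout this series can be sanity-checked with simple arithmetic: one billion events spread evenly over 24 hours works out to roughly 11,600 events per second, which the posts round to "about 12,000." A quick check (plain Python, not ThingSpan code):

```python
# Verify the ingest-rate claim: 1 billion events in 24 hours.
events = 1_000_000_000
seconds_per_day = 24 * 60 * 60          # 86,400 seconds
rate = events / seconds_per_day         # sustained events per second

print(f"{rate:,.0f} events/second")     # 11,574 -> "about 12,000"
```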
ThingSpan Performance Blog Series – Part II
Graph Analytics with Billions of Daily Financial Transaction Events

Introduction

In this second blog of the series we update the performance results to include the numbers from running on a 32-node Amazon EC2 cluster. As a reminder, the requirement for this use case was to ingest one billion financial transaction events in under 24 hours (about 12,000 transactions a second) while simultaneously performing complex query and graph operations. ThingSpan has proven to handle this ingest requirement while running these complex queries at the same time, at speed and scale. It is important to note that the graph in this case grows incrementally as new events are ingested. In this way the graph is always up to date with the latest data (nodes) and connections (edges) and is always available for query in a transactionally consistent state, what we call a living graph.

We used Amazon EC2 nodes for the tests and found that "m4.2xlarge" instances gave the best resources for scalability, performance, and throughput. We surpassed the performance requirements using a configuration consisting of 16 "m4.2xlarge" nodes. For this test we used the native POSIX file system. Full details of the proof of concept can be downloaded from: ThingSpan in Financial Services White Paper

Results

Below are the updated results when running on a 32-node Amazon EC2 cluster. We can use these results to show near-linear scale-up as the volume of data increases, and scale-out to achieve better throughput performance. The blue bar represents the time to ingest 250 million events. Therefore, by doubling the number of...
ThingSpan Performance Blog Series – Part I
Graph Analytics with Billions of Daily Financial Transaction Events

Introduction

In this first blog of the series we will explain how ThingSpan can deliver solutions to meet the demanding needs of today's complex analytics systems by combining big and fast data sources. In particular, we will focus on ThingSpan's capabilities to scale up as data volumes increase, and scale out to meet response-time requirements.

The requirement for this use case was to ingest one billion financial transaction events in under 24 hours (about 12,000 transactions a second) while simultaneously performing complex query and graph operations. ThingSpan has proven to handle this ingest requirement while running these complex queries at the same time, at speed and scale. It is important to note that the graph in this case grows incrementally as new events are ingested. In this way the graph is always up to date with the latest data (nodes) and connections (edges) and is always available for query in a transactionally consistent state, what we call a living graph.

We used Amazon EC2 nodes for the tests and found that "m4.2xlarge" instances gave the best resources for scalability, performance, and throughput. We surpassed the performance requirements using a configuration consisting of 16 "m4.2xlarge" nodes. For this test we used the native POSIX file system. Full details of the proof of concept can be downloaded from: ThingSpan in Financial Services White Paper

Test Description

Data

In our example use case, approximately one billion financial transaction events occur each day. Each event produces a subgraph that creates or merges 8-9 vertices and creates...
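The "living graph" ingest pattern described above, where each incoming event creates or merges vertices and adds edges so the graph stays current and queryable, can be sketched in plain Python. This is a minimal illustration only, not ThingSpan's actual API; the event fields and the `merge_vertex`/`add_edge` helpers are invented for the example:

```python
# Sketch of incremental graph ingest: every event becomes a small subgraph,
# reusing (merging) vertices it has already seen. Hypothetical structures,
# not ThingSpan's API.

class Graph:
    def __init__(self):
        self.vertices = {}            # vertex key -> attribute dict
        self.edges = []               # (from_key, label, to_key) triples

    def merge_vertex(self, key, **attrs):
        """Create the vertex if new, otherwise merge attributes into it."""
        self.vertices.setdefault(key, {}).update(attrs)

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

def ingest_event(g, event):
    """One financial transaction event becomes a small subgraph."""
    txn = f"txn:{event['id']}"
    g.merge_vertex(txn, amount=event["amount"])
    g.merge_vertex(f"acct:{event['from']}")   # merged if already seen
    g.merge_vertex(f"acct:{event['to']}")
    g.add_edge(f"acct:{event['from']}", "SENT", txn)
    g.add_edge(txn, "RECEIVED_BY", f"acct:{event['to']}")

g = Graph()
ingest_event(g, {"id": 1, "amount": 250.0, "from": "A", "to": "B"})
ingest_event(g, {"id": 2, "amount": 75.0, "from": "A", "to": "C"})
print(len(g.vertices), len(g.edges))   # 5 4  (acct:A was merged, not duplicated)
```

Because each event's subgraph is committed as it arrives, queries running concurrently always see a consistent, up-to-date graph, which is the point of the "living graph" design described in the post.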
In the beginning
In the beginning there was data. Then Codd (and Date) created relational database systems, and then there was structured query language (SQL). SQL was good for querying by data values, for questions where you already knew what you were looking for. You could answer the known questions. Data was neatly organized into rows (records) and columns (fields) of tables. You could even query across tables using "joins," if you knew what to join.
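The relational pattern described above can be shown in a few lines using Python's built-in sqlite3 module; the tables, columns, and values here are invented purely for illustration:

```python
import sqlite3

# Rows (records) and columns (fields) in tables, queried by value with SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Edgar');
    INSERT INTO orders VALUES (10, 1, 99.50), (11, 2, 12.00);
""")

# A "known question": which customer placed an order over 50?
# The join works because we know in advance which columns to join on.
row = conn.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "WHERE o.total > 50"
).fetchone()
print(row)   # ('Ada', 99.5)
```

Note that both the filter value and the join keys must be known up front, which is exactly the limitation the passage is pointing at.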