Introduction

Successful Big and Fast Data projects require a platform that can:

  • Scale massively
  • Perform in near real-time

The Objectivity ThingSpan platform is unique in its ability to meet those needs. As an example, this post walks through one use case in which ThingSpan ingested and queried a data set that the customer’s existing relational system could not handle.

 

This example also illustrates ThingSpan’s ability to grow linearly as computing resources are added. Further, it shows the value of ThingSpan’s user language, DO, to aid the user in analyzing and visualizing the desired results. Finally, this example is available in the Amazon Cloud, so that you can confirm ThingSpan’s performance and scale for yourself.

 

Do today’s massive quantities of data from multiple sources fit neatly into rows and tables for enterprises and governments to analyze, so that they get the answers needed to make critical operational decisions in near real-time? Have open source technologies proven that they can handle the performance and scale demanded of mission-critical operational systems? Can critical operations afford to wait hours or days for after-the-fact analytics?

 

Objectivity’s ThingSpan is the only data management platform that can ingest billions of complex, interrelated records while simultaneously building the graph, yielding new insights in near real-time for operational systems. The use case described below is a very simple one for ThingSpan, yet the customer’s existing relational system could not return an answer at all. ThingSpan ingested the data 5 times faster while also connecting it, and it ran complex queries that the existing relational system could not. Objectivity’s unique, proprietary distributed architecture is what allows it to meet and surpass the scale and performance requirements of data and sensor fusion production systems.

 

If your Big and Fast Data or IoT project is mission critical, you cannot afford to ignore ThingSpan. It will prove to be your only path to success.

 

The Use Case

With Objectivity’s products, enterprises can discover unknown connections in real-time across Big and Fast Data clusters with large distributed graphs consisting of trillions of nodes and edges. One of our customers had a data set of over ten billion records, which continues to grow by tens of millions of new records every day. Their traditional relational database was able to ingest the data in under 10 hours, but building the indexes needed for the queries took additional processing time. Objectivity’s ThingSpan exceeded the requirements by ingesting the data while simultaneously building the graph in less than 2 hours (no indexing required). ThingSpan also showed linear scalability as more data was added and linear scale-out as more nodes were added to the cluster. The existing relational database could not produce results at all when querying the highly connected data out to three or more degrees of separation. ThingSpan, by contrast, performed complex queries on the graph built from the ten billion call detail records with sub-second response times.

 

Environment

With this customer we used Amazon EC2 because it was easy to configure for scale-out testing, and we paid only for the resources we needed. The test data was generated and stored in Amazon S3. Based on previous experience, we used m4.2xlarge nodes throughout.

 

Schema

We used DO (Declarative Objectivity), ThingSpan’s declarative language, to define the schema. See below.

 

Ingest Results

In this use case, which loaded and queried simulated telephone call detail records (CDRs), ThingSpan showed linear scale-up as the volume of data increased and linear scale-out as the ingest workload was spread across multiple nodes in the Amazon EC2 cloud environment.

 

Ingesting 10 billion CDRs, including building the graph (connecting the data), took 717 minutes on 4 EC2 nodes, 376 minutes on 8 EC2 nodes, and 244 minutes on 8 EC2 nodes (all m4.2xlarge).
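As a rough check on the scale-out claim, the sketch below computes the speedup and parallel efficiency of the 8-node run (376 minutes) relative to the 4-node run (717 minutes), using only the times quoted above. The function name `scale_out` is ours, chosen for illustration; it is not part of ThingSpan.

```python
def scale_out(base_nodes, base_minutes, nodes, minutes):
    """Return (speedup, efficiency) of a run relative to a baseline run.

    Efficiency is the measured speedup divided by the ideal speedup,
    which equals the ratio of node counts under perfect linear scaling.
    """
    speedup = base_minutes / minutes
    ideal_speedup = nodes / base_nodes
    return speedup, speedup / ideal_speedup


# Ingest times quoted in the text: 4 nodes -> 717 min, 8 nodes -> 376 min.
speedup, efficiency = scale_out(4, 717, 8, 376)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Doubling the node count cut the ingest time by a factor of about 1.9, or roughly 95% of the ideal 2x, which is what "near linear" means in practice here.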

 

By loading the first billion CDRs, followed by an additional 2 billion, another 2 billion, and a final 5 billion, ThingSpan showed near-linear scalability as more CDRs were added.

 

The optimum ingest throughput was on the 6-node configuration, which ingested about 3,600,000 CDRs per minute per node for a total ingest time of 463 minutes. The first 1 billion CDRs took 44 minutes, the next 2 billion took 91 minutes, the next 2 billion took 94 minutes, and the last 5 billion took 234 minutes. Beyond a certain point, adding more compute nodes yields diminishing returns for ingest performance.
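The batch timings above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted for the 6-node run. It derives the minutes needed per billion records in each batch (a flat series indicates linear scale-up) and the ingest rate once the aggregate is divided across the 6 nodes. Variable names are ours.

```python
# Batch sizes (billions of CDRs) and ingest times (minutes) quoted above
# for the 6-node configuration.
batches = [(1, 44), (2, 91), (2, 94), (5, 234)]
NODES = 6

total_billions = sum(size for size, _ in batches)
total_minutes = sum(minutes for _, minutes in batches)

# Minutes per billion records in each batch; near-constant values
# mean ingest cost did not grow as the database got larger.
minutes_per_billion = [minutes / size for size, minutes in batches]

# Aggregate rate for the cluster and the share handled by each node,
# both in records per minute.
cluster_rate = total_billions * 1e9 / total_minutes
per_node_rate = cluster_rate / NODES

print(total_minutes)
print(minutes_per_billion)
print(f"{per_node_rate:,.0f} records per minute per node")
```

The batches sum to the 463 minutes reported, each billion records costs between 44 and 47 minutes regardless of how much data is already loaded, and the per-node rate comes out at roughly 3.6 million records per minute.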

Query Results

Some example queries are shown below:

Query by value:

Show all phones that contain ‘151421565’ in the phone number.

SELECT * from Phone where contains(phoneNumber, '151421565');

Query by navigation:

Find any phones that have up to 10 levels of connections with a specific phone (query by value to find the start node, then navigate out up to 10 degrees).

MATCH path = (p1:Phone{phoneNumber = '13776380676'})-[*1..10]->(p2:Phone) RETURN path;
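To make the idea of navigating "out to N degrees" concrete, here is a minimal Python sketch of a bounded breadth-first traversal over a directed call graph. It illustrates the general technique only and is not how ThingSpan executes the query. The function names and the sample records are made up, and unlike the MATCH query, which returns paths, this sketch returns only each reachable phone with its shortest hop count.

```python
from collections import defaultdict, deque


def build_graph(cdrs):
    """Build a directed adjacency map: caller -> set of callees."""
    graph = defaultdict(set)
    for caller, callee in cdrs:
        graph[caller].add(callee)
    return graph


def phones_within(graph, start, max_degrees):
    """Return {phone: degrees} for phones reachable from start
    in 1..max_degrees hops, using breadth-first search."""
    distances = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        phone, depth = queue.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the requested number of degrees
        for neighbor in graph.get(phone, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                distances[neighbor] = depth + 1
                queue.append((neighbor, depth + 1))
    return distances


# Made-up sample records (caller, callee); not from the customer data set.
cdrs = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "E")]
graph = build_graph(cdrs)
reachable = phones_within(graph, "A", 2)
print(sorted(reachable.items()))
```

With a limit of 2 degrees, phone E is excluded because it sits three hops from A. In a graph store each hop follows stored connections directly, whereas a relational schema typically needs one more self-join per additional degree.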

 

The results of the actual queries are shown below for 1 billion and 10 billion call detail records. Queries 7 and 8 can be improved by adding indexes, but these were not necessary for the graph navigation queries.

Conclusion

Using ThingSpan, we loaded the 10 billion call detail records in under 2 hours, well within the time the relational database required.

ThingSpan demonstrated near-linear scale-up as the volume of data increased and scale-out as more compute nodes were added to the cluster.

We also showed that ThingSpan could run navigation queries out to 10 degrees, well beyond the limits of the relational database solution, while still being able to perform the value-based data queries.

This is another example of leveraging the speed and scale of ThingSpan.

Internally at Objectivity we have often classified our customers and prospects as either brilliant or desperate. The brilliant ones know up front that they need a platform that can handle not only today’s complex, interrelated data but also the expanding requirements of tomorrow. They know that relational or open source technologies cannot perform and scale as their system requires, and they see that Objectivity can because it handles that complexity natively. The desperate usually come to us after they have experimented with open source and proven to themselves that they cannot get the performance and scale they need. Which are you? Data brilliant or desperate?

 
