Data Scientists' Corner

ThingSpan Technical Overview Presentation

This short technical video presentation highlights an important deficiency in traditional analytical tools and shows how ThingSpan, a distributed graph analytics platform, helps address it at enterprise speed and scale.

ThingSpan Financial Use Case Video

This brief video presentation highlights the power of ThingSpan in a financial services use case. ThingSpan is able to ingest one billion financial transaction events in hours while simultaneously performing complex query and graph operations. ThingSpan adds analytical value beyond statistics by using sub-graph similarity to show the actual behavior of the transactions.


What’s Missing in Big and Fast Data?

Why enterprises today have requirements beyond Big Data, Fast Data, and traditional analytics

Big Data

Big Data, also known as data at rest, is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Big Data is often associated with Hadoop®, but it also includes data in databases, data warehouses and other data collected from enterprise applications.

Big Data Analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Processing Big Data is typically performed post collection in batch mode across a cluster of servers.

 

Fast Data

Fast Data, also known as data in motion, is real-time streaming data: events and data from connected devices such as computer and sensor networks (e.g. Smart Grid meters), as well as from applications and social media sites.

Stream processing is an analytic computing approach focused on speed, because these applications require a continuous stream of often unstructured data to be processed. Data is therefore continuously analyzed and transformed in memory before it is stored on disk. Streams are processed as “time windows” of data held in memory across a cluster of servers.
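The “time window” idea can be illustrated with a minimal single-process sketch. This is not ThingSpan code or any particular streaming engine's API: the function name `tumbling_windows`, the event layout `(timestamp, key, value)` and the sample meter readings are all assumptions made for illustration, and the sketch assumes events arrive in timestamp order.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event layout: (timestamp in seconds, source key, numeric value).
Event = Tuple[float, str, float]


def tumbling_windows(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window index, per-key totals) each time a fixed-size window closes.

    Assumes events arrive in non-decreasing timestamp order, so a window can be
    emitted as soon as an event from a later window is seen.
    """
    current_index = None
    totals: Dict[str, float] = {}
    for timestamp, key, value in events:
        index = int(timestamp // window_seconds)
        if current_index is not None and index != current_index:
            # The previous window is complete: hand it on (e.g. to storage).
            yield current_index, totals
            totals = {}
        current_index = index
        totals[key] = totals.get(key, 0.0) + value
    if current_index is not None:
        # Flush the last, possibly partial, window when the stream ends.
        yield current_index, totals


# Made-up smart-meter readings, aggregated in 10-second windows.
readings = [
    (0.5, "meter-a", 1.0),
    (3.0, "meter-a", 2.0),
    (9.9, "meter-b", 4.0),
    (10.0, "meter-a", 5.0),
    (12.5, "meter-b", 0.5),
]
windows = list(tumbling_windows(readings, window_seconds=10.0))
```

Only the totals for the open window are held in memory at any moment; each closed window is handed on before the next one starts, which mirrors the "analyze in memory, then store" order described above. A production system would also have to handle late or out-of-order events and distribute the work across servers.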

 

Big Data Analytics
  • Multiple data silos
  • Structured data in databases
  • Unstructured data in Hadoop
  • Processed in batch mode
  • Business Intelligence tools

Fast Data Analytics
  • Streaming data
  • In-memory processing
  • Data stored after processing
  • Real-time processing of streams
  • Machine Learning algorithms

 

Advanced Analytics

Advanced Analytics platforms are the tools used by statisticians and data scientists, who create models to answer questions about the unknowns within the data. Business intelligence, by contrast, is for business analysts and line-of-business users looking for direct answers to business questions.

 

What’s missing?

Both Big Data Analytics and Fast Data Analytics focus on the data only. However, there are relationships in the data, either explicit or hidden, waiting to be discovered and exploited. Traditional analytics either fail to use these relationships or cannot meet the scale requirements of Big Data or the speed requirements of Fast Data. To meet these new analytics requirements, a different way of representing data and relationships is needed, and the best way to represent them is as a graph. Most current solutions load Big and Fast Data from their respective data stores into memory, then build and process the graph there. Memory is getting cheaper and larger, but it is still finite, and to update the graph the data has to be reloaded into memory.
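To make the idea of representing data and relationships as a graph concrete, here is a small in-memory sketch. It does not use ThingSpan's API: the function names `build_graph` and `shortest_path`, the account names and the transfer records are all hypothetical. It shows how a relationship that no single record states (two accounts linked through intermediaries) becomes a simple traversal once the records are stored as a graph.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def build_graph(transfers: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (sender, receiver) records into an undirected adjacency map."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)
        graph[receiver].add(sender)
    return graph


def shortest_path(
    graph: Dict[str, Set[str]], start: str, goal: str
) -> Optional[List[str]]:
    """Breadth-first search; returns the shortest chain of accounts, or None."""
    if start == goal:
        return [start]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Made-up transfer records: no single record links alice to dave.
transfers = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("erin", "frank"),
]
graph = build_graph(transfers)
hidden_link = shortest_path(graph, "alice", "dave")
no_link = shortest_path(graph, "alice", "erin")
```

This sketch has exactly the limitation the paragraph describes: the whole graph lives in one process's memory and is rebuilt from the records each time, which is workable for a handful of rows but not for graphs that outgrow memory or change continuously.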

 

What’s needed?

What is required is a platform that combines multiple sources of Big and Fast Data, supports both Big and Fast Data Analytics, and enables the use and exploitation of relationships in the data, i.e. supports graph analytics at speed and scale. The graph is a living graph, always being updated by streams of data and growing beyond what fits into memory.

 

ThingSpan

Objectivity’s ThingSpan is an enterprise graph platform that not only integrates with open source technologies such as Spark, Kafka, HDFS and YARN, but also manages the complex relationships needed to answer the hard questions in real time for large-scale analytics.

The ThingSpan platform provides the real-time performance that data scientists and analysts need to discover the unknowns and make better decisions with this new insight, not only today but also in the future, with the ability to scale out to larger and more complex data sets.