In the beginning there was data. Then Codd (and Date) created relational database systems, and then there was structured query language (SQL). SQL was good for queries by values of data, and queries where you knew what you were looking for. You could answer the known questions. Data was neatly organized into rows (records) and columns (fields) of tables. You could even query across tables using “joins” if you knew what to join.
The use cases for this data became more widespread and more common. Data came from many different sources, but the workflow remained the same: data was captured, stored, analyzed, and presented to the end user in some visually pleasing way.
Data became information (or knowledge). But data from different systems was still stored in different silos.
Information technology became smarter, and information became intelligence by fusing data from different sources. Business intelligence and visualization tools abounded.
Then data grew into “Big Data”. It became more voluminous, less structured, and more difficult to organize, manage, and query. New tools were built to manage and query Big Data, the so-called “NoSQL” (Not Only SQL) tools. Open source projects like Apache Hadoop came into being. Big Data could be stored efficiently and processed on clusters of commodity compute hardware, albeit in batch mode. But the problem remained that you could still only answer the questions where you knew what you were looking for.
A new breed of analytic tools was needed. The open source Apache Spark project started to fill this need with its machine learning (artificial intelligence) library of algorithms and distributed processing to answer questions in a much more timely way. We’re now starting to answer the questions about the unknowns, but still relying on the analytics of data.
As Big Data grew in volume, the world became smarter, generating a lot more data at ever increasing rates from a multitude of sensors. Everything was being monitored and measured. “Fast Data” was born, so more new tools were needed to handle the streaming of this Fast Data. Tools like Spark Streaming and Kafka were invented.
Now we’ve known from the beginning that the world is not really made of rows and columns of tables. It’s made of “things” such as people, places, and events, and the relationships between them. Think social networks, e.g., Facebook and LinkedIn. Think Internet of Things (IoT). When we fused data from multiple data sources we were actually building a graph of interconnected things. Now we can go beyond asking questions by querying data: we can query the relationships, and we can start to discover the unknowns, not only in the data, but also in the relationships themselves. Given two arbitrary things, how are they possibly connected?
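To make that last question concrete, here is a minimal sketch in plain Python of how a connection between two things might be found once data is modeled as a graph. It is an illustration only, not ThingSpan’s API or query language: the function name `find_connection`, the (thing, relationship, thing) edge format, and the sample names are all assumptions made up for this example, and the search is an ordinary breadth-first search that treats every relationship as traversable in both directions.

```python
from collections import deque


def find_connection(edges, start, goal):
    """Return the shortest chain of relationships linking start to goal.

    edges is a list of (thing, relationship, thing) tuples (a hypothetical
    format chosen for this sketch). Returns a list of hops, an empty list
    if start and goal are the same thing, or None if no connection exists.
    """
    # Build an undirected adjacency map: thing -> [(relationship, neighbor)]
    graph = {}
    for a, relation, b in edges:
        graph.setdefault(a, []).append((relation, b))
        graph.setdefault(b, []).append((relation, a))

    # Breadth-first search, remembering how each thing was first reached
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for relation, neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, relation)
                queue.append(neighbor)

    if goal not in previous:
        return None

    # Walk back from the goal to the start to recover the chain of hops
    path = []
    node = goal
    while previous[node] is not None:
        parent, relation = previous[node]
        path.append((parent, relation, node))
        node = parent
    path.reverse()
    return path


# Made-up sample data: people, a company, and an event
edges = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Bob", "attended", "GraphConf"),
    ("Carol", "attended", "GraphConf"),
    ("Dave", "lives_in", "Paris"),
]

for hop in find_connection(edges, "Alice", "Carol"):
    print(hop)
```

Nobody told the program in advance that Alice and Carol were related; the link through Acme, Bob, and GraphConf falls out of the relationships themselves. A search like this is simple on a handful of edges; doing it over billions of relationships with data still streaming in is the harder problem.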
The only way to perform this new analytics on all the data, both big and fast, is with a platform that makes available all these different tools to manage all aspects of the workflow and provide the new analytics to answer the questions about the unknowns in actionable time. For many applications, it is not enough to just process and manage Big and Fast Data in batch. Applications also require a technology that can handle the complexity and data streaming in from multiple sources in order to gain value in seconds, not hours or days.
Objectivity’s ThingSpan is an enterprise graph platform that not only integrates with open source technologies such as Spark, Kafka, HDFS, and YARN, but also manages the complex relationships in order to answer the hard questions in real time for large-scale analytics.
The ThingSpan platform provides the real-time performance that data scientists and analysts need to discover the unknowns and make better decisions with this new insight, not only today, but also in the future, with the ability to scale out to larger and more complex data sets.
Corporate VP of Product