Telecommunications voice and data networks are natural examples of graph structures: equipment of many types, often from hundreds of manufacturers, must work in harmony to reliably and efficiently transport information for millions of users at a time. Objectivity products have been used at the heart of fiber optic switches, cellular wireless and low Earth orbit satellite systems, long-term alarm correlation systems and network planning applications.
Dealing with problems (alarms) or overloads has traditionally involved taking individual pieces of equipment offline and re-routing the traffic via other nodes. In this example, we’ll look at an apparently simple situation and show how the combination of Spark SQL and ThingSpan’s advanced graph navigation can be used to quickly diagnose and solve an equipment overload situation. We start by loading Location, Equipment and Link (plus loading percentages) objects and connections into ThingSpan, producing the graph shown in Figure 1.
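To make the overload scenario concrete, here is a minimal sketch in plain Python. It does not use ThingSpan or Spark SQL; the equipment names, the load figures and the 90 percent threshold are all invented for illustration. It flags heavily loaded links and then searches for a route that stays on links below the threshold.

```python
from collections import deque

# Hypothetical links: (equipment_a, equipment_b, load_percent).
# Names and numbers are made up for this sketch.
LINKS = [
    ("Switch_1", "Switch_2", 95),
    ("Switch_2", "Switch_4", 92),
    ("Switch_1", "Switch_3", 40),
    ("Switch_3", "Switch_4", 35),
]
OVERLOAD_THRESHOLD = 90  # assumed cut-off, in percent


def overloaded_links(links, threshold):
    """Return the links whose load is at or above the threshold."""
    return [link for link in links if link[2] >= threshold]


def route_avoiding_overloads(links, start, goal, threshold):
    """Breadth-first search that only uses links loaded below the threshold."""
    adjacency = {}
    for a, b, load in links:
        if load < threshold:
            adjacency.setdefault(a, []).append(b)
            adjacency.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no route exists on lightly loaded links


hot_links = overloaded_links(LINKS, OVERLOAD_THRESHOLD)
detour = route_avoiding_overloads(LINKS, "Switch_1", "Switch_4", OVERLOAD_THRESHOLD)
print(hot_links)
print(detour)
```

In a real deployment the filtering step would more naturally be a query over the stored Link objects and the route search a graph navigation, but the division of labor is the same: first select the overloaded elements, then navigate around them.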
Retailers have deployed advanced business intelligence tools for decades in order to determine what to sell and to whom, when, where and at what price. Much of the transactional data was too voluminous for smaller retailers to keep for long, putting them at a disadvantage against the industry giants and more agile web-based retailers. The falling prices of commodity storage and processors are making it possible to keep data longer. This data can also be combined with external sources, such as information gathered from social networks, then analyzed by more powerful machine learning technologies and other tools.
In this blog, we will look at how any retailer—traditional or online—might identify slow-moving products and use their own sales transaction data in conjunction with social media information about bloggers who have mentioned or bought a product in order to identify and target potential buyers.
Many articles and blogs, including our own, have shown how graph databases can be used to look at financial transaction data to see if particular individuals or organizations are involved in money laundering or other kinds of fraud. In this blog, I will expand on this issue and explain how institutions can use Objectivity’s ThingSpan and GraphX, running on Apache Spark, to detect financial fraud more quickly and efficiently.
In a typical scenario, investigators are trying to determine the money trail initiated by the perpetrator(s). This is a very simple navigational query using a graph database, along the lines of “Starting at the Person_X vertex, perform a transitive closure using Financial_Transaction edges and any kind of vertex.”
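The navigational query described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not ThingSpan's query language or the GraphX API; the edge list and every vertex name other than Person_X are hypothetical.

```python
from collections import deque

# Hypothetical edge list: (source_vertex, edge_type, target_vertex).
EDGES = [
    ("Person_X", "Financial_Transaction", "Account_A"),
    ("Account_A", "Financial_Transaction", "Account_B"),
    ("Account_B", "Financial_Transaction", "Company_C"),
    ("Person_X", "Knows", "Person_Y"),
    ("Person_Y", "Financial_Transaction", "Account_D"),
]


def transitive_closure(edges, start, edge_type):
    """Return every vertex reachable from start by following only
    edges of the given type; vertices of any kind are accepted."""
    adjacency = {}
    for source, kind, target in edges:
        if kind == edge_type:
            adjacency.setdefault(source, []).append(target)

    reached = set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, []):
            if neighbor != start and neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return reached


money_trail = transitive_closure(EDGES, "Person_X", "Financial_Transaction")
print(sorted(money_trail))
```

Account_D is left out of the result because the only way to reach it from Person_X passes through a Knows edge, which the query does not follow.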
That is great if we already know that Person_X is of interest, but what if all we have is a huge graph of recent and historic financial transactions garnered from multiple sources, such as banks, exchanges, real estate transfers, etc.? The problem evolves from being a simple query to being a Big Data analytics one. We are now interested in pattern-finding, not path-following.
In Part 1 of this blog series, I looked at the fundamental principles behind all database technologies and the evolution of DBMSs as system requirements changed. In this concluding article, I’ll address the enormous changes in requirements that Objectivity is seeing and suggest some ways of attacking the problems that they are introducing.
The Rise of Big Data
Dramatically increased and still growing use of the WWW has made it necessary for companies to gather and analyze a much wider variety of data types than ever before. They also need to store and process more data in order to garner business intelligence and improve operations. This introduces an additional data generator, the data center and communications infrastructure, which can produce voluminous logs from multiple sources.
Many of these “Big Data” systems operate on huge volumes of relatively simple data, much of it requiring conversion, filtering or consolidation before it can be used for analytical purposes. In the early days, much of this new data was stored in structured files. Hadoop, with its MapReduce parallel processing component and the scalable, robust Hadoop Distributed File System (HDFS), rapidly gained momentum, making it the most widely used framework for Big Data systems.
In the year 2000, it seemed that database technology had matured to the point where changes were incremental, at least in the enterprise. Today, there is such a wide choice of database management systems (DBMSs) and other storage technology that it is hard to determine the best fit for a particular problem. Some wonder whether we have a surfeit of databases. In this blog series, I’ll look back at how we arrived where we are today, examine the new demands and challenges that we are facing, and suggest some lines of attack using Objectivity’s suite of database platforms.