Many articles and blogs, including our own, have shown how graph databases can be used to examine financial transaction data and determine whether particular individuals or organizations are involved in money laundering or other kinds of fraud. In this blog I will expand on this issue and explain how institutions can use Objectivity’s ThingSpan and GraphX, running on Apache Spark, to detect financial fraud more quickly and efficiently.

In a typical scenario, investigators are trying to determine the money trail initiated by the perpetrator(s). This is a very simple navigational query using a graph database, along the lines of “Starting at the Person_X vertex, perform a transitive closure using Financial_Transaction edges and any kind of vertex.”
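To make the navigational query concrete, here is a minimal sketch in plain Python rather than ThingSpan's own query language. The edge list, the vertex names other than Person_X, and the `transitive_closure` helper are all hypothetical; the point is only to show what "follow Financial_Transaction edges outward until nothing new is reached" means.

```python
from collections import deque

# Hypothetical toy data: (source, target) pairs standing in for
# Financial_Transaction edges. All names are illustrative only.
transactions = [
    ("Person_X", "Account_1"),
    ("Account_1", "Account_2"),
    ("Account_2", "Account_3"),
    ("Account_1", "Account_4"),
    ("Person_Y", "Account_5"),
]

def transitive_closure(start, edges):
    """Return every vertex reachable from `start` along directed edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    reached = set()
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, []):
            # Skip vertices already visited so cycles terminate.
            if neighbour not in reached and neighbour != start:
                reached.add(neighbour)
                queue.append(neighbour)
    return reached

money_trail = transitive_closure("Person_X", transactions)
print(sorted(money_trail))
```

In this toy graph the trail from Person_X covers Account_1 through Account_4, while Account_5 is never reached because it is only connected to Person_Y.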

That is great if we already know that Person_X is of interest, but what if all we have is a huge graph of recent and historic financial transactions garnered from multiple sources, such as banks, exchanges, real estate transfers, etc.? The problem evolves from being a simple query to being a Big Data analytics one. We are now interested in pattern-finding, not path-following.



Fig. 1: ThingSpan architecture with GraphX

Consider Figure 2 below, where we show a small portion of such a graph loaded into Objectivity’s ThingSpan. Person-to-person transfers have been omitted for simplicity. Most of the relationships between accounts are standard interbank transfers; the exception is the transfer from Account 24 to Account 35, which represents the purchase and sale of real estate.



Fig. 2: Graph of financial data in Objectivity’s ThingSpan

Now imagine that there are billions of transactions. This is where we can apply GraphX, running on Apache Spark, to access the ThingSpan metadata store through automatically generated DataFrames and apply standard degree centrality algorithms to every Person object.

The algorithms will run in parallel across as many nodes as we need to tackle the problem. They can be directed to return only the IDs of Person objects that are connected to more than N bank accounts; in this example, N = 6. The result is shown below in Figure 3. GraphX returned P2 as the sole candidate for the next phase of the analysis. Let’s assume that several of the associated accounts are offshore, including Account 35.



Fig. 3: Result of Person associated with more than 6 Accounts
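The degree filter itself is simple to state. The sketch below is a single-machine, plain-Python stand-in for what the GraphX step computes in parallel (on Spark, GraphX's degree operators would play this role). The ownership edges, the account IDs and the `persons_over_threshold` helper are hypothetical and do not reproduce the data in the figures.

```python
from collections import defaultdict

# Hypothetical (person_id, account_id) edges linking Persons to accounts.
owns = [
    ("P1", "A1"), ("P1", "A2"),
    ("P2", "A20"), ("P2", "A21"), ("P2", "A22"), ("P2", "A23"),
    ("P2", "A24"), ("P2", "A25"), ("P2", "A26"),
    ("P3", "A30"), ("P3", "A31"), ("P3", "A32"),
]

def persons_over_threshold(edges, n):
    """Return IDs of persons connected to more than `n` distinct accounts."""
    accounts_by_person = defaultdict(set)
    for person, account in edges:
        accounts_by_person[person].add(account)
    return sorted(
        person
        for person, accounts in accounts_by_person.items()
        if len(accounts) > n
    )

N = 6  # threshold used in the example above
candidates = persons_over_threshold(owns, N)
print(candidates)
```

Here only P2, with seven accounts, exceeds the threshold. Using a set per person means repeated edges to the same account are counted once.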

The next step is to apply a qualified ThingSpan query that examines the outward paths from the object(s) returned from the GraphX step and determines whether they terminate in one or a few offshore accounts. The results are shown below in Figure 4.



Fig. 4: Discovery of offshore Account

The transactions that were initiated by Person P2 do indeed end up depositing money in an offshore account, meaning that the situation looks like money laundering and warrants further investigation.
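The path examination can be sketched in the same spirit. This is not ThingSpan's qualified query syntax, only a plain-Python illustration under stated assumptions: the edges, the `offshore` set, the `max_destinations` parameter and both helper functions are hypothetical. A "terminal" account is taken here to be a reachable vertex with no outgoing transfers.

```python
# Hypothetical directed transfers; P2 is the flagged candidate, P1 is a
# contrasting case whose funds end in a domestic account.
edges = [
    ("P2", "A20"), ("P2", "A21"), ("P2", "A22"),
    ("A20", "A24"), ("A21", "A24"), ("A22", "A30"),
    ("A24", "A35"), ("A30", "A35"),
    ("P1", "A1"), ("A1", "A2"),
]
offshore = {"A35"}  # assumed set of offshore account IDs

def terminal_accounts(start, edges):
    """Return reachable vertices that have no outgoing edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    stack = [start]
    terminals = set()
    while stack:
        vertex = stack.pop()
        successors = adjacency.get(vertex, [])
        if not successors and vertex != start:
            terminals.add(vertex)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return terminals

def converges_offshore(start, edges, offshore, max_destinations=2):
    """True if all outward paths end in at most a few offshore accounts."""
    terminals = terminal_accounts(start, edges)
    return (
        bool(terminals)
        and len(terminals) <= max_destinations
        and terminals <= offshore
    )

print(terminal_accounts("P2", edges), converges_offshore("P2", edges, offshore))
```

In this toy data every outward path from P2 ends at the single offshore account A35, so P2 is flagged, while P1's funds end in A2 and are not.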

Note how straightforward the above process is. It can be set up as a standard workflow that periodically examines new portions of the graph as account and transaction relationships stream into ThingSpan. There are only two steps, both of which can be preprogrammed with ranges of values: the number of accounts per Person for the GraphX step, and the number of offshore (or any) destination accounts for the ThingSpan step. The latter could also be set to ignore paths shorter than a minimum number of hops, as they are probably legitimate personal fund transfers.
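As an illustration of the minimum-hop setting, the sketch below enumerates simple paths from a Person to a set of destination accounts and discards the short ones. It is a plain-Python sketch suited to a toy graph only; the data, the `min_hops` value and the helper names are hypothetical, and exhaustive path enumeration like this would not scale to a production graph.

```python
# Hypothetical transfers: one layered route to A35 and one direct
# deposit into A36, both treated as offshore destinations here.
edges = [
    ("P2", "A20"), ("A20", "A24"), ("A24", "A35"),
    ("P2", "A36"),
]
destinations = {"A35", "A36"}

def paths_to(start, targets, edges):
    """Enumerate simple paths from `start` that reach any vertex in `targets`."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    found = []

    def walk(vertex, path):
        if vertex in targets and len(path) > 1:
            found.append(list(path))
        for nxt in adjacency.get(vertex, []):
            if nxt not in path:  # keep paths simple (no repeated vertices)
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(start, [start])
    return found

def long_paths(paths, min_hops):
    """Keep only paths with at least `min_hops` edges."""
    return [p for p in paths if len(p) - 1 >= min_hops]

all_paths = paths_to("P2", destinations, edges)
suspicious = long_paths(all_paths, min_hops=2)
print(suspicious)
```

With `min_hops=2`, the one-hop deposit from P2 into A36 is ignored as a probable personal transfer, and only the three-hop route ending in A35 is kept for review.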

Of course, it is important to point out that there are many types of graph computing technologies. As an API of graph algorithms, GraphX can be used with a variety of backend stores; however, to fully harness the power of real-time graph analytics, it’s essential that GraphX is integrated with a massively scalable distributed graph database like ThingSpan. This enables systems to handle hundreds of billions, even trillions, of nodes and edges, which is the scalability that financial institutions managing Big Data volumes require.

Moreover, without a graph data store as a backend, true relationship discovery cannot be achieved, because financial analysts must already know the types of fraudulent patterns that they wish to find. ThingSpan captures and stores the relationships explicitly, retrieving this critical information in real time so that illicit activity can be swiftly suppressed.

To learn more about how Objectivity’s ThingSpan can leverage GraphX for graph analytics and relationship discovery in your organization, please contact us.



Leon Guzenda

CTMO and Founder
