Many articles and blogs, including our own, have shown how graph databases can be used to look at financial transaction data to see if particular individuals or organizations are involved in money laundering or other kinds of fraud. In this blog I will expand on this topic and explain how institutions can use Objectivity’s ThingSpan and GraphX, running on Apache Spark, to detect financial fraud more quickly and efficiently.
In a typical scenario, investigators are trying to trace the money trail initiated by the perpetrator(s). This is a very simple navigational query using a graph database, along the lines of “Starting at the Person_X vertex, perform a transitive closure using Financial_Transaction edges and any kind of vertex.”
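The traversal behind that query is easy to sketch. Below is a minimal plain-Python breadth-first walk over a toy transaction graph; the vertex names and edge list are purely illustrative, and this shows the traversal logic only, not the ThingSpan or GraphX API.

```python
from collections import deque

# Toy transaction graph: each vertex maps to the vertices it reaches
# via a Financial_Transaction edge. All names are made up for illustration.
transactions = {
    "Person_X": ["Account_A", "Account_B"],
    "Account_A": ["Shell_Co_1"],
    "Account_B": [],
    "Shell_Co_1": ["Offshore_Acct"],
    "Offshore_Acct": [],
}

def transitive_closure(graph, start):
    """Return every vertex reachable from `start` via transaction edges."""
    reached, frontier = set(), deque([start])
    while frontier:
        vertex = frontier.popleft()
        for nxt in graph.get(vertex, []):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached

print(sorted(transitive_closure(transactions, "Person_X")))
# → ['Account_A', 'Account_B', 'Offshore_Acct', 'Shell_Co_1']
```

A graph database evaluates this kind of reachability query natively by following edges from the starting vertex, rather than by repeatedly joining tables as a relational store would.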
That is great if we already know that Person_X is of interest, but what if all we have is a huge graph of recent and historic financial transactions garnered from multiple sources, such as banks, exchanges, real estate transfers, etc.? The problem evolves from a simple query into a Big Data analytics one. We are now interested in pattern-finding, not path-following.
This is the third and final installment in a blog series examining exchange markets and why they are a good use case for graph databases.
In college, I was in a student group that wrote a research paper about redesigning election systems, which were traditionally dominated by two parties, to include viable third-party candidates. This was inspired by the two consecutive failed presidential bids by third-party candidate Ross Perot in 1992 and 1996. We found that the simplest way to reasonably include third-party candidates was to allow voters to rank two or more candidates, instead of voting for a single candidate.
The problem, though, was that if most voters preferred the same second-rank candidate, then that second-rank candidate would likely be elected. You can imagine in this scenario that candidates might purposely pursue becoming the second-rank candidate in order to win. In a market design like this, most candidates and political strategists would attempt to exploit the weaknesses of the system to engineer the outcome.
There are many reasons Spark is fast becoming the de facto standard for large-scale data processing. While its clever use of in-memory computing, optimized execution, and built-in machine learning libraries are often cited as reasons for its meteoric rise in popularity, it’s the way Spark has embraced structured data and external sources that I find particularly impressive.
No matter what your reason for using Spark, it will almost certainly involve reading data from external sources. A common use case is to consume large quantities of unstructured operational data dumped into HDFS and fuse it with structured historical metadata that represents the system’s learning or knowledge over time. Typically, this knowledge repository is maintained in a database that can also be leveraged and updated by other applications, business systems, etc.
Over the past few releases of Spark, Spark SQL and the DataFrames API have evolved into a powerful way to interact with structured data. At the lowest level, they allow an external datastore to be represented as a set of DataFrames, which are akin to virtual SQL-like tables. This allows the use of SQL to access data from disparate data sources, even joining across tables that derive from totally separate physical datastores.
The Federal Bureau of Investigation recently released its annual report, “Crime in the United States,” which stated that in 2014 there were 1.2 million violent crimes committed in America, 63.6% of which were aggravated assaults. In addition, 8.2 million property crimes were reported by law enforcement agencies; victims suffered financial losses of approximately $14.3 billion.
However, not all larceny occurs in “the real world,” per se. In a separate FBI report on Internet crime, cybercrimes accounted for nearly 270,000 documented incidents last year. These illicit activities, which include auto and real estate fraud, government impersonation scams and extortion, resulted in total losses of $800 million.
With the sheer volume of crimes being committed on a daily basis and the severity of the resulting financial damages, clearly more could be done to deter future incidents. Unfortunately, as technology advances, criminals become more sophisticated in their methods, and it becomes ever more critical to stay several steps ahead.
Information fusion has its foundation in data fusion as used by military and intelligence agencies. Data fusion is generally defined as the use of techniques that combine data from multiple sources in order to draw inferences, a process that is more efficient, and potentially more accurate, than relying on a single source.
Depending on the model used, there are several levels of assessment or refinement. As the fusion process goes through these different levels, the information is refined as more value is added. Information fusion can be defined as the process of merging information from disparate sources despite differences in conceptual, contextual and typographical representations, typically combining data from structured, unstructured and semi-structured resources.
The world is full of real-world objects (people, places, things) and relationships (knows, likes). Information fusion works with these objects and relationships, and in the fusion process discovers new ones. The best way to represent them is with an object model.
As a writer and marketer, I’m no stranger to using catchy buzzwords to succinctly explain an important concept that many people are facing. Despite the fact that buzzwords are used so ubiquitously that they are often added to the Oxford English Dictionary (including emoji, twerk, and cakepop, to name a few), they are not as beloved in the enterprise technology industry as they are among consumers.
The irony behind our love/hate relationship with buzzwords, such as Big Data and IoT, is that everyone throws these labels around, but no one can agree on what they mean.
To dispel some of this confusion, I’m here to discuss a term that Objectivity has been at the forefront of for years: Information Fusion.