This is the third and final installment in a blog series examining exchange markets and how they are a good use case for graph databases.
In college, I was in a student group that wrote a research paper about redesigning election systems, which were traditionally dominated by two parties, to include viable third-party candidates. This was inspired by the two consecutive failed presidential bids by third-party candidate Ross Perot in 1992 and 1996. We found that the simplest way to reasonably include third-party candidates was to allow voters to rank two or more candidates, instead of voting for a single candidate.
The problem though was that if most voters preferred the same second-rank candidate, then that second-rank candidate would likely be elected. You can imagine in this scenario that candidates might purposely pursue becoming the second-rank candidate in order to win. In a market design like this, most candidates and political strategists would attempt to exploit the weaknesses of the system to engineer the outcome.
Gaming the System
Alvin Roth talks about a similar scenario in a story about the enrollment process for Boston Public Schools1. The Boston school district used a ranking algorithm that employed a lottery system to select students for particular schools and gave priority to parents who placed that school as their first choice. Reminiscent of the broken three-party election system, parents found that if they put their most desired school in the first rank, their chance for even getting their second choice would be lost, because the second-tier schools would be gone before reaching the second round of the lottery.
Figure 1: Hopefully…2
This inevitably left the parents with no choice but to game the system to try to get a school they wanted. They would commonly rank their second or third preferred school first, because they believed that they would have a better shot at getting a good school. This left parents feeling like they had to balance what schools they prefer with how other parents would choose in order to achieve their intended outcome, which may not even be for a preferred school. Describing the flaws in this system, Roth writes, “No matter how well a market is otherwise designed, it will have trouble giving people what they want if it doesn’t make it safe for them to try to get what they want” (Roth 122). Roth explains that this is an issue that almost all markets have to deal with, then goes on to describe a way to identify design flaws like this.
Stabilizing the System
Roth proposes that the secret to a successful market is a design that produces outcomes that are stable. In the school enrollment scenario, a stable outcome is one where there are no parent/school pairs who aren’t matched but should be. If any of these unresolved matches do exist, they are known as “blocking pairs” (Roth 139). The instability of a system is inherent when blocking pairs exist because they force people to work outside the system in order to get their preferred match.
On the other hand, in a perfectly stable market, there are no blocking pairs. Instead, there are only happily matched pairs and people who trust that the exchange will give them what they want.
A large exchange system inevitably has some instability. For the Boston school enrollment program, “the system’s biggest problem was that it didn’t even realize it had a problem” (Roth 122). Finding the instability in a system is challenging, but a well-designed exchange market can be automated to find blocking pairs.
Imagine using a graph database to engineer the monitoring process (described in the previous blog) to look for blocking pairs as well as evaluating successful match chains. This process could perform some cleanup when it finds an unstable match, which would lead to a healthier exchange system overall. This process would reveal logic failures in the match algorithm, which could be used to revise the algorithm to avoid future blocking pairs.
So far, this blog series has covered many examples of large exchange markets, including college admissions, electoral systems and school lotteries, to name just a few. Through this particular blog, we have seen the need for automation to both execute and refine a matching algorithm, and also to monitor and give feedback for improving the health and success of the system. Without a graph database as the primary storage engine, designing this type of exchange at scale would be impossible because it would not allow the system to navigate through matches in real time.
Expanding the System
Objectivity has built a platform called ThingSpan that can write streaming connected data to the underlying graph storage engine in real time using our Apache Spark™ adapter. The Spark adapter also allows data to be read from the graph database so that data can be combined with streaming data to give it context.
For example, imagine that you are processing streaming voting data that needs to be coalesced with fixed data about the candidates. You can also represent voter data in the database uniquely using the update-by-value option (see the code that follows). Moreover, by using ThingSpan, you can store relationships between voters, votes, and candidates entirely through Spark using their unique Object Identifier, or OID.
// Process streaming voting data. val voterStreamDF = ... val voteStreamDF = ... // Write voter stream out to Objectivity. voterStreamDF.write. mode(SaveMode.Overwrite). format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName", "gov.election.ca.Voter"). option("objy.updateByValue", "SSN"). save() voteStreamDF.write. mode(SaveMode.Append). format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName", "gov.election.ca.Vote"). save() // Reading in a voter data frame with OIDs. val voterDF = sqlContext.read. format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName","gov.election.ca.Voter"). option("objy.addOidColumn", "voterOid"). load() voterDF.registerTempTable("voterTable") val voteDF = sqlContext.read. format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName","gov.election.ca.Vote"). option("objy.addOidColumn", voteOid"). load() voteDF.registerTempTable("voteTable") // Reading in a fixed candidate data frame with OIDs. val candidateDF = sqlContext.read. format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName", "gov.election.ca.Candidate"). option("objy.addOidColumn", "candidateOid"). load() partyDF.registerTempTable("candidateTable") // Creating relationships between vote, voter and candidates via a JOIN. val query = new StringBuilder() query.append("SELECT voteOid, voteTable.candidateName, candidateOid as candidate, candidateName, voterOid as voter, voterName "); query.append("FROM voteTable INNER JOIN candidateTable "); query.append("ON candidateTable.name=voteTable.candidateName "); query.append("INNER JOIN voterTable "); query.append("ON voterTable.SSN=voteTable.voterSSN "); val voteDataDF = sqlContext.sql(query.toString()) // Writing the vote relationships back to the ThingSpan metadata store. voteDataDF.write. mode(SaveMode.Overwrite). format("com.objy.spark.sql"). option("objy.bootFilePath", bootFile). option("objy.dataClassName", "gov.election.ca.Vote"). option("objy.updateByOid", "voteOid"). save()
Once the streaming graph data has been stored, then graph analytics can be performed on the underlying data. This could include static analysis (determining whether the voter voted faithfully along party lines) or dynamic analysis (identifying voter fraud via inconsistent voting circumstances involving time or location).
This approach adds value to the market through real-time visibility into complex questions, and maintains the safety and stability of the system through anomaly detection. Rules can also be defined to perform analyses or investigations at regular intervals using a monitoring agent.
The ability to perform complex ingest of streaming graph data through Spark and combine this with data at rest in real time is what differentiates Objectivity ThingSpan from its competitors. The ability to perform complex relationship analytics on top of that data in real time is what really allows our customers to derive significant business value from our product.
More importantly, we have been proven to handle the data workload for reads and writes at a massive scale. If you are interested in trying ThingSpan for yourself, contact us and let us know how we can help you enhance your business application.
- “Who Gets What and Why: The New Economics of Matchmaking and Market Design” by Alvin E. Roth, Houghton Mifflin Harcourt, 2015
Principal Architect of InfiniteGraph