Introduction

Have you seen a list of the top 100 movies or the top 25 actors or actresses? Do you ever wonder how those are selected? I have long felt that these lists are not very democratic and can quickly become irrelevant. In contrast, I can find ratings on Rotten Tomatoes for movies before they even reach theaters, and the ratings are, in my experience, pretty spot on. More and more, people are also turning to social media like Twitter to see what their friends say about a new movie before deciding whether to see it. How can your friends be wrong? After all, they know that they can be blamed if you don’t like it.

I have been using the IMDB data set a lot lately to view activity around various Hollywood heavyweights. I have discovered that the IMDB data set, while massive in size and connectedness, is actually made up of rather lightweight objects. The data set is a stripped-down version of what is available on the IMDB website, which limits the kinds of rich queries and navigations that one might want to perform. Lightweight objects can be good because they allow lookups and simple navigations to run quickly and easily, but when the data is not in the database, the types of deep analysis that you can perform are restricted.

Alternatively, a number of open and free REST APIs are available for sites like Twitter and Rotten Tomatoes. Interfacing with the data on these sites allows us to fill out the sparse IMDB data set with rich color and depth. This lets us perform the more interesting analyses that the limited data set had so far kept out of reach.

Advantages of Using InfiniteGraph (IG) for Complex Navigations

Using the IG Navigation API, we can easily integrate the qualification of paths with the information that we pull down using the REST APIs. This allows us to avoid hitting the same fluff over and over again that doesn’t match our criteria. Unlike many navigation APIs, IG’s can perform complex qualification of objects using Qualifier implementations. The Qualifier interface is simple, but it is designed to let the user apply a complex algorithm for qualifying either the path or the result of a navigation. This type of qualification lets you treat the data points as proxy objects: lightweight objects that point to, or reference, an object in another data store that holds auxiliary information. Note: if you intend to reuse this navigation configuration, you can jar up the Qualifier implementations and use the plugin annotations to turn them into a navigator plugin. Here is what my Twitter Qualifier looks like. As you can see, this implementation qualifies actors and actresses based on the number of recent retweets about them.

package com.infinitegraph.enhanced.navigation;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.infinitegraph.Vertex;
import com.infinitegraph.demo.imdb.types.vertices.Person;
import com.infinitegraph.enhanced.types.Tweet;
import com.infinitegraph.enhanced.types.TwitterEntry;
import com.infinitegraph.enhanced.types.Tweet.TweetMetadata;
import com.infinitegraph.navigation.Path;
import com.infinitegraph.navigation.Qualifier;

public class TweetCountQualifier implements Qualifier
{
	
	private int influencingTweetCount;
	private Map<String, List<Tweet>> influentialTweets;
	private Date minDate;

	public TweetCountQualifier(int influencingTweetCount, long millisAgo)
	{
		this.influencingTweetCount = influencingTweetCount;
		this.minDate = new Date(System.currentTimeMillis() - millisAgo);
		influentialTweets = Collections.synchronizedMap(new HashMap<String, List<Tweet>>());
	}
	
	@Override
	public boolean qualify(Path currentPath)
	{
		boolean qualify = false;
		Vertex vertex = currentPath.getFinalHop().getVertex();
		TwitterEntry entry = null;
		if (vertex instanceof Person)
		{
			Person person = (Person) vertex;
			String name = person.getName();
			String simpleName = name;
			if (name.contains("(")) // Janet Jackson (I)
				simpleName = name.substring(0, name.indexOf("(")).trim(); // drop the suffix and the space before it

			entry = TwitterEntry.getRecentTweets(simpleName);
			if (entry != null)
			{
				for (Tweet tweet : entry.getResults())
				{
					TweetMetadata meta = tweet.getMetadata();
					if (meta.getRecent_retweets() >= influencingTweetCount && tweet.getCreated_at().compareTo(minDate) >= 0)
					{
						addTweetToMap(name, tweet);
						qualify = true;
					}
				}
			}
		}
		return qualify;
	}
	
	/** Adds the tweet to the list kept for each name **/
	private void addTweetToMap(String name, Tweet tweet)
	{
		// Hold the map's lock so the lookup and the insert happen as one step
		synchronized (influentialTweets)
		{
			List<Tweet> tweets = influentialTweets.get(name);
			if (tweets == null)
			{
				tweets = new ArrayList<Tweet>();
				influentialTweets.put(name, tweets);
			}
			tweets.add(tweet);
		}
	}
	
	/** Gets tweet list by name **/
	public List<Tweet> getTweets(String name)
	{
		return influentialTweets.remove(name);
	}
}
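
The qualifier above does two small things that are easy to get wrong: it strips IMDB’s disambiguation suffix from a name before searching, and it groups the matching tweets by name. Here is a minimal standalone sketch of just those two steps, using plain strings in place of Tweet objects. The class and method names (TweetBuckets, simpleName, add, take) are hypothetical and are not part of the IG API; I am also assuming Java 8 or later for computeIfAbsent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper for illustration; not part of the InfiniteGraph API. */
final class TweetBuckets
{
	private final ConcurrentMap<String, List<String>> buckets = new ConcurrentHashMap<String, List<String>>();

	/** "Janet Jackson (I)" becomes "Janet Jackson"; names without a suffix are only trimmed. */
	static String simpleName(String imdbName)
	{
		int paren = imdbName.indexOf('(');
		String base = (paren < 0) ? imdbName : imdbName.substring(0, paren);
		return base.trim();
	}

	/** Atomically creates the per-name list on first use, then appends the tweet. */
	void add(String name, String tweet)
	{
		buckets.computeIfAbsent(name, k -> Collections.synchronizedList(new ArrayList<String>())).add(tweet);
	}

	/** Removes and returns the tweets collected for a name, or an empty list if there are none. */
	List<String> take(String name)
	{
		List<String> tweets = buckets.remove(name);
		if (tweets == null)
			return Collections.<String>emptyList();
		return tweets;
	}
}
```

Like getTweets in the qualifier, take removes the entry, so each name’s tweets can be retrieved only once.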

Here is the implementation of my Rotten Tomatoes Qualifier. This is qualifying objects based on whether they have at least a minimum value for their critic’s or audience’s score.

package com.infinitegraph.enhanced.navigation;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.infinitegraph.Vertex;
import com.infinitegraph.demo.imdb.types.vertices.*;
import com.infinitegraph.enhanced.types.RottenMovie;
import com.infinitegraph.enhanced.types.RottenMovie.Rating;
import com.infinitegraph.enhanced.types.RottenTomatoesEntry;
import com.infinitegraph.navigation.*;

public class MinRatingQualifier implements Qualifier
{

	private int percentRating;
	private Map<String, Rating> popularMovies; 
	private int maxDegrees; 
	
	public MinRatingQualifier(int maxDegrees, int percentRating)
	{
		this.maxDegrees = maxDegrees;
		this.percentRating = percentRating;
		this.popularMovies = Collections.synchronizedMap(new HashMap<String, Rating>());
	}
	
	@Override
	public boolean qualify(Path path)
	{
		if(path.size() > maxDegrees)	return false;
		
		Vertex vertex = path.getFinalHop().getVertex();
		if(vertex instanceof Project)
		{
			Project project = (Project) vertex;
			RottenTomatoesEntry entry = RottenTomatoesEntry.get(project.getTitle());
			if(entry == null)	return false;	// no Rotten Tomatoes match for this title
			for(RottenMovie movie : entry.getMovies())
			{
				Rating rating = movie.getRatings();
				if(rating.getCritics_score() >= percentRating || rating.getAudience_score() >= percentRating)
				{
					popularMovies.put(project.getTitle(), rating);
					return true;
				}
			}
		}
		else if(vertex instanceof Person)	return true;	// Qualify paths for all actors (they can be a "non-popular" path to a "popular" target)

		return false;
	}
	
	/** Gets the rating by title of movie **/
	public Rating getRating(String title)
	{
		return popularMovies.remove(title);
	}
}
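
One thing to keep in mind with a qualifier like this: every call to qualify goes out to a REST service, and I am assuming here that the same title can be reached over many different paths during a navigation. A small memoizing wrapper is one way to make sure each title is fetched only once. This is a sketch, not what my qualifiers do; CachedLookup and its remote function are hypothetical names, and the function stands in for a call such as RottenTomatoesEntry.get.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the InfiniteGraph API. */
final class CachedLookup<V>
{
	private final Map<String, V> cache = new ConcurrentHashMap<String, V>();
	private final Function<String, V> remote;
	private final AtomicInteger remoteCalls = new AtomicInteger();

	/** remote is the expensive lookup, for example a REST call keyed by title. */
	CachedLookup(Function<String, V> remote)
	{
		this.remote = remote;
	}

	/** Returns the cached value, calling the remote lookup only on the first request for a key. */
	V get(String key)
	{
		return cache.computeIfAbsent(key, k -> {
			remoteCalls.incrementAndGet();
			return remote.apply(k);
		});
	}

	/** Number of times the remote lookup was actually invoked. */
	int remoteCalls()
	{
		return remoteCalls.get();
	}
}
```

Note that computeIfAbsent does not store a null result, so a title with no match would be looked up again on the next request.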

I used the tweet count qualifier as the result qualifier (so only “popular” actors and actresses are returned as endpoints) and the ratings qualifier as the path qualifier. Up to a chosen degree of separation, this strategy finds all actresses and actors with high Twitter activity who are connected to my starting point through highly rated movies. I also used the “UseJavaOnlyNavigator” navigation policy to improve performance because I was using Java code to qualify objects. Finally, I used a navigation result handler implementation to stop the navigator once it reached the maximum number of desired results, which I set to 100. Here is the run method of my Main class. As you can see, the code to find my starting point and execute the navigation is extremely simple.

	@Override
	public void run()
	{
		Transaction tx = null;
		FileOutputStream fStream = null;
		try
		{
			HashMap<String,String> overrides = new HashMap<String,String>();
			overrides.put("IG.BootFilePath", BOOT_FILE_DIR);
			graphDb = GraphFactory.open(NAME, null, overrides);

			tx = graphDb.beginTransaction();
			
			// Get start node
			Query query = graphDb.createQuery(Person.class.getName(), "name==\"Hugh Jackman\"");
			Person person = (Person) query.getSingleResult();
			
			logger.info("Found Hugh Jackman: {}", person.getId());
			// Create qualifiers
			MinRatingQualifier ratingsQualifier = new MinRatingQualifier(5 /** Degrees **/, 80 /** Percent Rating **/);
			final long MILLIS_IN_A_DAY = 1000L * 60 * 60 * 24;
			TweetCountQualifier tweetQualifier = new TweetCountQualifier(5 /** Min # of recent retweets **/, MILLIS_IN_A_DAY /** Max tweet age in millis **/);
			
			// create navigation configuration
			PolicyChain chain = new PolicyChain();
			chain.addPolicy(new UseJavaOnlyNavigator());
			
			// create result handler
			EnhancedResultHandler handler = new EnhancedResultHandler(100 /** Max # of results **/);
			fStream = new FileOutputStream("hugh_jackman.txt");
			handler.setOutputStream(fStream);
			
			// Execute navigation
			logger.info("Executing the navigation.");
			Navigator navigator = person.navigate(null, Guide.SIMPLE_BREADTH_FIRST, ratingsQualifier, tweetQualifier, chain, handler);
			navigator.start();
			
			fStream.flush();
				
			tx.commit();
		}
		catch (Exception ioex)
		{
			logger.error("Failed to complete analysis.", ioex);
		}
		finally
		{
			if(fStream != null)
				try
				{
					fStream.close();
				}
				catch (IOException e)
				{
					e.printStackTrace();
				}
			if(tx != null && !tx.isComplete())
				tx.complete();
		}
	}
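
To make the division of labor between the two qualifiers and the result handler concrete, here is a minimal in-memory sketch of how I think about it: the path qualifier decides whether a path may be extended, the result qualifier decides whether a path’s endpoint is reported, and the walk stops once the maximum number of results is reached. This is not how IG is implemented and none of these names come from the IG API; BoundedBreadthFirst and its parameters are hypothetical stand-ins.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for a navigation; not the InfiniteGraph API. */
final class BoundedBreadthFirst
{
	/**
	 * Walks the graph breadth-first from start. A path is kept only if
	 * pathQualifier accepts it, is reported if resultQualifier accepts its
	 * last vertex, and the walk ends once maxResults paths are collected.
	 */
	static List<List<String>> navigate(Map<String, List<String>> edges, String start,
			Predicate<List<String>> pathQualifier, Predicate<String> resultQualifier, int maxResults)
	{
		List<List<String>> results = new ArrayList<List<String>>();
		Deque<List<String>> queue = new ArrayDeque<List<String>>();
		queue.add(Collections.singletonList(start));
		while (!queue.isEmpty() && results.size() < maxResults)
		{
			List<String> path = queue.remove();
			String last = path.get(path.size() - 1);
			List<String> neighbors = edges.get(last);
			if (neighbors == null)
				continue;
			for (String next : neighbors)
			{
				if (path.contains(next))
					continue; // do not revisit a vertex within one path
				List<String> extended = new ArrayList<String>(path);
				extended.add(next);
				if (!pathQualifier.test(extended))
					continue; // prune: this branch is never extended
				if (resultQualifier.test(next))
				{
					results.add(extended);
					if (results.size() == maxResults)
						return results; // same idea as the handler stopping the navigator
				}
				queue.add(extended);
			}
		}
		return results;
	}
}
```

In the real navigation, MinRatingQualifier plays the role of pathQualifier, TweetCountQualifier plays the role of resultQualifier, and the result handler enforces the limit of 100.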

I have included the results of my navigations using these qualifier implementations, starting from Hugh Jackman and Sandra Bullock, with a minimum of 5 recent retweets and a minimum rating of 80 percent for the qualifiers. In Hugh Jackman’s list, I found names that I didn’t recognize, like Jack Kirby, a Marvel comic book writer who passed away in 1994, and connections that I didn’t expect, like the comedian Joel McHale, who apparently played a small role in “Spider-Man 2”. Hugh turned out to be connected to all 100 people through his movie “X2”, the second film in the X-Men franchise. A surprising number of producers, writers, directors, and composers made it into these lists, which makes me think that they must also be highly tweeted names. Sandra turned out to be surprisingly well connected. Apparently, she is connected to Desmond Tutu, Michael Jackson, Mother Teresa, the Dalai Lama, and the Teenage Mutant Ninja Turtles movie, all through her movie “While You Were Sleeping”.

Using simple third-party libraries like google-gson, which handles conversion from JSON objects to Java objects, and Apache’s Commons IO library, which handles network calls to REST servers, I was able to implement a powerful application with very few lines of code that gave me pretty accurate results. One significant change that I might make in the future is to force variation on the first connection by allowing only a maximum number of connections per movie to be qualified. Raising the minimum tweet count would have a similar effect. You will notice that I have methods to recover the tweets and ratings for post-processing, but I decided not to include them in the results because some were inappropriate and, in time, many of them will be out of date.
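
The per-movie limit could be sketched roughly as follows. I have not built this yet, so treat it as one possible approach rather than part of the application above; PerMovieCap and tryAccept are hypothetical names, and the idea is that a path qualifier would call tryAccept with the movie’s title before returning true.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of the InfiniteGraph API. */
final class PerMovieCap
{
	private final int maxPerMovie;
	private final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<String, AtomicInteger>();

	PerMovieCap(int maxPerMovie)
	{
		this.maxPerMovie = maxPerMovie;
	}

	/** Returns true for the first maxPerMovie requests made for a title and false afterwards. */
	boolean tryAccept(String movieTitle)
	{
		AtomicInteger count = counts.computeIfAbsent(movieTitle, k -> new AtomicInteger());
		return count.incrementAndGet() <= maxPerMovie;
	}
}
```

With a cap of, say, 10, a movie like “X2” could contribute at most 10 connections, so the remaining results would have to come through other movies.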

Happy Trending to you!
