Introduction

I recently watched a TED talk by David McCandless, The Beauty of Data Visualization. It was all about finding meaning in data sets by visualizing them in creative ways, mostly by aggregating data scraped from different sources on the internet and displaying it in interesting or useful formats. As a visual thinker, this speaks to me. I find something easier to comprehend when it appeals to the visual brain. Take the federal budget. So much money is spent each year that the dollar amount is virtually incomprehensible, and it is just as difficult to know how to ask the right questions or avoid jumping to the wrong conclusions. With a visual aid like Jess Bachman’s “Death & Taxes” poster, the concepts become much easier to digest.

Like the federal budget, most data sources are static and boring, which makes it all the more important to use the right visualization toolkit to show value and meaning to the consumer. Data that is connected, or graph-like in nature, calls for a graph visualization tool. Since InfiniteGraph is a graph database and we offer a simple but powerful visualization tool, the IG Visualizer, it made me think about different use cases with connected data and how we could show meaning using visualization.

The InfiniteGraph Visualizer

Survey of the problem

Of course, there are many visualization tools that can produce some fantastic (and very cool!) visualizations. One thing these tools do very well is give context to the meaning behind the data: for example, using a map to analyze geographic data with Exhibit, or adding a custom backdrop to a simple 2D bar or line graph with visual.ly. Although these tools are, in general, remarkably good at dealing with static data while focusing on two or three variables (2D or 3D graphs), drawing meaning from large, data-rich sources demands much more. Here are some limitations that many popular data visualization tools have:

  • They are limited to the type of analysis that can be done in memory because they are agnostic to where the data is coming from.
  • They cannot draw meaning very well from data sources with complex models that have many varying fields (an m x n data set, where m is the number of varying fields and n is the number of samples).
  • If the meaning of the data set draws value from the relationships that the data points have with each other and the visualization tools don’t handle the relationships, then that meaning will be lost.

A visualization tool that is closely tied to its data source, such as the IG Visualizer for the InfiniteGraph graph database, handles these limitations well.

IG Visualizer

A major advantage of using the IG Visualizer to visualize different data sources is that it integrates closely with the features of InfiniteGraph to show views of the data that are useful, dynamic, and interesting. For example, the IG Visualizer supports all native types of advanced and index queries, custom plugin interfaces for navigation and formatting, and navigation configuration elements like GraphViews and PolicySets for driving navigation. Imagine configuring a highly complex navigation, executing it, and exporting it to a GraphML or JSON formatted text file with just a few clicks! Of course, the IG Visualizer has limitations of its own. For example, it is still agnostic to the context of the data, although nodes and edges can be customized with images, text, and font variations to give the graph a more tailored view. It can also be used to explore data and write the output to a file (PNG image, JSON, GraphML, or a custom format) if you want to analyze the results in a separate tool that does more advanced visualizations.
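To give a feel for what an exported GraphML file contains, here is a minimal sketch in plain Java. It is not the IG Visualizer's exporter and its output will not match the Visualizer's byte for byte; the class name GraphMlSketch and the person name "Alice" are made up for illustration. It only writes the core GraphML elements (graphml, graph, node, edge).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: writes a bare-bones GraphML document; not the IG Visualizer's exporter. */
final class GraphMlSketch {

    /** Escapes the characters that are unsafe inside a double-quoted XML attribute. */
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;");
    }

    /** Each edge is a two-element array {sourceId, targetId}; ids are assumed to be simple tokens. */
    static String toGraphMl(List<String[]> edges) {
        Set<String> nodeIds = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String[] edge : edges) {
            nodeIds.add(edge[0]);
            nodeIds.add(edge[1]);
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<graphml xmlns=\"http://graphml.graphdrawing.org/xmlns\">\n");
        xml.append("  <graph id=\"G\" edgedefault=\"directed\">\n");
        for (String id : nodeIds) {
            xml.append("    <node id=\"").append(escape(id)).append("\"/>\n");
        }
        for (String[] edge : edges) {
            xml.append("    <edge source=\"").append(escape(edge[0]))
               .append("\" target=\"").append(escape(edge[1])).append("\"/>\n");
        }
        xml.append("  </graph>\n");
        xml.append("</graphml>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Hypothetical data: one employee who moved between two companies.
        List<String[]> edges = List.of(
                new String[] {"HookasRUs", "Alice"},
                new String[] {"Alice", "TacoGrande"});
        System.out.print(toGraphMl(edges));
    }
}
```

A file like this can be opened by other graph tools that read GraphML, which is the point of exporting in the first place.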

Meaningful Views

Imagine you are a growing company (called HookasRUs) that wants to know which employees have left your company and where they have gone (possibly over multiple career hops) within a certain date range. Image 1 shows a small portion of the unfiltered result: all employees, with their old and new connections to companies, that are within 5 degrees of separation of HookasRUs, capped at a maximum of 10000 edges. From this picture, we can see that some employees connected to HookasRUs have come from or gone to other companies such as SpringShield, TacoGrande, and Wernham Hogg.
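The two limits used here (degrees of separation and a maximum number of edges) behave roughly like a bounded breadth-first search. The following is a rough sketch in plain Java, not the InfiniteGraph navigation API; the class BoundedNavigation, the adjacency-map representation, and the person names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a plain-Java bounded breadth-first search, not the InfiniteGraph API. */
final class BoundedNavigation {

    /**
     * Collects edges (formatted as "from -> to") reachable from start, stopping after
     * maxDepth hops or once maxEdges edges have been gathered, whichever comes first.
     */
    static List<String> explore(Map<String, List<String>> adjacency, String start,
                                int maxDepth, int maxEdges) {
        List<String> edges = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        visited.add(start);
        frontier.add(start);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String vertex : frontier) {
                for (String neighbor : adjacency.getOrDefault(vertex, List.of())) {
                    if (edges.size() >= maxEdges) {
                        return edges; // edge cap reached before the depth limit
                    }
                    edges.add(vertex + " -> " + neighbor);
                    if (visited.add(neighbor)) {
                        next.add(neighbor); // expand each vertex only once
                    }
                }
            }
            frontier = next;
        }
        return edges;
    }

    public static void main(String[] args) {
        // Hypothetical employee network around HookasRUs.
        Map<String, List<String>> network = Map.of(
                "HookasRUs", List.of("Alice", "Bob"),
                "Alice", List.of("TacoGrande"),
                "Bob", List.of("SpringShield"));
        System.out.println(explore(network, "HookasRUs", 5, 10000));
    }
}
```

Note that when the edge cap kicks in first, which edges you get depends on visiting order, and that is exactly why an unfiltered, capped view can fill up with uninteresting results.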

Image 1: HookasRUs (Unfiltered View)
HookasRUs

With access to the real data (as opposed to the realistic synthetic data I’m using here), you can do some significant analysis of this Employee Network and overlay certain restrictions on the data set, revealing a whole new picture. A new feature rolling out with InfiniteGraph 3.1 is the ability to create and use graph views and policy sets in the IG Visualizer. Image 2 below shows the query results within 10 degrees using a custom navigator. As you can see, the bottom row of the first degree contains people who didn’t move on after they left HookasRUs (possibly due to retirement), while the top row contains those who continued on after leaving HookasRUs, along with the companies that snagged them (like Ajax, Wayne Enterprises, and Dunder Mifflin).

Image 2: HookasRUs (Filtered View)
HookasRUs Employee History

Here is the code behind the custom navigator, which implements the NavigatorPlugin interface. Notice the three dynamic parameters that are set in the visualizer at runtime: _workForTypeId, _workerTypeId, and _excludedPredicate. The worker type id can be set to the SalariedEmployee, ContractEmployee, or Temporary type id, depending on which kind of worker you want to follow. More filtering could be added. For example, if you wanted to focus on a time period, you could show only those who left HookasRUs in 2005.

public class MyCompanyNavPlugin implements NavigatorPlugin
{
	@TypeId
	public long _workForTypeId;
	
	@Parameter
	public String _excludedPredicate;
	
	@TypeId 
	public long _workerTypeId;
	
	@Override
	public Qualifier getPathQualifier()
	{
		return new Qualifier()
		{
			@Override
			public boolean qualify(Path path)
			{
				if(path.size() > 1)
				{
					Hop hop = path.getFinalHop();
					Vertex target = hop.getVertex();
					
					if(target.getTypeId() == _workForTypeId)	return true; // qualify all edges
					else if(target.getTypeId() == _workerTypeId) // only qualify same worker type
					{
						// COMPANY -HOP1-> EMPLOYEE -HOP2-> COMPANY etc.
						Vertex start = path.get(1).getVertex();
						if(start.getId() == target.getId())		return true; // only qualify same workers
					}
				}
				return false;
			}
		};
	}

	@Override
	public Qualifier getResultQualifier()
	{
		return new Qualifier()
		{
			@Override
			public boolean qualify(Path path)
			{
				if(path.size() > 1)
				{
					Hop hop = path.getFinalHop();
					Vertex target = hop.getVertex();
					
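					if(target.getTypeId() == _workerTypeId) // worker vertices are never returned as results
					{
						// COMPANY -HOP1-> EMPLOYEE -HOP2-> COMPANY etc.
						Vertex start = path.get(1).getVertex();
						if(start.getId() != target.getId())		return false; // a different worker: reject
						// the same worker falls through to the final "return false" below
					}
					else	return true; // any other target type qualifies as a result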
				}
				return false;
			}
		};
	}

	@Override
	public Guide getGuide()
	{
		return Guide.SIMPLE_BREADTH_FIRST;
	}

	@Override
	public GraphView getGraphView()
	{
		GraphView view = new GraphView();
		view.excludeClass(_workForTypeId, _excludedPredicate);
		return view;
	}

	@Override
	public PolicyChain getPolicies()
	{
		PolicyChain chain = new PolicyChain();
		chain.addPolicy(new MaximumPathDepthPolicy(10));
		return chain;
	}
}
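The extra filtering mentioned before the plugin (for example, only people who left HookasRUs in 2005) could look something like the sketch below. This is plain Java rather than InfiniteGraph code, and the Employment class, its fields, and the sample people are hypothetical; the real data model may store start and end dates quite differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: Employment and its fields are hypothetical, not the data model used above. */
final class LeftInYearFilter {

    /** One stint of a person at a company; endYear is null while the person still works there. */
    static final class Employment {
        final String person;
        final String company;
        final Integer endYear;

        Employment(String person, String company, Integer endYear) {
            this.person = person;
            this.company = company;
            this.endYear = endYear;
        }
    }

    /** Returns the people whose stint at the given company ended in the given year. */
    static List<String> leftIn(List<Employment> history, String company, int year) {
        List<String> people = new ArrayList<>();
        for (Employment e : history) {
            if (e.company.equals(company) && e.endYear != null && e.endYear == year) {
                people.add(e.person);
            }
        }
        return people;
    }

    public static void main(String[] args) {
        List<Employment> history = List.of(
                new Employment("Alice", "HookasRUs", 2005),
                new Employment("Bob", "HookasRUs", 2007),
                new Employment("Carol", "HookasRUs", null),
                new Employment("Dave", "TacoGrande", 2005));
        System.out.println(leftIn(history, "HookasRUs", 2005)); // prints [Alice]
    }
}
```

In the plugin itself, a check like this would presumably sit inside a qualifier or be expressed as a graph view predicate, so that paths outside the time period are dropped during navigation rather than afterwards.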

Visualizing the Dead

Another interesting view would be looking at the famous songs played by the Grateful Dead. Here is a view of the data set. As you can see, it is not very readable.

Image 3: Grateful Dead (Unfiltered View)
Grateful Dead Performances

Instead, a more interesting view might be to look only at songs written by the Grateful Dead. In this graph, you may notice that the Grateful Dead are connected to a traditional song that they rewrote and to a song written by an unknown artist. These two songs connect the group of songs written by the Grateful Dead alone to the groups associated with the Traditional and Unknown artists. The three groups are easy to identify using the Spring Layout Algorithm.

Image 4: Songs Written By the Grateful Dead (Filtered View)
Grateful Dead Songs

As you can see, I included the Graph View that excludes the SungBy and FollowedBy edges in the right-hand column above. This graph view can be created within the visualizer, and by simply dragging it onto the graph, you get a filtered view of the data.
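Conceptually, a graph view like this one just drops edges by type before the graph is drawn. Here is a small plain-Java sketch of that idea. It does not use the InfiniteGraph GraphView class; the Edge class is made up, the song names are placeholders, and the WrittenBy edge type name is my assumption (only SungBy and FollowedBy are named above).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: a plain-Java stand-in for an edge-type filter, not the GraphView API. */
final class EdgeTypeFilter {

    /** A typed, directed edge between two named vertices (hypothetical model). */
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }

        @Override
        public String toString() {
            return from + " -" + type + "-> " + to;
        }
    }

    /** Returns the edges whose type is not in excludedTypes, preserving their order. */
    static List<Edge> exclude(List<Edge> edges, Set<String> excludedTypes) {
        List<Edge> kept = new ArrayList<>();
        for (Edge edge : edges) {
            if (!excludedTypes.contains(edge.type)) {
                kept.add(edge);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
                new Edge("Song A", "WrittenBy", "Grateful Dead"), // WrittenBy is an assumed type name
                new Edge("Song A", "SungBy", "Garcia"),
                new Edge("Song A", "FollowedBy", "Song B"));
        System.out.println(exclude(edges, Set.of("SungBy", "FollowedBy")));
    }
}
```

With the SungBy and FollowedBy edges gone, only the authorship relationships remain, which is what lets the layout pull the songs into distinct groups.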

Another interesting visualization comes from looking only at songs sung by the Grateful Dead. Again, three groups emerge. The central group contains the songs sung by the whole Grateful Dead band. The bottom-right group contains the songs sung by Garcia.

Image 4: Songs Sung By the Grateful Dead (Filtered View)
Grateful Dead Songs

It is interesting to see that there is only one song that these two groups share and the set of songs sung by Garcia is so much larger. This makes complete sense because Jerry Garcia was the frontman and many of the most famous Grateful Dead songs are crooned by him. It is like comparing the number of songs sung by Lennon and McCartney to the songs sung by Starr and Harrison of the Beatles. There is no comparison! My apologies to fans of “Octopus’s Garden” :).

Many Views of Leo DiCaprio

What do you know about the career of Leonardo DiCaprio? Did you know that he starred in the last two seasons of Growing Pains and was in one episode of Roseanne? Did you know that Leonardo DiCaprio has his own production company? It is called Appian Way Productions. He is a very busy man. Here is an unfiltered visualization of Leo’s connections up to 3 degrees and 10000 edges.

Image 5: World of Leonardo DiCaprio (Unfiltered View)
Unfiltered view of Leonardo DiCaprio

This view is very difficult to make sense of when you have no context. You can see that I have used a non-standard background color for each vertex type to give the graph more context. Below is the breakdown.

Type          Background Color
Movie         Green
Person        Purple
TvShow        Pink
Distributors  White

If you want, you can also give the vertex types icons so they are even more identifiable. As you can see, this graph is a bit too crowded with TV shows and movies made for video. When you limit the number of results but don’t filter, you are liable to get all the useless results up front. Here, TV shows dominate the graph, which may not be what we are looking for. Since Leonardo is primarily a movie actor, most of these TV shows are unimportant. Therefore, I created a Graph View through the visualizer that restricts both the amount of work done and the types of results returned. This IMDB data set includes details for all people, including the sex of each actor, and details for each project, including the year it was made, the gross earnings, and whether it was made for TV.

Image 6: Graph View For “Women in Recent Popular Movies”
Women In Popular Movies Graph View

The graph view above filters out all paths that include male actors, distribution companies, TV shows, or movies that were made before 2000, have gross earnings of less than 10M, or were made for video. This gives us the much smaller graph shown below, which is easier to enjoy because we can identify most of the projects that he is associated with.
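The movie part of that graph view boils down to three conditions combined together. As a rough illustration in plain Java (not the predicate syntax the visualizer actually uses), it could be written as below. The Movie class, its field names, and the sample titles are hypothetical, and I am reading the filter as "keep a movie only if it passes all three checks".

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative only: Movie and its fields are hypothetical, not the IMDB data model. */
final class RecentPopularMovies {

    static final class Movie {
        final String title;
        final int year;
        final long gross;
        final boolean madeForVideo;

        Movie(String title, int year, long gross, boolean madeForVideo) {
            this.title = title;
            this.year = year;
            this.gross = gross;
            this.madeForVideo = madeForVideo;
        }
    }

    // Each predicate mirrors one restriction described for the graph view.
    static final Predicate<Movie> RECENT = m -> m.year >= 2000;
    static final Predicate<Movie> POPULAR = m -> m.gross >= 10_000_000L;
    static final Predicate<Movie> THEATRICAL = m -> !m.madeForVideo;

    /** Returns the titles of the movies that pass all three restrictions. */
    static List<String> keep(List<Movie> movies) {
        return movies.stream()
                .filter(RECENT.and(POPULAR).and(THEATRICAL))
                .map(m -> m.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Movie> movies = List.of(
                new Movie("A", 2010, 50_000_000L, false),
                new Movie("B", 1995, 100_000_000L, false), // too old
                new Movie("C", 2005, 1_000_000L, false),   // gross too low
                new Movie("D", 2008, 20_000_000L, true));  // made for video
        System.out.println(keep(movies)); // prints [A]
    }
}
```

Keeping each restriction as its own named predicate makes it easy to loosen one of them, say the year cutoff, without touching the others.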

Image 7: World of Leonardo DiCaprio (Filtered View)
Filtered View of Leonardo DiCaprio

Thanks to IMDB for letting us use their data. If you have questions or want more information about InfiniteGraph, feel free to visit our website or contact us. I hope you enjoy using the IG Visualizer to dynamically visualize your graphs, along with the new features in 3.1! Happy Trails!
