Introduction

InfiniteGraph 3.0 lets you configure the placement of your data in a unique way. In particular, you can configure placement of your distributed data by type and by related type. Consider the use case where an insurance company wants to place all instances of hospital records in a main storage location, and then store all doctor and patient records with their corresponding hospital. Also, consider the use case where an international investment company wants to store all trade data for the NYSE (New York Stock Exchange) in a remote data center in New York, and trade data for the TSE (Tokyo Stock Exchange) in a remote data center in Tokyo, Japan.

With a distributed database, data localization is the process of storing data where it makes the most sense, which is usually local to the applications that access it most frequently. A major reason to localize your data is that it reduces the “distance” you need to travel to retrieve it. Other reasons for creating and implementing a custom placement model in InfiniteGraph include:

  1. To improve query performance (we place it, we find it),
  2. To improve indexing performance,
  3. To potentially improve the read/write performance due to reduced lock contention,
  4. To maintain and organize data in a logical manner.

 

Survey of the Problem

Managed data localization has become important with the onset of distributed databases because data is not of value unless it can be accessed. Terms like sharding, partitioning and custom placement have become part of our vocabulary. Without the ability to manage placement strategies at an administrative level, the problems are very difficult to manage. A placement model is closely tied to the data model, and you should be able to evolve the placement model over time as the data model evolves.

Let’s look at an example in which we want to place related objects together, and isolate highly trafficked objects so that they don’t cause other nearby objects to be locked. This is now all possible with InfiniteGraph!

Describing the Model

We are using the IMDB data set, so all of our actors and actresses are of type “Person”, and all movies, TV shows, etc. are of type “Project”. We may also have details for each of these types:

  • “PersonDetails” which include things like height, birth date, other work, death date, trivia, etc. and
  • “ProjectDetails” which include things like year, language, running time, country, rating, plot description, and release date.

Each person or project object has at most one corresponding details object.

Also, the edges may be simple “ActedIn” or a generic “WorkedOn”, which may include a job description like “Producer”, “Director” or other. We recognize that since the projects have a number of connected edge objects, they will be highly trafficked. Also, we can see that “Person” and “PersonDetails” are related and “Project” and “ProjectDetails” are related.

Designing the Solution

Because InfiniteGraph uses schema to represent a data model, we can also use schema to create a placement model. The placement model is a logical view of how data is placed, taking into account its likely relationship to other data in the set and the probable access patterns that may produce bottlenecks due to lock conflicts. The placement model is defined in an XML placement model document (PMD) that allows the administrator to define the placement scope at three levels: object, container, and database. Object placer scopes place objects in containers, container placer scopes place containers in databases, and database placer scopes place databases in storage locations. You can customize the placement of the data at any level using rules that are also defined in the PMD. The rules associate object types with object placers. For more information, visit the Placement of Persistent Element page on the InfiniteGraph Developer Site.
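
To make the three levels concrete, here is a toy Java sketch of how a rule leads from an object type to an object placer, then to a container placer, and finally to a database placer. It does not use the InfiniteGraph API; the class, the maps, and the placer names are all hypothetical stand-ins for what a PMD would declare.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these maps stand in for the rules and placers a PMD would define.
public class PlacementChain {
    // Rule: object type -> object placer (names are hypothetical)
    static final Map<String, String> RULES = new HashMap<>();
    // Object placer -> container placer it places containers through
    static final Map<String, String> OBJECT_PLACERS = new HashMap<>();
    // Container placer -> database placer it places databases through
    static final Map<String, String> CONTAINER_PLACERS = new HashMap<>();

    static {
        RULES.put("Person", "PersonPlacer");
        OBJECT_PLACERS.put("PersonPlacer", "PersonGroup");
        CONTAINER_PLACERS.put("PersonGroup", "default");
    }

    // Follows the chain object type -> object placer -> container placer -> database placer.
    static String resolve(String objectType) {
        String objectPlacer = RULES.get(objectType);
        if (objectPlacer == null) {
            return objectType + " -> (no rule in this toy model)";
        }
        String containerPlacer = OBJECT_PLACERS.get(objectPlacer);
        String databasePlacer = CONTAINER_PLACERS.get(containerPlacer);
        return objectType + " -> " + objectPlacer + " -> " + containerPlacer + " -> " + databasePlacer;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Person"));
    }
}
```

The point of the sketch is only the direction of the lookups: a rule selects an object placer by type, and each placer names the placer one level above it.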

Since we want to place related objects close to each other, we should place the “PersonDetails” and “ProjectDetails” objects near their corresponding “Person” and “Project” objects. Also, since “Project” objects are highly trafficked, we want to isolate them in their own containers.

Custom Placement Model

At the model level, this usually means that the “PersonDetails” and “ProjectDetails” types would share a placer with the corresponding “Person” and “Project” types, respectively.

<ContainerPlacers>
    <ContainerPlacer name="PersonGroup" description="Placer used for placing person containers" placeInto="OwnScope" databasePlacer="default">
      <Scope>
        <SingleDatabaseGroup/>
        ...
      </Scope>
    </ContainerPlacer>
    <ContainerPlacer name="ProjectGroup" description="Placer used for placing project containers" placeInto="OwnScope" databasePlacer="default">
      <Scope>
        <SingleDatabaseGroup/>
        ...
      </Scope>
    </ContainerPlacer>
</ContainerPlacers>
<ObjectPlacers>
    <ObjectPlacer name="Person" description="Placer for placing person objects" placeInto="OwnScope" placeOnNewPage="false" containerPlacer="PersonGroup">
      <Scope>
        <SingleContainerGroup/>
        ...
      </Scope>
    </ObjectPlacer>
    <ObjectPlacer name="PersonRelated" description="Placer for placing objects related to person objects" placeInto="RelatedObjectScope"/>
    <ObjectPlacer name="Project" description="Placer for placing project objects" placeInto="OwnScope" placeOnNewPage="false" containerPlacer="ProjectGroup">
      <Scope>
        <ContainerGroupPerObject/>
        ...
      </Scope>
    </ObjectPlacer>
    <ObjectPlacer name="ProjectRelated" description="Placer for placing objects related to project objects" placeInto="RelatedObjectScope"/>
</ObjectPlacers>
<Rules>
     <Rule objectClass="com.infinitegraph.imdb.types.Person" objectPlacer="Person"/>
     <Rule objectClass="com.infinitegraph.imdb.types.Project" objectPlacer="Project"/>
     <Rule objectClass="com.infinitegraph.imdb.types.PersonDetail" objectPlacer="PersonRelated">
          <PlacementRelationship relatedObjectClass="com.infinitegraph.imdb.types.Person"/>
     </Rule>
     <Rule objectClass="com.infinitegraph.imdb.types.ProjectDetail" objectPlacer="ProjectRelated">
          <PlacementRelationship relatedObjectClass="com.infinitegraph.imdb.types.Project"/>
     </Rule>
</Rules>

Notice that the “Person” and “Project” object placers have different scopes. The “Person” object placer has a “SingleContainerGroup” scope, which means that all “Person” objects will be placed together in the same container. The “Project” object placer has a “ContainerGroupPerObject” scope, which means that only one “Project” object will be placed within a container. Since locking is done at the container level, the “ContainerGroupPerObject” scope can be used to isolate highly trafficked objects, thereby reducing the lock conflicts that would occur with a “SingleContainerGroup” scope.

Note: This model does not have to be the final definition of the placement model for your data. The initial placement model can be defined through the property “IG.Placement.PmdFilePath” in the properties file. The placement model can also be extended and updated using the igimportplacement tool. If the placement model is updated, existing data will not be moved or altered, but any new data will be placed according to the newly defined placement rules.
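
The effect of container-level locking can be illustrated with a small, self-contained Java simulation. This is a toy model rather than InfiniteGraph code: it assumes every writer takes an exclusive lock on the container that holds its target object, and it counts how many writers would have to wait. The class name, the container names, and the project titles are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Toy simulation of container-level locking; not InfiniteGraph code.
public class LockScopeDemo {
    // Each writer locks the container that holds its target object.
    // A writer conflicts when that container is already locked by an earlier writer.
    static int countConflicts(List<String> targets, Function<String, String> containerOf) {
        Set<String> lockedContainers = new HashSet<>();
        int conflicts = 0;
        for (String target : targets) {
            // Set.add returns false when the container was already locked
            if (!lockedContainers.add(containerOf.apply(target))) {
                conflicts++;
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Three concurrent writers; two of them touch the same project
        List<String> targets = Arrays.asList("Top Gun", "Rain Man", "Top Gun");

        // All projects share one container
        int shared = countConflicts(targets, project -> "projects");
        // Each project lives in its own container
        int isolated = countConflicts(targets, project -> "container-for-" + project);

        System.out.println("shared container conflicts: " + shared);
        System.out.println("container per object conflicts: " + isolated);
    }
}
```

With a shared container, every writer after the first one waits. With one container per object, only writers that target the same object compete with each other.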

Persist by Related Object

At the code level, you would need to identify a target related object by which to store a given object to trigger the placement rule. There are two ways to do this.

Method 1: Derive from Element Data

If the two object types “PersonDetails” and “ProjectDetails” are derived from ElementData and are referenced in the “Person” and “Project” classes, then they are added to the graph automatically when the corresponding “Person” or “Project” is added to the graph. This default behavior is called Persistence by Reachability.

public class PersonDetails extends ElementData
{
    ...
}

public class Person extends BaseVertex
{
    // Name
    private String name;
    // Person Details
    private PersonDetails details;
    ...
}

Method 2: Using PlacementConditionsPolicy

If the two object types (“PersonDetails” or “ProjectDetails”) extend BaseVertex or BaseEdge and are not referenced inside their corresponding “Person” or “Project” (as shown above), then they have to be placed using a policy on the transaction that indicates special placement. When adding the “PersonDetails” object to the graph database instance, you need to construct a PlacementConditionsPolicy with the related “Person” object and set it on the transaction. Here is an example that places the PersonDetails for “Tom Cruise” by the related Person object that represents “Tom Cruise”.

Transaction tx = graphDb.beginTransaction(AccessMode.READ_WRITE);
// Add Person : Tom Cruise to graph
Person cruise = new Person("Tom Cruise");
graphDb.addVertex(cruise);
// Set Person as related on the transaction
tx.setDataPolicies(new PolicyChain(new PlacementConditionsPolicy(cruise)));
// Add PersonDetails to graph
PersonDetails cruiseDetails = new PersonDetails(....);
graphDb.addVertex(cruiseDetails);
tx.commit();

The placement of your data will now be customized, and you can see the names of your databases change to “PersonGroup” and “ProjectGroup”.

 

[Image: Custom Placement]
 

Adding Storage Location(s)

If you want to distribute the data across multiple storage locations/zones, you need to modify the container placers to use a non-default database placer and add a new storage location using an administrative tool. First, update the above placement model document to include a database placer for the “Person” (PersonStorage) and for “Project” (ProjectStorage).

<DatabasePlacers>
    <DatabasePlacer name="PersonStorage" description="Placer used for placing person databases" placeInto="OwnScope">
      <Scope>
        <SingleStorageGroup/>
        ...
      </Scope>
    </DatabasePlacer>
    <DatabasePlacer name="ProjectStorage" description="Placer used for placing project databases" placeInto="OwnScope">
      <Scope>
        <SingleStorageGroup/>
        ...
      </Scope>
    </DatabasePlacer>
</DatabasePlacers>

This defines the behavior for placing databases in a unique scope. Next, you must define the physical locations where the database placers can store databases. This can be a single location or multiple locations across multiple hosts. You can do this using the objy addstoragelocation tool. Simply open a command shell and type: objy addstoragelocation -help. You should see something like this:

 


Objectivity/DB (TM) Add Storage Location, Version: 11.1
Copyright (c) Objectivity, Inc 2012, 2013. All rights reserved.

AddStorageLocation
    [{-name name}]
    [-description description]
    [{-storageLocation locationIdentifier}]
    [{-zone zoneName}]
    [{-dbPlacerGroup groupDesignation}]
    [-noTitle] [-help] [-quiet]   -bootFile bootfile

Adds a storage location to the federated database's main storage group
(MSG), and, optionally, to one or more storage zones or database-placer
groups.

-name name            Name by which the item can be referenced.
-description          Description of the item for your own recordkeeping.
   description
-storageLocation      Storage location in the format host::path. Omitting
   locationIdentifier host:: causes the local host to be used, or the host
                      implied by an NFS mount name.
-zone zoneName        Storage zone to which to add the storage location. If
                      the MSG does not already have a zone with the specified
                      name, the zone is created.
-dbPlacerGroup        Database-placer group to which to add the storage
   groupDesignation   location. The format is
                      dbPlacerName[::partitionNumber][::groupNumber] -- for
                      example, accounts::3 or observations::2::2. A definition
                      for dbPlacerName must already exist in the federated
                      database's placement model.
-noTitle              Suppresses the program title banner.
-help                 Prints the tool syntax and definition to the screen.
-quiet                Suppresses all normal program output.
-bootFile bootfile    Path to the boot file of the federated database.
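
The -dbPlacerGroup format described in the help text, dbPlacerName[::partitionNumber][::groupNumber], can be sketched as a small Java parser. This is only an illustration of the format and is not part of the objy tooling. The class name is hypothetical, and the sketch assumes that when a single number is given (as in accounts::3) it is the partition number, which follows the positional order in the help text but is not confirmed there.

```java
// Illustrative parser for the designation format shown in the help text.
// Not part of any Objectivity or InfiniteGraph API.
public class DbPlacerGroup {
    final String placerName;
    final Integer partitionNumber; // null when omitted
    final Integer groupNumber;     // null when omitted

    DbPlacerGroup(String placerName, Integer partitionNumber, Integer groupNumber) {
        this.placerName = placerName;
        this.partitionNumber = partitionNumber;
        this.groupNumber = groupNumber;
    }

    // Parses dbPlacerName[::partitionNumber][::groupNumber].
    // Assumption: a single trailing number is read as the partition number.
    static DbPlacerGroup parse(String designation) {
        String[] parts = designation.split("::");
        if (parts.length > 3 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("Bad designation: " + designation);
        }
        Integer partition = parts.length > 1 ? Integer.valueOf(parts[1]) : null;
        Integer group = parts.length > 2 ? Integer.valueOf(parts[2]) : null;
        return new DbPlacerGroup(parts[0], partition, group);
    }

    public static void main(String[] args) {
        // The two examples from the help text, plus a bare placer name
        for (String d : new String[] {"accounts::3", "observations::2::2", "PersonStorage"}) {
            DbPlacerGroup g = parse(d);
            System.out.println(g.placerName + " partition=" + g.partitionNumber + " group=" + g.groupNumber);
        }
    }
}
```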

You can use the objy addstoragelocation tool to add a storage location for each of the database placer groups: “PersonStorage” and “ProjectStorage”. For example:

 objy addstoragelocation -name PersonLocation -storageLocation test123::C:\data\Person\ -dbPlacerGroup PersonStorage -bootFile IMDB.boot

This will place all objects of type "Person" (and the related "PersonDetails") in the directory "C:\data\Person" on the host test123. In addition to defining your own storage locations, you can also indicate a preferred storage location. To do so, create a configuration file and use the IG.Placement.PreferenceRankFile property in the properties file to point to that file.

 

Summary

Using a more complex custom placement model and the objy addstoragelocation tool, you can develop a sophisticated placement strategy based directly on your data model. This can be a powerful way to get better read/write performance from your applications and to improve data access and organization. To learn about all the options for writing a placement model document and for using the placement-related tools, visit the Customizing Storage page on the InfiniteGraph Developer Site.

For more information about InfiniteGraph, feel free to also visit our website or contact Objectivity support at support@objectivity.com. Happy Trails!
