Welcome to Objectivity, Inc., makers of the industry-leading Objectivity/DB object-oriented database management platform, Grid Certified (Levels 1 through 6) and SOA compliant. We are the leader in scalable database management solutions for mission-critical, real-time and distributed applications.

Object Oriented Database Learning Center:


 

Object Oriented Databases Hardware Architecture Design and Sizing

Hardware Architecture Design and Sizing: We can now start to scope the hardware required to build such a system. If we average 10,000 messages per second and need to store 7 days’ worth (604,800 seconds), we’ll have an average of about 6 billion messages in storage. If the average size of a message is 5 KB, we’ll need at least 30 terabytes of disk space to store them.

Let’s assume a 50% “data explosion” rate, which represents the overhead for Objectivity/DB, including any indexes or other metadata. This will be analyzed in more detail, but it is a fairly safe estimate. Let’s also assume that the total size and quantity of messages may fluctuate by as much as 25%. We’ll add another 25% for any other factors we may not have considered. Together these allowances add 100% to the raw 30 terabytes, so we’ll need a total storage area of 60 terabytes.
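The sizing arithmetic above can be sketched in a few lines of C++. This is only an illustration: the struct and function names are hypothetical, decimal units are assumed (1 KB = 1,000 bytes, 1 TB = 10^12 bytes), and the three allowances are treated as additive. With unrounded inputs the result comes out slightly above the rounded 30 TB and 60 TB quoted in the text.

```cpp
#include <cassert>
#include <cstdint>

// Inputs taken from the text; the names are illustrative only.
struct SizingInputs {
    std::uint64_t messagesPerSecond;  // average ingest rate
    std::uint64_t retentionSeconds;   // how long messages are kept
    std::uint64_t bytesPerMessage;    // average message size
    double overheadFraction;          // sum of all allowances (1.0 means +100%)
};

// Number of messages held in storage at steady state.
inline std::uint64_t storedMessages(const SizingInputs& in) {
    return in.messagesPerSecond * in.retentionSeconds;
}

// Raw payload in terabytes, assuming 1 TB = 1e12 bytes.
inline double rawTerabytes(const SizingInputs& in) {
    return static_cast<double>(storedMessages(in)) *
           static_cast<double>(in.bytesPerMessage) / 1e12;
}

// Capacity to provision after applying the additive allowances.
inline double provisionedTerabytes(const SizingInputs& in) {
    assert(in.overheadFraction >= 0.0);
    return rawTerabytes(in) * (1.0 + in.overheadFraction);
}

// The figures used in the text: 10,000 msg/s, 7 days, 5 KB,
// and 50% + 25% + 25% of overhead.
inline SizingInputs articleInputs() {
    return SizingInputs{10000, 7ULL * 24 * 60 * 60, 5000, 0.50 + 0.25 + 0.25};
}
```

Calling `provisionedTerabytes(articleInputs())` gives roughly 60.5 TB, which matches the 60 TB estimate once the message count is rounded down to 6 billion.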

The storage hardware must also be able to sustain writes at the peak rate of 50,000 messages (250 MB) per second and support simultaneous reads of the data at a similar rate. Here again, the ability to delete data a whole file at a time is very important, since it requires very little I/O.
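As a quick check of the peak-rate figure, the conversion from message rate to bandwidth might be written as follows. The function name is hypothetical, and decimal megabytes (1 MB = 10^6 bytes) and a 5,000-byte average message are assumed.

```cpp
#include <cassert>
#include <cstdint>

// Bandwidth in decimal megabytes per second (1 MB = 1e6 bytes).
inline double megabytesPerSecond(std::uint64_t messagesPerSecond,
                                 std::uint64_t bytesPerMessage) {
    assert(bytesPerMessage > 0);
    return static_cast<double>(messagesPerSecond) *
           static_cast<double>(bytesPerMessage) / 1e6;
}
```

With these assumptions, 50,000 messages per second works out to 250 MB/sec, and the 10,000 messages per second average to 50 MB/sec, so the storage must absorb bursts of about five times the average load.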

A very small benchmark was performed to determine the relationships of object size and page size against overall ingest throughput on an Objectivity/DB system.

To perform the benchmark, a system with the following specifications was used:

  • 1.2 GHz Athlon processor
  • Windows XP Professional
  • Objectivity/DB /C++ 8.0
  • C: Drive - 60GB Quantum FireballP 7200 RPM (ATA)
  • F: Drive - 80GB Maxtor DiamondMax Plus 7200 RPM (ATA)

The objects created were instances of a class called Message with the following members:

uint64 mReceived;
uint64 mSent;
ooVString mBody;
ooRef(Format) mFormat;
ooRef(Address) mTo[] : copy(drop);
ooRef(Address) mCC[] : copy(drop);
ooRef(Attachment) mAttachments[] : copy(drop);

In each run, all of the objects were created in the database within a single transaction. The size of the object was varied by assigning different-sized strings to the mBody member variable. The total size of the object was estimated by adding 16 bytes (for the mReceived and mSent variables) to the size of the mBody string. There was also overhead for the references and associations that was not counted.

The number of objects created in each transaction varied according to the object size.

The calculation used was:

number of objects created = 50,000,000/size of data string in object.

The number of objects created per transaction therefore ranged from 1,000,000 (for the smallest strings) down to 59 (for the largest).
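The batch-size rule and the object-size estimate can be sketched as below. The function and constant names are hypothetical, and truncating integer division is an assumption, since the text does not say how fractional results were rounded.

```cpp
#include <cassert>
#include <cstdint>

// Target volume of body data per transaction, from the formula in the text.
constexpr std::uint64_t kBytesPerTransaction = 50000000;

// Estimated object size: the body plus 16 bytes for the two uint64 members.
// Reference and association overhead is not counted, as in the text.
inline std::uint64_t estimatedObjectBytes(std::uint64_t bodyBytes) {
    return bodyBytes + 2 * sizeof(std::uint64_t);
}

// Objects created per transaction (truncation is an assumption).
inline std::uint64_t objectsPerTransaction(std::uint64_t bodyBytes) {
    assert(bodyBytes > 0);
    return kBytesPerTransaction / bodyBytes;
}
```

Under these assumptions a 50-byte body gives 1,000,000 objects per transaction, and a count of 59 corresponds to a body of roughly 833 KB to 847 KB.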

The first object of each transaction was clustered using the container OID. All following objects were clustered using the previously created object. Each object’s mBody variable was set by passing a pointer to a pre-defined string using a member function call. The member function called ooUpdate() each time, so some optimization might be gained for smaller objects by eliminating this redundancy.

For each run, a new database and container were created, and the container was pre-sized with 20,000 pages and a 50% growth rate. The creation of the database and container was not included in the timings.

[Figure: ingest throughput versus object size and page size]

Judging from the Windows Performance Monitor, the CPU appeared to be approximately 33% to 50% busy during object creation. The lock server was sharing the CPU, but could not have been very busy, since only a single container was being used.

Multi-threading or running parallel processes with both disks would almost certainly increase throughput within the same configuration.

Given parallel input streams, one can reasonably assume that many such systems can be used in parallel to achieve very large overall ingest rates. For example, let’s assume that each system can sustain 10 MB/sec. Using 30 such systems in parallel should provide an overall throughput of approximately 300 MB/sec, or just over a terabyte per hour (300 MB/sec * 3,600 sec/hour = 1.08 TB/hour). If such systems cost $1,000 each, the total cost would be $30,000.
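The scale-out estimate can be expressed as a small helper. This is a sketch only: the names are hypothetical, and it assumes perfectly linear scaling across nodes, which the text treats as a reasonable assumption rather than a measured result.

```cpp
#include <cassert>

// Aggregate figures for a group of identical ingest nodes.
struct ClusterEstimate {
    double megabytesPerSecond;  // combined ingest rate
    double terabytesPerHour;    // same rate in decimal TB per hour
    double hardwareCostUsd;     // node cost only, excluding disk storage
};

// Assumes every node sustains the same rate and scaling is linear.
inline ClusterEstimate estimateCluster(int nodes,
                                       double mbPerSecPerNode,
                                       double costPerNodeUsd) {
    assert(nodes > 0);
    const double aggregate = nodes * mbPerSecPerNode;
    return ClusterEstimate{aggregate,
                           aggregate * 3600.0 / 1e6,  // MB/s -> TB/hour
                           nodes * costPerNodeUsd};
}
```

For 30 nodes at 10 MB/sec and $1,000 each, this yields 300 MB/sec, about 1.08 TB per hour, and $30,000 of node hardware.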

The most expensive part of this system is the physical disk storage. Based on suggested retail prices published on websites as of February 7, 2003 (from http://www.apple.com/xserve/raid/ on November 11, 2003), these are some prices for high-speed RAID storage:

[Table: suggested retail prices for high-speed RAID storage]

This implies that providing the storage capacity for a 60TB system could cost between $264,000 and $3,300,000. Clearly, the cost of hardware to support the database (beyond the basic disk storage) is a small portion of the total cost of the system.
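To put the storage figures in perspective, the implied price per terabyte and the share of the budget taken by the ingest machines can be worked out as follows. The function names are hypothetical, and the $30,000 figure for the ingest machines is the one from the 30-system example earlier in the text.

```cpp
#include <cassert>

// Price per terabyte implied by a total cost and a capacity.
inline double costPerTerabyteUsd(double totalCostUsd, double capacityTerabytes) {
    assert(capacityTerabytes > 0.0);
    return totalCostUsd / capacityTerabytes;
}

// Fraction of the combined budget taken by one component.
inline double shareOfTotal(double componentUsd, double otherUsd) {
    assert(componentUsd + otherUsd > 0.0);
    return componentUsd / (componentUsd + otherUsd);
}
```

For 60 TB, the quoted range corresponds to $4,400 to $55,000 per terabyte. Against those totals, $30,000 of ingest hardware is about 10% of the budget at the low end of the storage range and under 1% at the high end.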

Each of these RAID systems, however, supports much more I/O bandwidth than the single disk drives used in the benchmark.

To scale further, the first likely Objectivity/DB bottlenecks would be the lock server and journal file access. Based on larger benchmark experiments, the lock server shouldn’t become a bottleneck until the system is scaled up by a factor of at least 80 or so, provided it is running on a reasonably fast CPU and network connection.

The journal file directory/disk would be a more serious problem, but fortunately Objectivity/DB’s journal files are very small, since they only have to store the old page maps (the old pages aren’t freed until the end of the commit process). These performance measurements also assume fairly long transactions, so the amount of communication with the journal files will be very small.

