WHITE PAPER - The Rationale Behind Metadata Connect
This White Paper looks at the factors that drove the features and production of Metadata Connect. It covers the categories of products and systems built over the years with Objectivity products, some distinguishing characteristics of metadata, and how Metadata Connect handles them.
Many organizations have used Objectivity/DB over the years to build systems that were mainly involved with defining and maintaining metadata, even though they weren’t necessarily presented that way. Examples include:
- Electronic and mechanical design software.
- Product configuration management applications.
- Avionics cockpit configuration management.
- Manufacturing, telecom and power utility equipment configuration management.
- Smart electricity meter, medical and scientific data streams.
- Enterprise, multimedia and intelligence collaboration software.
- Scientific experiment configuration management and star catalogs.
- Organizational structures that deal with users, teams, organizations, companies, roles, projects, matrix management, etc.
Appendix A provides further details about the kinds of applications involved.
Many kinds of metadata may appear simple but become overwhelming as more information about a problem becomes available. One Objectivity customer was using a popular Relational Database (RDBMS) to collect and analyze smart electricity meter readings. The core of the data included a meter identifier, a time and a meter reading, so we were puzzled as to why they couldn’t just store it in a single table. However, the data came directly from the equipment and there were already several hundred different variants by the time that the development team found us. The number of tables involved was getting out of hand. We quickly reduced it to a simple object model, with a base object type, then variants for each technology, manufacturer, instrument range and so on. They had a proof of concept up and running on Objectivity/DB in just a few days.
Their software had previously needed to access multiple tables to insert information about a new meter and corresponding tables to hold the data that flowed from the meters. Although queries across tables could be masked by “View tables”, updates were tedious and prone to error. As many as six or seven tables might need to be updated to insert a single reading. Both performance and code complexity were a problem until they switched technologies.
Objectivity/DB stored each object in its native form but remembered the hierarchy of types in the description of each class. In the hierarchy below, storing a Digital Interval meter might produce an item of type 23, but it could also be found by asking for Interval meters (type 20) or just Electricity Meters (type 1). If the user asked for Electricity Meters, Objectivity/DB would do a multi-key search for objects of types 1, 20, 23 and any other variants (not shown).
In short, inheriting type information helps simplify the problem of dealing with multiple metadata variants. Metadata Connect makes it easy to derive new information types from existing ones. Storage is optimized as there is only one physical database object, rather than lots of interconnected tables.
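The type lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Objectivity/DB or Metadata Connect API: the type ids 1, 20 and 23 come from the text, while the registry layout, class names and sample readings are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical type registry: type id -> (name, parent type id or None).
# The ids 1, 20 and 23 come from the text; the layout is illustrative.
TYPES = {
    1: ("ElectricityMeter", None),
    20: ("IntervalMeter", 1),
    23: ("DigitalIntervalMeter", 20),
}


def subtypes_of(type_id):
    """Return type_id plus every type that inherits from it, sorted."""
    found = {type_id}
    changed = True
    while changed:
        changed = False
        for tid, (_, parent) in TYPES.items():
            if parent in found and tid not in found:
                found.add(tid)
                changed = True
    return sorted(found)


@dataclass
class StoredObject:
    oid: int        # object identifier (made-up values)
    type_id: int    # most specific type of the stored object
    reading: float  # sample payload


STORE = [
    StoredObject(oid=1001, type_id=23, reading=42.5),
    StoredObject(oid=1002, type_id=20, reading=17.0),
    StoredObject(oid=1003, type_id=1, reading=3.2),
]


def find_by_type(type_id):
    """Multi-key search: match the requested type and all of its variants."""
    wanted = set(subtypes_of(type_id))
    return [obj for obj in STORE if obj.type_id in wanted]
```

Asking for `find_by_type(1)` matches objects of types 1, 20 and 23, while `find_by_type(20)` matches only types 20 and 23. Each object is still stored once, under its most specific type.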
Handling Complex Structures
It quickly became apparent that another kind of data that didn’t fit well into existing databases involved highly interconnected structures. As a very simple example, imagine storing a document in a relational database. In the purest representation there would be tables for Document, Section, Chapter, Paragraph, Sentence, Word and even Character. Have you ever wondered why there are no word processing applications that store data in a relational database that way? The storage and processing overheads would be huge. Retrieving a document would involve opening at least six tables plus the accompanying indices and join tables.
Objectivity/DB allocates every object a unique identifier (Object Identifier, or OID). It then represents links between an object and other objects as a variable-length array of OIDs. The links may all be of the same type (“Customer-to-Product”) or of different types. In the latter case, the link type is stored along with each OID.
Sometimes it is necessary to perform operations on every object in a structure, e.g. to delete a Document and all of its accompanying components. Objectivity/DB can propagate actions such as Delete and Lock along the links, ensuring that the user has permission to perform the operation before performing it.
Objectivity/DB has another mechanism, called Collections, that handles the compact representation of lists, trees, groups and other commonly used structures. All of these object linking mechanics are, or will soon be, available in Metadata Connect.
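As a rough sketch of the linking and propagation ideas above, the following Python models each object with a list of (link type, target OID) pairs and cascades a delete along one link type after checking permissions. The class, the "contains" and "authored_by" link names, and the `may_delete` callback are all hypothetical. The real product's storage layout and API are not shown in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    oid: int
    kind: str
    # Variable-length list of typed links: (link_type, target_oid).
    links: list = field(default_factory=list)


def make_store():
    """Build a tiny document structure keyed by OID (illustrative data)."""
    objs = [
        Obj(1, "Document", [("contains", 2), ("contains", 3), ("authored_by", 9)]),
        Obj(2, "Section", [("contains", 4)]),
        Obj(3, "Section"),
        Obj(4, "Paragraph"),
        Obj(9, "Person"),
    ]
    return {o.oid: o for o in objs}


def collect(store, root_oid, propagate_types):
    """Gather every OID reachable from root along propagating link types."""
    visited, order, stack = set(), [], [root_oid]
    while stack:
        oid = stack.pop()
        if oid in visited or oid not in store:
            continue
        visited.add(oid)
        order.append(oid)
        for link_type, target in store[oid].links:
            if link_type in propagate_types:
                stack.append(target)
    return order


def delete_propagating(store, root_oid, may_delete, propagate_types=("contains",)):
    """Delete root and its components, but only if every delete is permitted."""
    targets = collect(store, root_oid, propagate_types)
    denied = [oid for oid in targets if not may_delete(store[oid])]
    if denied:
        raise PermissionError(f"delete denied for OIDs {sorted(denied)}")
    for oid in targets:
        del store[oid]
    return sorted(targets)
```

With `store = make_store()`, calling `delete_propagating(store, 1, lambda obj: True)` removes the Document, both Sections and the Paragraph, and leaves the Person in place because "authored_by" is not a propagating link type in this sketch. Permissions are checked for the whole structure before anything is removed, so a denied delete leaves the store untouched.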
Navigation And Pathfinding
Standard queries look for instances of objects that match specified conditions, such as: Find all Locations where ZIP_Code=“94571”. Metadata Connect can also exploit the known connections between objects to perform queries such as: Find all of the components of this piece of equipment. Some queries follow only links of a particular type, while others may follow any kind of link. This kind of query is called “navigational”, because the data is navigated via the links. A single Metadata Connect query can find all of the assemblies, sub-assemblies and components for the product depicted below.
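To make the idea of a navigational query concrete, here is a minimal Python sketch that walks typed links breadth-first. It is not DO syntax and not the Metadata Connect API. The bicycle parts list, the link type names and the `navigate` function are hypothetical and stand in for whatever product structure is being navigated.

```python
from collections import deque

# Hypothetical typed links: source -> list of (link_type, target).
LINKS = {
    "Bicycle": [("has_part", "Frame"), ("has_part", "Wheel"), ("made_by", "Acme")],
    "Wheel": [("has_part", "Rim"), ("has_part", "Spoke")],
    "Frame": [("has_part", "Fork")],
    "Acme": [("located_in", "Ohio")],
}


def navigate(start, link_type=None):
    """Breadth-first walk from start.

    Follows only links of link_type, or every link when link_type is None.
    Returns the reachable names in visit order, excluding start itself.
    """
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for ltype, target in LINKS.get(node, []):
            if link_type is not None and ltype != link_type:
                continue
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

Here `navigate("Bicycle", "has_part")` returns the assemblies and components only (Frame, Wheel, Fork, Rim, Spoke), while `navigate("Bicycle")` follows every kind of link and also reaches Acme and Ohio.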
It is also useful to be able to find the shortest path, or all paths, between objects, such as: Find all links between Journalist=“Krem Brûlé” and Article=“SCOOP” [see below].
This kind of query is called “pathfinding”. Metadata Connect can perform pathfinding queries out to any degree of separation in a very short time because it is supported by an efficient storage and navigational platform and has some unique algorithms to speed up complex queries.
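The two flavors of pathfinding mentioned above (shortest path and all paths within a given degree of separation) can be illustrated with standard graph algorithms. The sketch below uses plain breadth-first and depth-first search in Python. It does not represent the product's own algorithms, and every node other than “Krem Brûlé” and “SCOOP” is a made-up placeholder.

```python
from collections import deque

# Hypothetical undirected connections between the journalist and the article.
EDGES = [
    ("Krem Brûlé", "Newsroom A"),
    ("Newsroom A", "SCOOP"),
    ("Krem Brûlé", "Source X"),
    ("Source X", "Editor Y"),
    ("Editor Y", "SCOOP"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def all_paths(adj, start, goal, max_hops):
    """Depth-first search for every simple path of at most max_hops links."""
    paths = []

    def walk(node, path):
        if node == goal:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:
            return
        for nxt in adj.get(node, []):
            if nxt not in path:
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    walk(start, [start])
    return sorted(paths, key=len)
```

With `adj = build_adjacency(EDGES)`, the shortest path goes through Newsroom A in two hops. Raising `max_hops` to 3 in `all_paths` also finds the longer route through Source X and Editor Y.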
Huge Volumes Of Data
Most databases only have a few records connected with a particular physical or notional object, such as a component or a geotag. However, in many applications the amount of metadata associated with a single object can be huge, often several orders of magnitude larger than the original object. Such is the case in telecom equipment and in intelligence gathering systems. A single reconnaissance photo may be processed in many different ways and be tagged with hundreds of labels to make it more useful to analysts.
Besides the storage costs involved, accessing more data requires more physical I/Os, compute power and time. There are many ways to reduce these factors, including compression, clustering, indexing, linking and so on. The database inside Metadata Connect employs a range of mechanisms to reduce the number of I/Os required to retrieve associated groups of information. In doing so, it automatically reduces the effort and cost involved. It also stores and retrieves data in parallel, further reducing the time needed to perform complex operations, such as pathfinding.
QUERYING THE METADATA
Metadata Connect has an advanced declarative query language called DO. It can perform conventional (scan-like), navigation and pathfinding queries, optionally in parallel. It probably won’t outperform other databases at conventional queries, unless there are a lot of variants of a particular kind of data, but it will almost always outperform them when navigation or pathfinding is involved.
As an example, imagine that a person wants to ship a very large item from the West Coast to the East Coast. The item is too large to go by air or rail, so the shipper might perform the following search: Find the cheapest path from San Francisco to the East Coast for a 60 ton item that is 40 feet long, 12 feet high and 15 feet wide. It must not go to cities north of Chicago nor south of Oklahoma City. It must not involve more than two transitions between road and waterways.
Although that is the intent of the user, the query needs to be more specific. It uses the fact that links can be of type “Road” or “Waterway”. Each link must be wide enough for the object, and bridges and other obstacles must be high enough to let it pass. The cities along the path must have latitudes between 35.47N and 41.87N, and so on. The DO query language is powerful enough to handle this request in a single query, executed in parallel.
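The paper does not show DO syntax, so the following is only a Python sketch of the kind of constrained search such a query implies. It uses Dijkstra's algorithm over states of (city, last link type, transitions used) as a stand-in technique. Apart from San Francisco and the latitude band taken from the text, every city name, cost and size limit below is invented for illustration, and links are treated as one-way to keep the example short.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    src: str
    dst: str
    mode: str    # "Road" or "Waterway"
    cost: float  # made-up shipping cost
    # Limits as (tons, length_ft, height_ft, width_ft); invented defaults.
    limits: tuple = (100, 60, 20, 20)


# Item from the text: 60 tons, 40 ft long, 12 ft high, 15 ft wide.
ITEM = (60, 40, 12, 15)

# Hypothetical cities and latitudes, except San Francisco.
LATITUDE = {
    "San Francisco": 37.77, "Hub North": 44.0, "Hub Mid": 39.0,
    "River Town": 38.5, "East Port": 40.0, "W1": 38.0, "R1": 38.0, "W2": 38.0,
}

LINKS = [
    Link("San Francisco", "Hub North", "Road", 20.0),   # too far north
    Link("Hub North", "East Port", "Road", 20.0),
    Link("San Francisco", "Hub Mid", "Road", 100.0),
    Link("Hub Mid", "East Port", "Road", 200.0),
    Link("Hub Mid", "East Port", "Road", 10.0, (100, 60, 10, 20)),  # low bridge
    Link("Hub Mid", "River Town", "Road", 30.0),
    Link("River Town", "East Port", "Waterway", 50.0),
    Link("San Francisco", "W1", "Waterway", 10.0),      # cheap, but zigzags
    Link("W1", "R1", "Road", 10.0),
    Link("R1", "W2", "Waterway", 10.0),
    Link("W2", "East Port", "Road", 10.0),
]


def cheapest_path(links, latitude, start, goal, item, lat_range, max_transitions):
    """Return (cost, path) for the cheapest admissible route, or None."""
    lo, hi = lat_range
    usable = {}
    for link in links:
        fits = all(size <= limit for size, limit in zip(item, link.limits))
        in_band = all(lo <= latitude[c] <= hi for c in (link.src, link.dst))
        if fits and in_band:
            usable.setdefault(link.src, []).append(link)
    # Heap entries: (cost so far, city, mode of last link, transitions, path).
    heap = [(0.0, start, "", 0, (start,))]
    best = {}
    while heap:
        cost, city, mode, used, path = heapq.heappop(heap)
        if city == goal:
            return cost, list(path)
        key = (city, mode, used)
        if key in best and best[key] <= cost:
            continue
        best[key] = cost
        for link in usable.get(city, []):
            # A transition is counted whenever the link type changes.
            switches = used + (1 if mode and mode != link.mode else 0)
            if switches <= max_transitions:
                heapq.heappush(heap, (cost + link.cost, link.dst, link.mode,
                                      switches, path + (link.dst,)))
    return None
```

Calling `cheapest_path(LINKS, LATITUDE, "San Francisco", "East Port", ITEM, (35.47, 41.87), 2)` rejects the northern route (latitude), the low bridge (height) and the zigzag route (three transitions), and settles on the route through Hub Mid and River Town. Tracking the transition count as part of the search state is what lets the limit on road/waterway changes be enforced while still finding the cheapest admissible path.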
We have shown that Metadata Connect can be used in a wide variety of applications. It has been built to handle the variety, complexity and volume of metadata that can be generated from even the simplest of processes. The declarative DO query language supports both regular and graph queries, covering searches, navigation and pathfinding. All of this is available in an easy-to-use online cloud application that is provided on a Software as a Service basis on the Microsoft Azure Marketplace.
APPENDIX A - Systems And Products That Manage(d) Metadata
All of the following organizations have built and deployed systems or products that use Objectivity’s advanced database technology:
- Adra Matrix: A configuration management system for advanced mechanical and other systems.
- Emerson DeltaV: Configuration management for process control plants.
- Osmose Smartmaps: A system for cataloging utility resources and displaying them on maps.
- American Meter: Smart electricity metering that involved hundreds of different types of meter.
- Intraspect: Pioneered the category of enterprise collaboration software that combines the best of on-line collaboration and knowledge management.
- Daikin (later Mentor Graphics) Scenarist: A DVD authoring tool that included a digital asset manager.
- Rockwell Collins Ascend: Avionics cockpit design, including advanced configuration management.
- SEH America and NEC: Manufacturing systems that depend upon accurate plant configurations and equipment details.
- CERN: Configuration management for LHC detector crystal manufacturing, calibration and deployment.
- Space Telescope Science Institute: Reference star catalog.
- Qualcomm, Nortel, Ericsson, Marconi, Digital Switch Corporation and Ciena Networks all built Element Management Systems that catalog the components within telecom network equipment.
- Aptix, British Telecom, Cray Research, Cadence (Valid Logic), International Computers Limited, Technology Answers (Cimplex) and Newport News Shipbuilding (now NGC) built engineering design systems that contain configuration management components.