A standard for database objects lets disparate data-manipulation tools exchange information.
Stephen R. Gardner
Metadata is popularly defined as "data about data." For the IT manager, this is more concisely stated as "information about the enterprise." In the context of data warehousing, the term refers to anything that defines a data-warehouse object, such as a table, query, report, business rule, or transformation algorithm. Metadata management gives users greater control of corporate data by providing a map of the locations where that data is stored. It also supplies a blueprint that shows how one type of information is derived from another.
The current crop of data manipulation and management tools has resulted in IT products that all process metadata differently, with little consideration for sharing the information. This situation highlights the need for a metadata standard.
Efforts are under way by a consortium of companies to standardize metadata interchange among products from diverse vendors. Six companies -- Arbor Software, Business Objects, Cognos, Evolutionary Technologies International, Platinum Technology, and Texas Instruments Software -- formed the Metadata Council in July 1995.
The Council launched the Metadata Interchange Specification (MDIS) to address issues relating to the exchange, sharing, and management of metadata. The Council released version 1.0 of MDIS in June 1996. The standards document can be downloaded from http://www.he.net/~metadata/standards/toc.html.
Initial Steps
MDIS consists of components that represent the minimum common set of metadata elements and the minimum integration points that must be incorporated into database tools for compliance. MDIS also provides standards for optional and extension components that are relevant only to a particular class of tool.
A common language must be developed before a standard can be constructed. This involves setting up well-understood and well-communicated processes for naming metadata elements, standardizing data types and lengths, and maintaining descriptive glossaries.
This development of a common definition and terminology involves two entirely different information models. First, there's the application metamodel. This model is application-specific and describes the tables and objects that contain the metadata for schemata particular to a given application. Second is the metadata metamodel, which is the set of objects that MDIS describes. These objects reflect information common to one or more classes of tools, such as database servers and data-discovery and data-extraction tools. For MDIS to succeed, the metadata metamodel must be independent of any application metamodel. It must have a unique definition for each object, and it should be character-based so that it's platform independent.
Since metadata is stored in different types of storage or data formats -- such as relational tables, ASCII files, and customized repositories -- the MDIS access methodology must be very flexible. This requires a framework that translates a tool's metamodel access request to match the MDIS syntax and format, as shown in the figure.
To establish a bidirectional data flow, the standard uses three types of information. First, the metadata files include a header with version information. Second, a Tool Profile file contains character-based information that describes the type of metadata elements that the tool manipulates. Finally, a character-based Configuration Profile file describes the mapping of data to specific metadata objects. It also describes what flows of the metadata are legitimate: A tool might be prohibited from using a later version of a metadata object because of major changes to source-to-target mappings of the metadata.
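The three kinds of information described above can be pictured as simple records. The following is a minimal Python sketch, not MDIS syntax: every class name, field name, and the version-limit rule are assumptions made for illustration. It shows how a Configuration Profile might reject a flow when a tool asks for a later version of a metadata object than it is permitted to use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataHeader:
    """Version information at the top of a metadata file (hypothetical fields)."""
    spec_version: str
    object_version: int


@dataclass
class ToolProfile:
    """The types of metadata elements a tool manipulates (hypothetical fields)."""
    tool_name: str
    element_types: set[str]


@dataclass
class ConfigurationProfile:
    """Data-to-object mappings plus the flows considered legitimate."""
    # Maps a tool's own field name to a metadata object name.
    mappings: dict[str, str]
    # Assumed rule: highest object version each tool may import.
    max_import_version: dict[str, int] = field(default_factory=dict)

    def flow_allowed(self, tool: ToolProfile, element_type: str,
                     header: MetadataHeader) -> bool:
        # A tool cannot receive element types it does not manipulate.
        if element_type not in tool.element_types:
            return False
        # No recorded limit means any version may flow to the tool.
        limit = self.max_import_version.get(tool.tool_name)
        return limit is None or header.object_version <= limit
```

With this sketch, a reporting tool limited to version 2 of an object would be allowed to import a version 2 table definition but refused a version 3 one.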
MDIS uses a text-based tag language that resembles HTML. The mechanism that implements extensibility in MDIS is similar to Lisp's properties object, a character field of arbitrary length that's composed of identifiers and a value. The tool's import function uses the identifier to recognize the metadata type and to locate the data within the field. The value is the metadata itself.
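To make the identifier-and-value idea concrete, here is a small Python sketch of an import function reading an extension field. The encoding used (entries of the form identifier=value separated by semicolons) and the vendor-style identifiers are invented for this example; the actual MDIS tag syntax is defined in the specification and is not reproduced here.

```python
def parse_extension_field(text: str) -> dict[str, str]:
    """Split a character field into identifier -> value pairs.

    Assumed encoding (not MDIS syntax): 'identifier=value' entries
    separated by ';'.
    """
    properties = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing separator
        identifier, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"entry has no identifier: {entry!r}")
        properties[identifier.strip()] = value.strip()
    return properties


def import_known(text: str, known_identifiers: set[str]) -> dict[str, str]:
    """Keep only the extension values whose identifiers this tool recognizes."""
    parsed = parse_extension_field(text)
    return {k: v for k, v in parsed.items() if k in known_identifiers}
```

A tool that understands only one of the identifiers picks out its own value and skips the rest, which is what lets vendors add extensions without breaking other tools' imports.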
In Search of a Standard
The Metadata Council examined several ways to implement the standardized MDIS model, as shown in the figure. The ASCII batch approach relies on an ASCII file format. The file contains descriptions of the common metadata components and standardized access requirements that make up the MDIS model. The file loads whenever a tool accesses metadata via the common API.
This approach does not require updating the tool when the metadata model changes; modifications to the standard are made to the file instead. However, since using an object requires loading the entire MDIS framework, this approach is processor intensive.
The procedural approach requires that the intelligence to communicate with the MDIS standard be built into the tool. This approach needs only a modification to the API to accommodate changes and additions to the metamodel schema and/or access parameters. But it also requires a great deal of up-front effort on the part of tool vendors to retrofit this logic to achieve MDIS compliance.
The hybrid approach combines the ASCII-batch and procedural approaches. It follows a data-driven model. A tool loads a set of tables that define the MDIS API. The tool interacts with the API through the MDIS framework and retrieves just the needed object. This eliminates the need for reading the entire schema.
Changes to the standard are reflected in the table data so that the tools don't have to be modified to maintain compliance with the MDIS specification. But loading the tables can be time consuming, which is unacceptable in information-intensive applications.
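The trade-off between the ASCII batch and hybrid approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pipe-delimited file layout, the sample objects, and all function and class names are hypothetical and do not come from the MDIS specification.

```python
# Hypothetical metadata file: name|kind|description, one object per line.
SAMPLE = """\
customer|table|Customer master records
monthly_sales|query|Sales totals by month
"""


def parse_all(text: str) -> dict[str, dict[str, str]]:
    """Read every object description in the file."""
    objects = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, kind, description = line.split("|", 2)
        objects[name] = {"kind": kind, "description": description}
    return objects


def batch_lookup(text: str, name: str) -> dict[str, str]:
    """ASCII batch style: the whole file is parsed on every access."""
    return parse_all(text)[name]


class TableDrivenStore:
    """Hybrid style: pay the load cost once, then fetch single objects."""

    def __init__(self, text: str) -> None:
        self._objects = parse_all(text)  # the potentially slow table load

    def lookup(self, name: str) -> dict[str, str]:
        return self._objects[name]  # retrieves just the needed object
```

Both paths return the same object. The difference is where the cost falls: on every access for the batch style, or up front when the tables are loaded for the hybrid style.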
A fourth approach is to develop the MDIS standard within the Electronic Industries Association's CASE (Computer Aided Software and Systems Engineering) Data Interchange Format (CDIF). The CDIF standards support multiple semantic layers and transfer formats for CASE tools. Adopting this approach carries two obligations: the Metadata Coalition must appoint
For version 1.0 of the MDIS, the Council recommends the ASCII batch approach because vendors can implement support for the specification with minimum overhead and a shorter time to market.
Into the Future
There will continue to be a lack of integration among metadata tools for the next several years. Integrated metadata won't be readily available until at least the 1998/1999 time frame, when repository-based solutions should begin to emerge.
Furthermore, integrating new objects that consist of video, audio, and spatial data types will pose additional challenges for the Metadata Council and for anyone else who's looking to integrate metadata tools.

Profile information, an API, and a standard set of objects allow diverse metadata tools to interoperate.

The Metadata Council chose the ASCII batch approach because it's easy to implement and offers a shorter time to market.
Stephen R. Gardner (Seattle, WA) is the director of advanced technology research at NCR Corp. You can reach him by sending e-mail to stephen.gardner@sanfranciscoca.ncr.co.