tabase design information, business rules, and corporate naming standards, for example. In a sense, a repository's role is similar to that of a library's card catalog -- an exhaustive and cross-indexed list of resources.
Chances are that most programmers will recognize the notion of a repository as the library or component manager associated with many of today's developer tools. Imagine that kind of library for an entire organization's resources. That's the vision of the repository, so it shouldn't be surprising that repositories have been called data dictionaries -- even encyclopedias -- and that their contents are often referred to collectively as metadata, or data about data.
Data repositories aren't new. For example, they have often been associated with CASE and data-modeling tools. CASE repositories have focused on storing design information, often about database schemata. Some repositories, usually from tool vendors, have been designed to store information related to the software-development process: source code, version history, project management information, and so on. But the need to share information across enterprises and government entities has led to a variety of domain-specific proposals for metadata repositories, including Federal Geographic Data Committee (FGDC) for geographic information systems, the Warwick Framework and Dublin Core for digital libraries, and industry standards such as Common Data Interchange Format (CDIF), a standard devised by CASE-tool vendors for modeling tools.
Repositories, then, are tools to help manage computer systems and networks. Metadata is extensively used in systems and applications to gain efficiency when accessing, transferring, sharing, or processing large amounts of data.
An ideal repository will be distributed, open, and extensible. It will also be largely self-managing and will interoperate with metadata sets coming from different sources and represented using different standards. It will let itself be interrogated through open, standard, and well-defined interfaces.
Repositories are back in the public eye. This has happened largely because of their role in the exploding field of data warehousing and on-line analytical processing (OLAP) applications. Repository technology makes sense in data warehousing, because you need to store information about a data warehouse's (or OLAP server's) source data and about the extraction, cleansing, and aggregation rules that are associated with building and maintaining it.
Historically, data-warehousing-tool vendors have created proprietary database applications to store and manage that data. Interestingly, some of these vendors make it easy to let traditional information-worker end users "browse" the repository data; others see IT staffers as their ultimate end users.
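To make the warehouse case concrete, here is a minimal Python sketch of the kind of record such a repository might hold: a warehouse table's source data plus its extraction, cleansing, and aggregation rules. The class, field, and table names are all hypothetical and do not reflect any vendor's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WarehouseTableMetadata:
    """Hypothetical metadata record for one warehouse table."""
    target_table: str
    source_tables: List[str]
    extraction_rule: str
    cleansing_rules: List[str] = field(default_factory=list)
    aggregation_rule: str = ""


def targets_fed_by(source, catalog):
    """Return the warehouse tables that are built from the given source table."""
    return [m.target_table for m in catalog if source in m.source_tables]


# Illustrative entries only; the rules are free-text descriptions here.
catalog = [
    WarehouseTableMetadata(
        target_table="monthly_sales",
        source_tables=["orders", "customers"],
        extraction_rule="nightly full extract of closed orders",
        cleansing_rules=["drop rows with a missing customer id"],
        aggregation_rule="sum of order totals by month and region",
    ),
    WarehouseTableMetadata(
        target_table="inventory_snapshot",
        source_tables=["stock_levels"],
        extraction_rule="weekly extract",
    ),
]

print(targets_fed_by("orders", catalog))
```

A query like `targets_fed_by` is the sort of "browse" operation an end user or IT staffer might run against the repository to see which warehouse tables depend on a given operational source.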
The Metadata Council was formed in July 1995 in an effort to help bridge the gap among proprietary stores of metadata, and the Metadata Interchange Specification (MDIS) is the result. Now in version 1.1, MDIS promises to be a valuable basis for interoperability. (See "The Quest to Standardize Metadata" by Stephen R. Gardner in the November 1997 BYTE for more on MDIS.)
Microsoft's Bottom-Up Strategy
Data warehousing isn't the only reason for the renewed interest in repositories. Another reason is Microsoft and its forthcoming Microsoft Repository 2.0. Unless you're a die-hard Visual Basic programmer, you probably don't even know that Microsoft shipped the first version of the Microsoft Repository in March 1997 as a Visual Basic add-in.
Although thousands of programmers have reportedly downloaded it from the Microsoft site, it hasn't set the world on fire. In fact, more than one programmer has complained that, not only did they have trouble installing the Microsoft Repository, they couldn't figure out what they were supposed to do with it.
That's been the problem with most repositories. They tend to be a hassle to set up and maintain, and, from a programmer's perspective, there's no perceived added value.
Some of you will remember IBM's AD/Cycle, a grandiose, but unsuccessful, attempt to centralize the management of mainframe application development. IBM's repository initiatives date back to the late 1980s, and its first host-based repository shipped in 1990 as part of AD/Cycle. Since then, IBM has switched to the client/server model, and its repository technology has evolved through Configuration Management Version Control (CMVC), from 1991, to the current VisualAge TeamConnection, which has been available since 1995.
The simple fact that IBM's top-down initiative was so far ahead of its time was undoubtedly the main reason AD/Cycle failed. However, it, too, was widely viewed by programmers as unnecessary overhead with no payoff. To be fair, Microsoft admitted that Microsoft Repository 1.0 was mainly for independent software vendors (ISVs), and not programmers.
The Microsoft Repository dates back to May 1994, when Microsoft and Texas Instruments announced collaboration on the design of an object-oriented repository that would store OLE components. (TI, with its Information Engineering Facility [IEF]/Composer product, was then a major CASE-tool vendor. TI's software division has subsequently been sold to Sterling Software, and Composer is now part of Sterling's Cool family of products.) TI may be out of the picture now, but the Microsoft Repository remains an ActiveX/COM-based (Component Object Model) vision.
Last summer, Microsoft and Platinum Technology announced an alliance whereby Platinum received the rights to port the Microsoft Repository to non-Windows platforms and to databases other than SQL Server for Windows NT -- efforts that are both expected to bear fruit later this year. Platinum itself is a major high-end repository player (its prices start at $150,000), selling both Platinum Repository/MVS and Platinum Repository/OEE.
At the same time, Microsoft announced its Open Information Model (OIM), an extensible COM-based object model that defines the structure of objects shared by tools. Conceptually, it's probably useful to think of the Microsoft Repository in two parts.
The first part is the repository engine, a type-driven interpreter that is actually built on top of a SQL database (initially either Microsoft Access or SQL Server). The second part is the OIM part, a meta-meta model that can support a variety of information-model extensions such as database and OLAP.
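One way to picture a type-driven engine layered on a SQL database is the following Python sketch, which uses the standard `sqlite3` module. It is an illustration only: the table names, column names, and helper functions are hypothetical and are not the Microsoft Repository's actual schema or API.

```python
import sqlite3

# Hypothetical schema: type definitions live in one table, objects and
# their property values in two others. Not the real repository layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE type_def (
        type_name TEXT,
        prop_name TEXT,
        PRIMARY KEY (type_name, prop_name)
    );
    CREATE TABLE obj (
        obj_id INTEGER PRIMARY KEY,
        type_name TEXT
    );
    CREATE TABLE prop_value (
        obj_id INTEGER,
        prop_name TEXT,
        value TEXT
    );
""")


def define_type(type_name, prop_names):
    """Register a type and the properties its objects may carry."""
    conn.executemany(
        "INSERT INTO type_def VALUES (?, ?)",
        [(type_name, p) for p in prop_names],
    )


def create_object(type_name, **props):
    """Store an object, letting the stored type definition drive validation."""
    allowed = {
        row[0]
        for row in conn.execute(
            "SELECT prop_name FROM type_def WHERE type_name = ?", (type_name,)
        )
    }
    if not allowed:
        raise ValueError(f"unknown type: {type_name}")
    unknown = set(props) - allowed
    if unknown:
        raise ValueError(f"properties not defined for {type_name}: {sorted(unknown)}")
    cur = conn.execute("INSERT INTO obj (type_name) VALUES (?)", (type_name,))
    obj_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO prop_value VALUES (?, ?, ?)",
        [(obj_id, name, str(value)) for name, value in props.items()],
    )
    return obj_id


def load_object(obj_id):
    """Read an object's properties back as a dictionary."""
    return dict(
        conn.execute(
            "SELECT prop_name, value FROM prop_value WHERE obj_id = ?", (obj_id,)
        )
    )


define_type("Table", ["name", "owner"])
table_id = create_object("Table", name="orders", owner="dbo")
print(load_object(table_id))
```

The point of the design is that the engine itself knows nothing about tables, components, or cubes; everything it accepts is decided by the type definitions stored alongside the data, which is what lets new information models be added without changing the engine.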
Paul Harmon, editor of the monthly newsletter Object-Oriented Strategies and author of several books on the Unified Modeling Language (UML), describes a four-layer metamodeling architecture in the January issue of his newsletter. Meta-meta models such as the Object Management Group's (OMG's) Meta Object Facility (MOF) or Microsoft's OIM, he says, define the fundamental infrastructure for a metamodeling architecture, while metamodels such as UML and Microsoft's database model (DBM) are simply instances of a meta-meta model. Models and User Objects round out the four layers.
Microsoft's OIM is derived from UML, which means that the behavior of UML is present inside the OIM. At each level of the OIM, you inherit behaviors of the previous level. For example, the SQL Server model inherits behavior from the DBM, which inherits behavior from the OIM. ISVs and developers can build their own custom models based on information that can be inherited from other portions of the OIM. Other organizations and standards groups, such as those associated with creating a document-exchange standard, could also extend the model to support their own repository efforts.
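The inheritance chain just described (the SQL Server model inheriting from the DBM, which inherits from the OIM) can be loosely pictured with ordinary class inheritance. The Python sketch below is only an analogy: the class names, methods, and the `filegroup` attribute are hypothetical, and the real OIM is expressed through COM interfaces rather than Python classes.

```python
class ModelElement:
    """Hypothetical stand-in for a base element defined at the OIM level."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        # Behavior defined once here is available at every lower level.
        return f"{type(self).__name__}: {self.name}"


class DbmTable(ModelElement):
    """Hypothetical generic database-model (DBM) table."""

    def __init__(self, name, columns):
        super().__init__(name)
        self.columns = list(columns)

    def column_names(self):
        return list(self.columns)


class SqlServerTable(DbmTable):
    """Hypothetical SQL Server specialization that adds one product-specific detail."""

    def __init__(self, name, columns, filegroup="PRIMARY"):
        super().__init__(name, columns)
        self.filegroup = filegroup


table = SqlServerTable("orders", ["id", "total"])
print(table.describe())
print(table.column_names())
```

A tool written against the generic `DbmTable` level would still work with `SqlServerTable` objects, which is the practical benefit of layering custom models on top of shared ones.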
The Microsoft Repository's first information model basically offered support for UML, an analysis-and-design modeling language that has gained widespread industry support. That meant, for example, that you could create a Visual Basic program, use another optional download -- Visual Modeler, which is a subset of Rational Software's Rose product -- to reverse-engineer your Visual Basic program, and then export the design into the repository. At that point, the UML version of your Visual Basic program would be available to other repository-aware tools (at the time, limited to products such as Visio's tools).
However, what if the list of repository-aware tools included other programming languages, testing tools, project management tools, revision management tools, and so forth? According to Mike Budd, an analyst who tracks the CASE-tools and repository markets for Ovum, the Microsoft Repository is a clever way for Microsoft to add enterprise panache to its immensely popular programming tools, which are actually geared toward single programmers.
Budd also thinks that Microsoft has recognized the incredible value of middleware in the largest sense of the word. He sees the company's Microsoft Repository as a means of owning the glue that integrates the application-development process.
What does all this mean for you? At this point, you have two choices. You can experiment with the fairly rudimentary Microsoft Repository 1.0 and the associated Visual Modeler and Visual Component Manager (VCM) tools. (VCM is Microsoft's "interface" to the Repository.) Or you can wait for version 2.0 of all these tools, which are expected to ship some time this summer with version 6.0 of Microsoft's Visual Studio development environment.
Other Repositories
The Microsoft Repository isn't the only repository, and the OIM isn't even the only meta-meta model out there. The OMG's CORBA-based (Common Object Request Broker Architecture) meta-meta model provides another alternative, one that is embraced by many of the traditional CORBA champions.
Unisys's Urep repository (with prices starting at $1900) is the leading example of an OMG/MOF-compliant repository.
Unisys Fellow and Urep architect Sridhar Iyengar points out the advantages of Urep as a heterogeneous, multiplatform object repository that supports both COM and CORBA middleware on Unix, NT (client and server), and mainframe (client) platforms. He adds that "Urep supports a rich set of core repository services, including object-level version control, nested transactions, and long transactions, which are not available in competing products." (According to Microsoft, Microsoft Repository 2.0 will support versioning.) Urep uses the Versant object database as its default storage engine.
IBM's VisualAge TeamConnection (prices start at $9995 per server) and associated DataAtlas represent another alternative that will be especially attractive to enterprises that use IBM's VisualAge tools. TeamConnection, which evolved from IBM's CMVC product, not AD/Cycle, is an open tool with a published API and is source code-compliant (i.e., interoperable with Microsoft's SourceSafe and other version-control products). Although TeamConnection currently uses Object Design's ObjectStore as its data store, the next version will reportedly be hosted on DB2 Universal DataBase (UDB).
Other repositories worthy of mention include LogicWorks' Universal Directory ($30,000), a data-warehouse-oriented repository; Viasoft's Rochade repository (formerly the R&O Repository with roots in the mainframe world, $35,000 and up); and Software Enabling Labs' Enabler, Visual Enabler, and Maestro suite (pricing starts at $3500 per developer).
Look Ahead
Repositories may not affect your life this month or even this year. However, it behooves you to start investigating this technology and thinking about how it's going to affect the way you design, develop, and manage applications, including data-warehousing applications. Platinum's Chris Justice, product manager for Platinum Repository/OEE, cited hockey's Wayne Gretzky ("I play where the puck's going to be, not where it is...") in his perspective on the repository market. Repositories should be on your radar screen.
Selected Specifications
Metadata Council and MDIS specification
Internet: http://www.he.net/~metadata/