
The Road to a Universal Repository


May 1998 / Reseller / The Road to a Universal Repository

More than databases, repositories should hold the corporate IS jewels. Why don't they?

Karen Watterson

The problem with repositories, notes Michael Barnes, an analyst with the Hurwitz Group consultancy, is that no one wants to deal with them directly. We might add that no one wants to pay for them, either. Their appeal is simply as an enabling technology, something that's supposed to make programmers' and information technology (IT) enterprise architects' lives easier.

At their simplest level, repositories are basically databases. More accurately, they're database applications for system information. System information? In the context of repositories, system information refers to information about an organization's IT assets -- everything from C++ header files, component definitions, and COBOL copy books to information about on-line corporate knowledge-base assets. Repositories typically also contain database design information, business rules, and corporate naming standards, for example. In a sense, a repository's role is similar to that of a library's card catalog -- an exhaustive and cross-indexed list of resources.
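To make the card-catalog analogy concrete, here is a minimal sketch in Python. It is an illustration only, not any vendor's design: the `Asset` and `Catalog` classes, the asset kinds, and the file locations are all hypothetical. Each "card" describes an asset and says where it lives, and the catalog cross-indexes the cards by kind and by keyword.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """One catalog card: a description of an IT asset, not the asset itself."""
    name: str
    kind: str        # hypothetical kinds, e.g. "cobol-copybook", "cpp-header"
    location: str    # where the real asset lives (made-up locations)
    keywords: frozenset = field(default_factory=frozenset)


class Catalog:
    """A tiny in-memory cross-index over asset descriptions."""

    def __init__(self):
        self._by_kind = defaultdict(list)
        self._by_keyword = defaultdict(list)

    def add(self, asset):
        # File the same card under its kind and under every keyword.
        self._by_kind[asset.kind].append(asset)
        for word in asset.keywords:
            self._by_keyword[word].append(asset)

    def find_by_kind(self, kind):
        return list(self._by_kind.get(kind, []))

    def find_by_keyword(self, word):
        return list(self._by_keyword.get(word, []))


catalog = Catalog()
catalog.add(Asset("CUSTOMER-REC", "cobol-copybook",
                  "PROD.COPYLIB(CUSTREC)", frozenset({"customer"})))
catalog.add(Asset("customer.h", "cpp-header",
                  "src/include/customer.h", frozenset({"customer", "billing"})))

# Everything the organization holds that relates to "customer".
customer_assets = sorted(a.name for a in catalog.find_by_keyword("customer"))
```

A real repository would keep these cards in a database rather than in memory, but the idea is the same: the catalog holds descriptions and cross-references, and the assets themselves stay where they are.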

Chances are that most programmers will recognize the notion of a repository as the library or component manager associated with many of today's developer tools. Imagine that kind of library for an entire organization's resources. That's the vision of the repository, so it shouldn't be surprising that repositories have been called data dictionaries -- even encyclopedias -- and that their contents are often referred to collectively as metadata, or data about data.

Data repositories aren't new. For example, they have often been associated with CASE and data-modeling tools. CASE repositories have focused on storing design information, often about database schemata. Some repositories, usually from tool vendors, have been designed to store information related to the software-development process: source code, version history, project management information, and so on. But the need to share information across enterprises and government entities has led to a variety of domain-specific proposals for metadata repositories, including Federal Geographic Data Committee (FGDC) for geographic information systems, the Warwick Framework and Dublin Core for digital libraries, and industry standards such as Common Data Interchange Format (CDIF), a standard devised by CASE-tool vendors for modeling tools.

Repositories, then, are tools to help manage computer systems and networks. Metadata is extensively used in systems and applications to gain efficiency when accessing, transferring, sharing, or processing large amounts of data.

An ideal repository will be distributed, open, and extensible. It will also be largely self-managing and will interoperate with metadata sets coming from different sources and represented using different standards. It will let itself be interrogated through open, standard, and well-defined interfaces.

Repositories are back in the public eye. This has happened largely because of their role in the exploding field of data warehousing and on-line analytical processing (OLAP) applications. Repository technology makes sense in data warehousing, because you need to store information about a data warehouse's (or OLAP server's) source data and about the extraction, cleansing, and aggregation rules that are associated with building and maintaining it.
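As a rough illustration of the kind of metadata a warehouse repository might hold, the following Python sketch records, for each warehouse column, its source columns and the cleansing and aggregation rules applied to it. The `ColumnLineage` record, the table and column names, and the rules are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ColumnLineage:
    """Metadata about one warehouse column (all names are hypothetical)."""
    target: str        # warehouse column being described
    sources: list      # operational columns it is derived from
    cleansing: str     # rule applied before loading
    aggregation: str   # how rows are rolled up; empty if none


lineage = [
    ColumnLineage("sales_fact.revenue",
                  ["orders.amount", "orders.currency"],
                  "convert to USD; drop negative amounts",
                  "SUM by day and store"),
    ColumnLineage("sales_fact.store_id",
                  ["orders.store_code"],
                  "trim and upper-case",
                  ""),
]


def impacted_targets(source_column, records):
    """Return the warehouse columns that depend on an operational column."""
    return [r.target for r in records if source_column in r.sources]


# Which warehouse columns must be rebuilt if orders.currency changes?
affected = impacted_targets("orders.currency", lineage)
```

Answering "what breaks if this source changes?" is the sort of question that is hard to answer when this information lives only in load scripts.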

Historically, data-warehousing-tool vendors have created proprietary database applications to store and manage that data. Interestingly, some of these vendors make it easy to let traditional information-worker end users "browse" the repository data; others see IT staffers as their ultimate end users.

The Metadata Council was formed in July 1995 in an effort to help bridge the gap among proprietary stores of metadata, and the Metadata Interchange Specification (MDIS) is the result. Now in version 1.1, MDIS promises to be a valuable basis for interoperability. (See "The Quest to Standardize Metadata" by Stephen R. Gardner in the November 1997 BYTE for more on MDIS.)

Microsoft's Bottom-Up Strategy

Data warehousing isn't the only reason for the renewed interest in repositories. Another reason is Microsoft and its forthcoming Microsoft Repository 2.0. Unless you're a die-hard Visual Basic programmer, you probably don't even know that Microsoft shipped the first version of the Microsoft Repository in March 1997 as a Visual Basic add-in.

Although thousands of programmers have reportedly downloaded it from the Microsoft site, it hasn't set the world on fire. In fact, more than one programmer has complained that, not only did they have trouble installing the Microsoft Repository, they couldn't figure out what they were supposed to do with it.

That's been the problem with most repositories. They tend to be a hassle to set up and maintain, and, from a programmer's perspective, there's no perceived added value.

Some of you will remember IBM's AD/Cycle, a grandiose, but unsuccessful, attempt to centralize the management of mainframe application development. IBM's repository initiatives date back to the late 1980s, and its first host-based repository shipped in 1990 as part of AD/Cycle. Since then, IBM has switched to the client/server model, and its repository technology has evolved through Configuration Management Version Control (CMVC), from 1991, to the current VisualAge TeamConnection, which has been available since 1995.

The simple fact that IBM's top-down initiative was so far ahead of its time was undoubtedly the main reason AD/Cycle failed. However, it, too, was widely viewed by programmers as unnecessary overhead with no payoff. To be fair, Microsoft admitted that Microsoft Repository 1.0 was mainly for independent software vendors (ISVs), and not programmers.

The Microsoft Repository dates back to May 1994, when Microsoft and Texas Instruments announced collaboration on the design of an object-oriented repository that would store OLE components. (TI, with its Information Engineering Facility [IEF]/Composer product, was then a major CASE-tool vendor. TI's software division has subsequently been sold to Sterling Software, and Composer is now part of Sterling's Cool family of products.) TI may be out of the picture now, but the Microsoft Repository remains an ActiveX/COM-based (Component Object Model) vision.

Last summer, Microsoft and Platinum Technology announced an alliance whereby Platinum received the rights to port the Microsoft Repository to non-Windows platforms and to databases other than SQL Server for Windows NT -- efforts that are both expected to bear fruit later this year. Platinum itself is a major high-end repository player (its prices start at $150,000), selling both Platinum Repository/MVS and Platinum Repository/OEE.

At the same time, Microsoft announced its Open Information Model (OIM), an extensible COM-based object model that defines the structure of objects shared by tools. Conceptually, it's probably useful to think of the Microsoft Repository in two parts.

The first part is the repository engine, a type-driven interpreter that is actually built on top of a SQL database (initially either Microsoft Access or SQL Server). The second part is the OIM part, a meta-meta model that can support a variety of information-model extensions such as database and OLAP.
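The idea of a repository engine layered on a SQL database can be sketched in a few lines. The example below is not Microsoft's schema or API; it is a generic illustration using Python's standard `sqlite3` module, with made-up table and function names. Object types are stored as rows, so new kinds of objects can be added as data without changing the tables.

```python
import sqlite3

# An in-memory SQL database stands in for the underlying store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE obj_type (
        type_id INTEGER PRIMARY KEY,
        name    TEXT UNIQUE NOT NULL
    );
    CREATE TABLE obj (
        obj_id  INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES obj_type(type_id),
        name    TEXT NOT NULL
    );
    CREATE TABLE relationship (
        origin_id INTEGER NOT NULL REFERENCES obj(obj_id),
        dest_id   INTEGER NOT NULL REFERENCES obj(obj_id),
        kind      TEXT NOT NULL
    );
""")


def add_type(name):
    """Register a new object type and return its id."""
    cur = conn.execute("INSERT INTO obj_type (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_object(type_id, name):
    """Store an object of an already registered type and return its id."""
    cur = conn.execute(
        "INSERT INTO obj (type_id, name) VALUES (?, ?)", (type_id, name))
    return cur.lastrowid


def relate(origin_id, dest_id, kind):
    """Record a named relationship between two stored objects."""
    conn.execute(
        "INSERT INTO relationship VALUES (?, ?, ?)", (origin_id, dest_id, kind))


def related_names(origin_id, kind):
    """Names of objects reachable from origin_id through one relationship kind."""
    rows = conn.execute(
        "SELECT o.name FROM relationship r "
        "JOIN obj o ON o.obj_id = r.dest_id "
        "WHERE r.origin_id = ? AND r.kind = ? ORDER BY o.name",
        (origin_id, kind))
    return [name for (name,) in rows]


table_type = add_type("Table")
column_type = add_type("Column")
customers = add_object(table_type, "customers")
for col in ("id", "name"):
    relate(customers, add_object(column_type, col), "has_column")

columns = related_names(customers, "has_column")
```

The tables know nothing about "Table" or "Column"; those meanings come from the type rows, which is roughly the division of labor between an engine and the information models loaded into it.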

Paul Harmon, editor of the monthly newsletter Object-Oriented Strategies and author of several books on the Unified Modeling Language (UML), describes a four-layer metamodeling architecture in the January issue of his newsletter. Meta-meta models such as the Object Management Group's (OMG's) Meta Object Facility (MOF) or Microsoft's OIM, he says, define the fundamental infrastructure for a metamodeling architecture, while metamodels such as UML and Microsoft's database model (DBM) are simply instances of a meta-meta model. Models and User Objects round out the four layers.

Microsoft's OIM is derived from UML, which means that the behavior of UML is present inside the OIM. At each level of the OIM, you inherit behaviors of the previous level. For example, the SQL Server model inherits behavior from the DBM, which inherits behavior from the OIM. ISVs and developers can build their own custom models based on information that can be inherited from other portions of the OIM. Other organizations and standards groups, such as those associated with creating a document-exchange standard, could also extend the model to support their own repository efforts.
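The layered inheritance described here maps naturally onto class inheritance. The Python sketch below is only an analogy: the class names and members are hypothetical stand-ins and do not reproduce the actual OIM, DBM, or SQL Server model interfaces.

```python
class ModelElement:
    """Stand-in for a generic, UML-like element at the base of the model."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{type(self).__name__} {self.name}"


class DbmTable(ModelElement):
    """Stand-in for a generic database-model table: adds columns."""

    def __init__(self, name, columns):
        super().__init__(name)
        self.columns = list(columns)

    def column_count(self):
        return len(self.columns)


class SqlServerTable(DbmTable):
    """Stand-in for a product-specific extension: adds a filegroup."""

    def __init__(self, name, columns, filegroup="PRIMARY"):
        super().__init__(name, columns)
        self.filegroup = filegroup


orders = SqlServerTable("orders", ["id", "total"])

# The most specific class still answers questions defined at every level above it.
summary = (orders.describe(), orders.column_count(), orders.filegroup)
```

A tool written against the generic level can work with `orders` without knowing anything about the product-specific level, which is the practical payoff of deriving custom models from a shared base.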

The Microsoft Repository's first information model basically offered support for UML, an analysis-and-design modeling language that has gained widespread industry support. That meant, for example, that you could create a Visual Basic program, use another optional download -- Visual Modeler, which is a subset of Rational Software's Rose product -- to reverse-engineer your Visual Basic program, and then export the design into the repository. At that point, the UML version of your Visual Basic program would be available to other repository-aware tools (at the time, limited to products such as Visio's tools).

However, what if the list of repository-aware tools included other programming languages, testing tools, project management tools, revision management tools, and so forth? According to Mike Budd, an analyst who tracks the CASE-tools and repository markets for Ovum, the Microsoft Repository is a clever way of adding enterprise panache to Microsoft's immensely popular programming tools, which are actually geared toward single programmers.

Budd also thinks that Microsoft has recognized the incredible value of middleware in the largest sense of the word. He sees the company's Microsoft Repository as a means of owning the glue that integrates the application-development process.

What does all this mean for you? At this point, you have two choices. You can experiment with the fairly rudimentary Microsoft Repository 1.0 and the associated Visual Modeler and Visual Component Manager (VCM) tools. (VCM is Microsoft's "interface" to the Repository.) Or you can wait for version 2.0 of all these tools, which are expected to ship some time this summer with version 6.0 of Microsoft's Visual Studio development environment.

Other Repositories

Not only isn't the Microsoft Repository the only repository, it isn't even the only meta-meta model out there. The OMG's CORBA-based (Common Object Request Broker Architecture) meta-meta model provides another alternative, one that is embraced by many of the traditional CORBA champions. Unisys's Urep repository (with prices starting at $1900) is the leading example of an OMG/MOF-compliant repository.

Unisys Fellow and Urep architect Sridhar Iyengar points out the advantages of Urep as a heterogeneous, multiplatform object repository that supports both COM and CORBA middleware on Unix, NT (client and server), and mainframe (client) platforms. He adds that "Urep supports a rich set of core repository services, including object-level version control, nested transactions, and long transactions, which are not available in competing products." (According to Microsoft, Microsoft Repository 2.0 will support versioning.) Urep uses the Versant object database as its default storage engine.

IBM's VisualAge TeamConnection (prices start at $9995 per server) and associated DataAtlas represent another alternative that will be especially attractive to enterprises that use IBM's VisualAge tools. TeamConnection, which evolved from IBM's CMVC product, not AD/Cycle, is an open tool with a published API and is source code-compliant (i.e., interoperable with Microsoft's SourceSafe and other version-control products). Although TeamConnection currently uses Object Design's ObjectStore as its data store, the next version will reportedly be hosted on DB2 Universal DataBase (UDB).

Other repositories worthy of mention include LogicWorks' Universal Directory ($30,000), a data-warehouse-oriented repository; Viasoft's Rochade repository (formerly the R&O Repository with roots in the mainframe world, $35,000 and up); and Software Enabling Labs' Enabler, Visual Enabler, and Maestro suite (pricing starts at $3500 per developer).

Look Ahead

Repositories may not affect your life this month or even this year. However, it behooves you to start investigating this technology and thinking about how it's going to affect the way you design, develop, and manage applications, including data-warehousing applications. Platinum's Chris Justice, product manager for Platinum Repository/OEE, cited hockey's Wayne Gretzky ("I play where the puck's going to be, not where it is...") in his perspective on the repository market. Repositories should be on your radar screen.


Selected Specifications

Metadata Council and MDIS specification

Internet: http://www.he.net/~metadata/

Federal Geographic Data Committee (FGDC) metadata standards

Internet: http://www.fgdc.gov/Metadata/metahome.html

Dublin Core

Internet: http://www.ukoln.ac.uk/metadata/resources/dc.html

Warwick Framework

Internet: http://www.dlib.org/dlib/july96/lagoze/o7lagoze.html

OMG (Object Management Group)

Internet: http://www.omg.org

CDIF specification

Internet: http://www.cdif.org

AIIM Document Management Alliance repository

Internet: http://www.aiim.org

Stanford Digital Libraries Project and Digital Library Interoperation Protocol

Internet: http://www-diglib.stanford.edu/diglib/pub

Stanford Digital Libraries Infobus Protocol

Internet: http://www-db.stanford.edu/~testbed


Further Reading

Object-Oriented Strategies
Internet: http://www.cutter.com/itgroup

Ovum
Internet: http://www.ovum.com

Hurwitz Group
Internet: http://www.hurwitz.com

DAMA International
Internet: http://www.dama.org

The Data Administration Newsletter
Internet: http://www.tdan.com

Proceedings from the Second IEEE Metadata Conference
Internet: http://www.llnl.gov/liv_comp/metadata/md97.html

Implementing a Corporate Repository
Adrienne Tannenbaum
ISBN 0471-585378
John Wiley & Sons, 1994


Where to Find

IBM
Armonk, NY
Phone:    800-426-3333
Phone:    914-765-1900
Internet: http://www.software.ibm.com/ad/teamcon

LogicWorks
Princeton, NJ
Phone:    609-514-1177
Internet: http://www.logicworks.com

Microsoft
Redmond, WA
Phone:    800-426-9400
Phone:    425-882-8080
Internet: http://www.microsoft.com/repository

Platinum Technology
Oakbrook Terrace, IL
Phone:    800-442-6861
Internet: http://www.platinum.com/products/dataw/repos_ps.htm

Rational Software
Cupertino, CA
Phone:    800-728-1212
Phone:    408-863-9900
Internet: http://www.rational.com

Softlab
Munich, Germany
Phone:    +49 89 9936 1216
Phone:    770-290-8800
Internet: http://www.softlab.com

Unisys
Blue Bell, PA
Phone:    215-986-4011
Internet: http://www.urep.com

Viasoft
Phoenix, AZ
Phone:    602-952-0057
Phone:    800-448-8100
Internet: http://www.viasoft.com/rochade


OMG Repository Efforts

Meta Object Facility
Focuses on metadata and model management in distributed object environments.

Object Analysis and Design Facility
Focuses on object analysis and design methods, metamodel, and tool interoperability.

Source: The Object Management Group




Unisys's Universal Repository

The Universal Repository's architecture is in some ways typical of repositories.


Microsoft's Repository Architecture

This example of the Microsoft repository database design shows how it can hold COM objects.


Karen Watterson (San Diego, CA) is a writer and consultant specializing in database and data-warehousing issues. She is the author of several books and is editor of Pinnacle Publishing's Visual Basic Developer and SQL Server Professional newsletters. You can reach her at karen_watterson@msn.com.
