
A Blueprint for Managing Documents


May 1997 / Features / A Blueprint for Managing Documents

A good Electronic Document Management System can bring together document storage, work flow, and indexing.

James Boyle

In many Fortune 1000 companies, intranet development is moving from the skunk works to mission-critical operations. This breakneck pace comes because intranets quickly publish corporate information for delivery to users via low-cost and easy-to-use Web technology.

Unfortunately, the ease and speed of intranet publishing also create serious problems. Users become overwhelmed with the quantity of information when they can't find the data they need. Webmasters promise that intranet information is current, accurate, and complete, but they then must scramble to fulfill this dream. The result: Companies often base important business decisions on bad data.

How do you take control of corporate information in a Web world? For a growing number of corporations, the answer is an Electronic Document Management System (EDMS). An EDMS is not a single entity but a collection of complementary technologies. The three most crucial ones are the repository (see "Inside an EDMS"), the work-flow engine, and the searching-and-indexing technology.

Not every company that needs an EDMS needs all three components. The flexibility and power of EDMSes are both a strength and a challenge. To implement the right system for your company, you need to know how EDMSes work and which document management features match your business needs and processes. Here's some help in making those decisions.

Repository: The Core Component

The document repository, the soul of an EDMS, stores, controls, and manages documents. Key repository functions include library services (e.g., controlling access to individual documents, document cataloging, check-in/check-out, and searching for and retrieving documents). Another key function is version control, including a history of all instances of a document as it changes over time.
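To make check-in/check-out and version history concrete, here is a minimal sketch in Python. It assumes an in-memory store and a single lock per document; the `Repository` and `Document` names and their methods are hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    name: str
    versions: List[str] = field(default_factory=list)  # full history, oldest first
    locked_by: Optional[str] = None                     # user holding the check-out


class Repository:
    """Toy repository: one lock per document, every check-in kept as a version."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def add(self, name: str, content: str) -> None:
        self._docs[name] = Document(name, [content])

    def check_out(self, name: str, user: str) -> str:
        doc = self._docs[name]
        if doc.locked_by is not None:
            raise PermissionError(f"{name} is checked out by {doc.locked_by}")
        doc.locked_by = user
        return doc.versions[-1]  # the latest version is the one handed out

    def check_in(self, name: str, user: str, content: str) -> int:
        doc = self._docs[name]
        if doc.locked_by != user:
            raise PermissionError(f"{user} does not hold the check-out for {name}")
        doc.versions.append(content)  # earlier versions are never overwritten
        doc.locked_by = None
        return len(doc.versions)      # new version number

    def history(self, name: str) -> List[str]:
        return list(self._docs[name].versions)
```

A second user who tries to check out a locked document gets an error instead of a silent overwrite, and `history` returns every instance of the document in order. A production repository would add access control and persistent storage on top of this.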

Repositories also provide configuration management, control over the relationships between documents and their component parts (e.g., a manual and its chapters). This is critical to treating documents as true containers. It's an area where many repositories come up short. Each vendor's tools implement these technologies in slightly different ways, but the concepts are consistent.

The architecture an EDMS vendor uses determines whether the repository is simply a database engine or a completely separate application. Each vendor designs its EDMS as either a two- or three-tier system, so your architecture choice is part of the decision you make when you select a vendor. You choose by analyzing the same trade-offs as with any client/server system: cost versus performance and scalability. In a two-tier architecture, where the repository is simply the database engine, the client performs more work than in a three-tier environment, where a separate server application is the workhorse.

The database stores all the information about the documents but generally not the documents themselves. The database contains a file pointer -- the link tying the database and the file system together. The server application controls the file pointer (see the figure "Using Metadata"). The document information -- referred to as metadata or attributes -- typically includes date, author, and title. The database may also store other attributes that the user does not directly provide. Examples include version numbers, or in the case of a document set, pointers that indicate which chapters belong to a particular manual.
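The split between metadata and files can be sketched with a small relational table. This example uses Python's built-in `sqlite3` module; the table layout, column names, sample rows, and `/vault/...` paths are all assumptions made for illustration, not a schema from any product named in this article.

```python
import sqlite3


def create_catalog():
    """Build an in-memory catalog: attributes live in the database, files do not."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE documents (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,               -- user-supplied attribute
            author    TEXT NOT NULL,               -- user-supplied attribute
            created   TEXT NOT NULL,               -- user-supplied attribute
            version   INTEGER NOT NULL,            -- maintained by the system
            parent_id INTEGER REFERENCES documents(id),  -- chapter -> manual
            file_path TEXT                         -- file pointer into the file system
        )
        """
    )
    rows = [
        (1, "Pump Manual", "J. Smith", "1997-04-01", 3, None, None),
        (2, "Chapter 1: Installation", "J. Smith", "1997-03-12", 2, 1, "/vault/0002.v2.doc"),
        (3, "Chapter 2: Maintenance", "A. Jones", "1997-03-20", 5, 1, "/vault/0003.v5.doc"),
    ]
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def chapters_of(conn, manual_id):
    """Return (title, version, file pointer) for each chapter of a manual."""
    cur = conn.execute(
        "SELECT title, version, file_path FROM documents "
        "WHERE parent_id = ? ORDER BY id",
        (manual_id,),
    )
    return cur.fetchall()
```

Here the manual is a container row with no file of its own, and each chapter row carries the pointer to its file. Calling `chapters_of(create_catalog(), 1)` returns the two chapter rows with their version numbers and file pointers.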

Many desktop applications are OLE- or OpenDoc-compliant, which enables them to link content objects from one application to another without requiring you to do any cutting and pasting. EDMSes generally can recognize OLE documents and are intelligent enough to maintain these files and links as relationships in a document repository. Repositories let you create container objects and use the EDMS to link the pieces. How well this works depends on the tools you use to create, assemble, and publish the content. Also important is the level of integration between these tools and the repository. This is where a more integrated publishing-oriented system such as Interleaf shines.

Work Flow for Efficiency

Work flow can eliminate any dead time a document spends in transit between workers. It can also let people review a document in parallel instead of serially, which saves time in the sign-off process. Coupled with a repository, work flow can provide a full audit history, including review comments. Work-flow systems might also notify workers when a new version of a document becomes available. Finally, the work-flow engine may drive the conversion process for documents you create in one format but distribute, via the Web, for example, in a different format.

Work-flow engines typically have two critical integration points in an EDMS -- the repository and the e-mail system. Repositories that include work flow come from Open Text and NovaSoft. Saros and PC Docs rely on third parties, such as FileNet or Action Technologies. In either case, the work-flow system must interact through APIs with the repository, because the work-flow system must have access to the documents and their security, attributes, and other information.

Most routing messages from a work-flow system go to a proprietary inbox. While they're fine for people who deal with document production (e.g., technical writers, graphic artists, and marketing departments), proprietary inboxes are inadequate for office workers who may already be burdened by assignments from multiple sources. Asking them to check yet another inbox is unacceptable. Instead, the system should route document-related messages to the e-mail system. However, it is important to note that the document is not attached to the message. For a repository to maintain data integrity, it cannot release documents to an uncontrolled system such as e-mail.
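A notification of this kind can be sketched with Python's standard `email` package. The message carries a link back into the repository and no attachment, so the document never leaves controlled storage. The `REPOSITORY_URL` value, the sender address, and the `build_review_notice` function are hypothetical names chosen for this example.

```python
from email.message import EmailMessage

# Hypothetical address of the repository's Web interface (an assumption).
REPOSITORY_URL = "http://edms.example.com/documents"


def build_review_notice(doc_id, title, reviewer, sender="workflow@example.com"):
    """Build a routing message that points at the document instead of attaching it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = reviewer
    msg["Subject"] = f"Review requested: {title}"
    # Only a link goes out; the repository still enforces security when it is opened.
    msg.set_content(
        f"A new version of '{title}' is ready for review.\n"
        f"Open it in the repository: {REPOSITORY_URL}/{doc_id}\n"
    )
    return msg
```

The resulting message could be handed to any mail transport, for example `smtplib.SMTP.send_message`. Because the reviewer follows the link, the repository's access checks and audit history still apply.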

Finding Essential Data

An EDMS can provide more focused, and consequently, more efficient searches than standard full-text technologies by confining searches to specific attributes. Thus, instead of showing you all the documents that contain the words "engine" and "repair," the system can search for those words in all documents of the type "Procedure" that were approved in the last year. The search returns a handful of entries rather than a list with hundreds of document names.

Repositories can add documents to a full-text index as they are checked in or via a batch job in off-hours to keep the index updated. The search interface executes attribute searches against a relational DBMS (RDBMS) and a word search against the full-text index. The system joins the two results to provide a granular approach to finding needles in haystacks.
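The join of an attribute search with a word search can be sketched in a few lines of Python. This is a simplified model under stated assumptions: plain dictionaries stand in for the RDBMS, a small inverted index stands in for the full-text engine, and the sample documents and function names are invented for the example.

```python
from datetime import date, timedelta

# Sample catalog: attributes plus text, keyed by document ID (illustrative data).
DOCS = {
    1: {"type": "Procedure", "approved": date(1997, 3, 1), "text": "Engine repair steps for the pump"},
    2: {"type": "Memo", "approved": date(1997, 2, 10), "text": "Engine repair budget"},
    3: {"type": "Procedure", "approved": date(1994, 6, 5), "text": "Old engine repair procedure"},
    4: {"type": "Procedure", "approved": date(1997, 1, 20), "text": "Valve inspection"},
}


def build_index(docs):
    """Full-text side: map each word to the set of documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def full_text_search(index, words):
    """Documents that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()


def attribute_search(docs, doc_type, approved_since):
    """Attribute side: documents of one type approved on or after a date."""
    return {
        doc_id
        for doc_id, doc in docs.items()
        if doc["type"] == doc_type and doc["approved"] >= approved_since
    }


def search(docs, index, words, doc_type, today):
    """Join the two result sets, keeping only documents present in both."""
    since = today - timedelta(days=365)
    return sorted(full_text_search(index, words) & attribute_search(docs, doc_type, since))


INDEX = build_index(DOCS)
print(search(DOCS, INDEX, ["engine", "repair"], "Procedure", date(1997, 5, 1)))  # prints [1]
```

Three documents contain both words, but only one of them is a Procedure approved within the last year, so the joined result is a single entry.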

EDMS vendors are combining this powerful search capability with Web interfaces to reduce search-engine maintenance from two (EDMS and Web site) to one. Two common EDMS search engines are those from Fulcrum and Verity. The full-text engines found on the Web, such as Open Text, Excite, and AltaVista, are also starting to appear in EDMSes.

Pull It Together

Armed with a basic understanding of the core EDMS technologies, let's look at putting together an infrastructure to manage a Web site. The biggest challenge lies in bridging the gap between creators of materials and consumers of information.

Three architectures address this challenge: manual, publishing, and access (see the figure "Three Distribution Models"). The manual model -- the one used in most Web sites -- provides a way to create documents, convert them to an on-line format such as HTML, and publish them to the Web site. The Webmaster receives any new content and converts these documents to the correct format. He or she then posts the documents to the Web site and adds hyperlinks to and from the documents.

Unfortunately, the manual process is error-prone and time-consuming. There is also no tracking or other type of document control, because no repository exists in this model. Information consumers must rely on the Webmaster to know that a source document changed and ensure that it is converted and placed on-line. This process is informal, and with a document set of any size, it usually breaks down.

In the second approach -- the publishing model -- a repository stores, manages, and controls documents. The publishing step in the process extracts documents from the repository and puts them on the Web site. This can be a batch process, or the work-flow engine can drive the publishing step so that updates happen in real time.

The problem with this approach is that information consumers can't use all the power of the repository, including attribute searches or security. In the publishing model, you often must build another full-text index using the content of the on-line documents. This results in a duplication of effort, and because the on-line documents are detached from the repository, the attributes are not available for searching. Nevertheless, this model can give you a high degree of confidence that the intranet information is current.

In the access model, all documents -- native and viewable -- are stored in the repository. The conversion process may be automated, depending on the tools you select. Many repository vendors have added a component to the repository that allows Web-based viewing access to the documents. From a user's perspective, the browser seems to be looking at the Web, but it is looking into a repository.

This new approach offers some advantages. Users can easily drill down to find the information they need, because the interface shows attribute and hierarchical relationships in the information. The complete search capability of the repository is usually available in this model, as are security schemes to control who can access secure documents.

By connecting an interface to the repository for viewing, the system lets you create much of the navigation layer on the fly, using the information a document knows about itself. In this way, there is significantly less work in on-line publishing, but it requires that relationships and attributes be kept up-to-date. However, there is only one place to keep the information current.
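Generating navigation from attributes might look like the following Python sketch. It groups documents by a `type` attribute and emits a nested HTML list; the attribute names, the `/view/` URL scheme, and the `build_nav` function are assumptions made for illustration.

```python
from html import escape
from itertools import groupby


def build_nav(docs):
    """Render a two-level HTML list: document type, then titles linking to a viewer."""
    ordered = sorted(docs, key=lambda d: (d["type"], d["title"]))
    lines = ["<ul>"]
    for doc_type, group in groupby(ordered, key=lambda d: d["type"]):
        lines.append(f"  <li>{escape(doc_type)}<ul>")
        for doc in group:
            # The link targets a hypothetical repository viewer, not a static file.
            lines.append(
                f'    <li><a href="/view/{doc["id"]}">{escape(doc["title"])}</a></li>'
            )
        lines.append("  </ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)


catalog = [
    {"id": 1, "type": "Procedure", "title": "Engine Repair"},
    {"id": 2, "type": "Manual", "title": "Pump Manual"},
    {"id": 3, "type": "Procedure", "title": "Valve Inspection"},
]
print(build_nav(catalog))
```

Nothing in the page is maintained by hand. When a document's type or title changes in the repository, the next rendering of the list reflects it, which is why the attributes must be kept current.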

Technical Considerations

You must consider three main technical issues before committing to an EDMS: choosing the right computing platform, dealing with network throughput, and designing the database.

Platform issues encompass both client and server decisions. If you have a multiplatform environment, find out how close the EDMS you're considering is to a unified code base. Don't just ask the vendor. Look at past release schedules across the platforms that matter to you to see how closely product introductions for secondary platforms followed the primary platform. With a unified code base, there should be only weeks between platform shipments. However, with nonunified code bases, releases might trail each other by months, and even then, not all versions may include the same features.

Until recently, most repository servers ran only on Unix, because it offered the necessary processing power and security guarantees. Now, the availability of Windows NT means that repositories don't have to be confined to Unix to retain the necessary features for an EDMS. However, before you choose an NT release that repository vendors have shipped or will soon ship, consider whether the systems can scale enough to match the growth demands you expect for your organization. Find out if the server application takes advantage of symmetric multiprocessing (SMP) machines.

Repository databases can be relational or object-oriented. Today, most systems use the RDBMS because of its stability and performance. However, object-oriented DBMSes might replace them in the coming years. Most EDMS vendors support one or more of the following: Oracle, Sybase, Informix, and SQL Server.

Choosing a Solution

A number of tools claim to help manage your Web site. You can categorize them in these groups: site managers, Web servers, and compound-document management tools.

Site managers help Webmasters monitor a site. They check links, locate orphaned files, and summarize usage. These products also provide a graphical representation of the site. Some even provide authoring tools. Tools in this category include Adobe SiteMill, AOL Press, and Microsoft FrontPage.

Web servers are continually adding more features for document management. However, the most sophisticated management capability you'll get out of today's Web server is a basic check-in/check-out function. This capability is weak on security and can't handle multiple versions of documents, but for a small HTML-only document set, this alternative is helpful. However, organizations that need to manage native documents, work flow, or the relationships in a compound document should consider a full document management tool.

The most promising content management tools come from the traditional compound-document vendors, such as Documentum, Interleaf, Open Text, PC Docs, and Saros. Over the past 18 months, all these companies have revamped their tools to let them run over the Web. In so doing, they have shifted from the publishing model to the access model.

Where the Web site is both the means and the end, there may be another approach. Documentum designed RightSite to manage the content of Web sites. It can automatically move an entire site into a repository. The repository is then configured to add attributes, security, work flow, and searching capabilities.

Once the system is configured, you can rework static pages to contain queries and bring these pages to life. The user would never see the EDMS, but suddenly everything is current, and users see information relevant to them. Most interesting, RightSite can manage hyperlinks by treating them as another information type.

As intranets grow and mature this year, many organizations will be looking for ways to reduce the expense of managing these mission-critical document sets. Adding another Webmaster to the IS payroll certainly isn't the best answer. Document management can clean up after, and help IS organizations cope with, the largest revolution in corporate computing: the intranet.


Where to Find

Diamond Head 
Richardson, TX
Phone:    800-428-6657
Phone:    972-479-9205
Internet: http://www.dhs.com

Documentum
Pleasanton, CA
Phone:    888-362-3367
Phone:    510-225-9421
Internet: http://www.documentum.com

Excite
Mountain View, CA
Phone:    415-934-1200
Internet: http://www.excite.com

FileNet
Costa Mesa, CA
Phone:    800-345-3638
Phone:    714-966-3400
Internet: http://www.filenet.com

Fulcrum
Ottawa, Ontario, Canada
Phone:    800-385-2786
Phone:    613-238-1761
Internet: http://www.fulcrum.com

Open Text
Waterloo, Ontario, Canada
Phone:    800-507-5777
Phone:    519-888-7111
Internet: http://www.opentext.com

PC Docs
Burlington, MA
Phone:    800-933-3627
Phone:    617-273-3800
Internet: http://www.pcdocs.com

Saros
Bellevue, WA
Phone:    800-827-2767
Phone:    206-646-1066
Internet: http://www.saros.com

Verity
Sunnyvale, CA
Phone:    408-541-1500
Internet: http://www.verity.com

Inside an EDMS

illustration_link (40 Kbytes)


Three Distribution Models

illustration_link (62 Kbytes)


Using Metadata

illustration_link (27 Kbytes)


James Boyle is manager of electronic document solutions at RWD Technologies, a systems-integration and consulting company in Columbia, Maryland. You can contact him at jboyle@rwd.com.
