Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesA Career in Data Modeling


June 1997 / Special Report / A Career in Data Modeling

Taking up-front time to organize a database model can save time during applications development.

J. L. Weldon

Some people are planners; others are executors. The mad rush to get applications into the hands of users has been a boon to executors. But the challenges of integrating data from a variety of sources into a consistent data warehouse has caused many development teams to reassess the value of one form of planning -- data modeling -- as a precursor to database design.

From the beginning, data modeling has been somewhat controversial. In the 1970s, there was a brief push toward developing an "enterprise data model." But this ide a has largely been abandoned, leaving many large, expensive, and uncompleted projects in its wake. CASE tools promised automated development of an application from its model. Again, reality has failed to deliver on the promise. Consequently, many developers view data-administration and inform ation-resource groups (which usually do the modeling) as obstacles to rapid development. Some think that these groups stand between the developers and application delivery, placing unnecessary constraints on developers' designs.

In other cases, however, firms have seen the value of having database models serve as a common framework within which new applications can be designed and older applications integrated. For instance, the reusability of good models has proved itself to be a valuable asset, rather than a hindrance, to rapid application development (RAD). Models have also proved useful to organizations trying to integrate many heterogeneous systems developed over time by design ers and developers who may be long gone from the corporate environment.

What do you need to know to be able to use database modeling in your company? To illustrate the issues and complexities of -- as well as the potential gains from -- good database models, I'll first explain the fundamentals.

What Is Data Modeling?

A model is an abstract representation of a real object or environment. Data modeling is the practice of designing a database using a series of related models. The process works something like this: First, you develop a high-level, conceptual model of the business process or activity you're going to support. Next, use this conceptual model to derive a logical data model that captures more detail, but in an implementation-independent way. Finally, transform the logical data into a physical data model, or schema , that provides the details of the database's implementation in a particular DBMS (see the figure "From Muddle to M odel" ).

The first step is, in many ways, the most time-consuming. You begin by collecting data and reviewing business procedures and practices to ascertain the business requirements of the application. These requirements lead to the identification and definition of entities and relationships essential to the business activities being represented. For example, in a university's registration application, the entities might include STUDENT, TEACHER, and CLASS, and the relationships might include STUDENT "is registered for" CLASS, TEACHER "is assigned to teach" CLASS, and CLASS "is composed of" STUDENTs.

In the conceptual model, you should figure out how you're going to distinguish instances of each entity type. For example, you determine the attribute, or group of attributes, necessary to uniquely identify a particular STUDENT. The university might issue a unique STUDENT ID NUMBER or use the student NAME and DATE OF BIRTH instead. Similarly, you figure out how to distinguish relationships -- whet her they're one-to-one (e.g., one LAB DESK for each STUDENT and vice versa) or one-to-many (e.g., one CLASS containing many STUDENTs). By determining these attributes and relationships, you ensure that the model accurately reflects reality.

You transform a conceptual model into a logical data model by capturing specific data about the descriptive attributes of each entity and more details about the relationships. For example, a STUDENT might be represented by his or her NAME, GENDER, and AGE, but not necessarily by eye color or blood type. As you select attributes, you capture their definitions as well as information on the domain of values on which the attributes are based. For example, GENDER might take its values from the domain {Male, Female}, and the domain for AGE might be {any integer greater than 0 and less than 120}.

Similarly, the description of a relationship can be made more informative if you associate cardinality data with it. For example, a STUDENT might be registered for u p to five CLASSes, yielding a one-to-five cardinality for the "is registered for" relationship. Furthermore, the relationship can be described as optional or mandatory (i.e., must each STUDENT register for at least one CLASS, or may some STUDENTs not be registered for any?).

As a model becomes more specific, you might be able to divide some entities into subtypes -- TEACHER might be either a PROFESSOR or a TEACHING ASSISTANT, for example. Also, some entities might be roles rather than actual entities, so a STUDENT might also act in the role of TEACHING ASSISTANT. In such a case, the attributes of TEACHING ASSISTANT include those of the STUDENT as well as others specific to the role, such as START DATE for the teaching assignment.

Once it's at its most specific state, you convert the data model from a logical representation to a description of the physical database. This process converts logical domains for attributes to the specific types of data that a DBMS can handle. It includes volumetrics (i.e., counts of expected entity instances and lengths for text-string fields) to determine size and space allocations; it also identifies attributes to be used for indexes and adds constraints, such as which fields can or cannot be null. This process continues until you've assembled enough information to create the database-description-language (DDL) statements to create the actual tables (see the figure "Backward and Forward" ).

Why Model?

Now you know what modeling is. But why do it? You hear this most often from applications developers and others who are anxious to get on with the implementation of a database or business system and are impatient with the time spent developing and refining a data model.

From a logical perspective, you model to increase your understanding of the business problem and to identify the basic components on which the solution will be built. A good model lets you develop a database that's flexible and supports new features as they become necessary.

Furthermore, these components are reusable. A flexible, reusable design promotes stability, and there's no need to revise the database as new applications are added. Finally, a database built from a model that accurately depicts the business is sharable across business functions, unlike one that's built for a specific function. It has been said that if people had always used proper modeling in the past, the need for data warehousing (i.e., integration of data from multiple functional applications) would never have come about.

Disciplines of data modeling, such as abstraction, generalization, and normalization, force you to explicitly evaluate design decisions that are made when moving from the abstract, logical representation to the physical database. Without such a model, developers are prone to build physical designs that incorporate existing, though not necessarily fundamental, data relationships. For example, a data record representing the university-registration rela tionship might be designed to include five fields for CLASS information (based on the fact that students may not register for more than five classes). But over time, such a rule is likely to change (say, to a maximum of six), and in either case the database must be changed.

From a physical-design perspective, data models are a vehicle for capturing and maintaining metadata -- data about the data -- such as business definitions, domain information, value lists, and edit criteria. Data models can translate relationships between entities into key-based associations between tables that allow a database to automatically enforce referential integrity (e.g., not allowing a CLASS to exist without STUDENTs). Physical data models also allow organizations to apply standards (e.g., standard data names and domain definitions), thus promoting consistency across applications. Physical models can also be used to automate certain detailed and time-consuming database-administration tasks, such as volumetric calc ulations and DDL generation, thus freeing an administrator to spend more time on performance and tuning issues.

Models have been found to be so valuable that many organizations have been developing them in reverse from existing databases that were originally built without the use of explicit design models. For more information, see the sidebar "Reverse Engineering".

You, Too, Can Model

Database modeling requires expertise in business-requirements analysis. To successfully build a model, you must interview business representatives, review processes and documentation, and make the model reflect your understanding of the environment. You review initial models with business experts and other analysts and then refine them. While much has been written about this process, it remains more in the realm of art than of science. In general, a good modeler is like a good reporter, continually asking, "What? Why? How? Where? When?"

Some notational systems exist that can help you develop and record data models. Bachman, Chen, Martin, and other data-modeling gurus each have their own methodology and notation. Each system has its own way of representing the essential aspects of the model (e.g., entities, relationships, cardinality, optionality, subtypes, and domains). While all you need to produce these diagrams is paper and a pencil, most modelers opt for a modeling tool, frequently referred to as a CASE tool.

Modeling tools have evolved from text-based mainframe tools to graphically oriented PC-based and client/server workgroup tools that interface with a variety of DBMSes. The advent of object-oriented systems and databases has also led to new and extended forms of modeling (see the sidebar "Object Modeling vs. ER Modeling"). These methods support user-oriented semantic modeling and object-oriented analysis and design methodologies. Most modeling tools in use today are entity-relationship (ER) tools (e.g., Logic Works' ERWin). But the interest in object orientation (OO) has resulted i n the appearance of tools designed to support object modeling as well (e.g., Rational Software's Rational Rose).

Modeling tools make extensive use of graphical interfaces and visual editing to produce model diagrams. Pop-up forms capture the necessary names and definitions. Definitions produce a data dictionary. You can produce reports to review with business users to validate and refine the definitions. If a tool can produce DDL automatically, the dictionary also includes physical design characteristics, such as volumes, domains, and indexes.

Most tools that support ER modeling also provide rule-checking to make sure the models produced are valid. These tools check for violations to the rules of normalization and flag the absence of primary keys or the inappropriate use of foreign keys. Since an analyst develops the model interactively, this feedback allows him or her to catch errors at once rather than their being perpetuated into lower-level models.

One continuing problem that modelers fa ce is communication with business users. The notational systems used by most modeling methodologies are more technical than intuitive, and many business users are uncomfortable with reviewing any model that's more complicated than the most abstract conceptual model. Recently a tool called InfoModeler, which is based on the Object Role Modeling (ORM) methodology espoused by G. M. Nijssen and Terry Halpin, has gained attention due to its fact-based approach.

In this approach, you gather statements in English (or any other language you choose) about an application or a busi-ness. A step-by-step process then groups the facts into fact types, checks for uniqueness, and eliminates any unnecessary types. A modeler assembles a complete model, which can be either diagrammatic or textual, by adding constraints and validating the model against the original set of facts. The existence of the corroborating facts, even with a model diagram, can improve communication with business reviewers.

Plan, Then Exe cute

Modeling a database as a part of the development process can be compared to developing a blueprint before constructing a building. A good model captures business essentials and focuses on the broad perspective rather than a narrow one. Good modelers separate those relationships and dependencies that result from current practice from those that are fundamental and unchanging. A sound data model leads to a database that is sharable, reusable, and flexible and that accurately reflects the business it supports.


Where to Find


CASEwise Systems, Inc.

Waltham, MA
Phone:    800-893-8398
Phone:    617-895-9900
Internet: 
http://www.casewise.com


Cayenne Software, Inc.

(formerly Bachman Informatio
n Systems)
Burlington, MA
Phone:    800-528-2388
Phone:    617-273-9003
Internet: 
http://www.bachman.com


Computer Systems Advisors

Woodcliff Lake, NJ
Phone:    800-537-4262 
Phone:    201-391-6500
Internet: 
http://www.silverrun.com


Embarcadero

San Francisco, CA
Phone:    415-834-3131
Internet: 
http://www.embarcadero.com


Evergreen Software

Redmond, WA
Phone:
    800-929-5194
Phone:    206-881-5149
Internet: 
http://www.esti.com


InfoModelers, Inc.

Bellevue, WA
Phone:    206-637-2499
Internet: 
http://www.infomodeler.com


InTek

Norcross, GA
Phone:    800-654-3249
Phone:    770-840-2500
Internet: 
http://www.intekinc.com


LBMS

Houston, TX
Phone:    800-345-5267
Phone:    713-625-9300
Internet: 
http://www.lbms.com


Logic Works

Princeton, NJ
Phone:    609-514-1177
Internet: 
http://www.logicworks.com


Oracle

Redwood Shores, CA
Phone:    415-506-7000
Internet: 
http://www.oracle.com


Popkin

New York, NY
Phone:    212-571-3434
Internet: 
http://www.popkin.com


Powersoft

Concord, MA
Phone:    800-395-3525
Phone:    508-287-1500
Internet: 
http://www.powersoft.com


Rational Software Corp.

Santa Clara, CA
Phone:    408-496-3600
Internet: 
http://www.rational.com


Salsa Business Unit

(Wall Data Systems)
Seattle, WA
Phone:    800-777-2572
Phone:    206-442-9257
Internet: 
ht
tp://salsa.walldata.com


Visible Systems Corp.

Waltham, MA
Phone:    800-684-7425
Internet: 
http://www.visible.com


HotBYTEs
 - information on products covered or advertised in BYTE


From Muddle to Model

illustration_link (34 Kbytes)

The data-modeling process helps you analyze what your company does and your application needs.


Backward and Forward

illustration_link (20 Kbytes)

Here's how a databse-modeling tool might reverse-engineer a database.


J. L. Weldon (New York, NY) heads the Data Warehouse Practice within the U.S. North region of MCI Systemhouse, a global-systems-integration firm. You can contact her by sending e-mail to jweldon@shl.com .

Up to the Special Report section contentsGo to previous article: So You Want to Use Business Rules...Go to next article: Reverse EngineeringSearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network