ation-resource groups (which usually do the modeling) as obstacles to rapid development. Some think that these groups stand between the developers and application delivery, placing unnecessary constraints on developers' designs.
In other cases, however, firms have seen the value of having database models serve as a common framework within which new applications can be designed and older applications integrated. For instance, the reusability of good models has proved itself to be a valuable asset, rather than a hindrance, to rapid application development (RAD). Models have also proved useful to organizations trying to integrate many heterogeneous systems developed over time by design
ers and developers who may be long gone from the corporate environment.
What do you need to know to be able to use database modeling in your company? To illustrate the issues and complexities of -- as well as the potential gains from -- good database models, I'll first explain the fundamentals.
What Is Data Modeling?
A
model
is an abstract representation of a real object or environment.
Data modeling
is the practice of designing a database using a series of related models. The process works something like this: First, you develop a high-level, conceptual model of the business process or activity you're going to support. Next, use this conceptual model to derive a logical data model that captures more detail, but in an implementation-independent way. Finally, transform the logical data into a physical data model, or
schema
, that provides the details of the database's implementation in a particular DBMS (see the figure
"From Muddle to M
odel"
).
The first step is, in many ways, the most time-consuming. You begin by collecting data and reviewing business procedures and practices to ascertain the business requirements of the application. These requirements lead to the identification and definition of entities and relationships essential to the business activities being represented. For example, in a university's registration application, the entities might include STUDENT, TEACHER, and CLASS, and the relationships might include STUDENT "is registered for" CLASS, TEACHER "is assigned to teach" CLASS, and CLASS "is composed of" STUDENTs.
In the conceptual model, you should figure out how you're going to distinguish instances of each entity type. For example, you determine the attribute, or group of attributes, necessary to uniquely identify a particular STUDENT. The university might issue a unique STUDENT ID NUMBER or use the student NAME and DATE OF BIRTH instead. Similarly, you figure out how to distinguish relationships -- whet
her they're one-to-one (e.g., one LAB DESK for each STUDENT and vice versa) or one-to-many (e.g., one CLASS containing many STUDENTs). By determining these attributes and relationships, you ensure that the model accurately reflects reality.
You transform a conceptual model into a logical data model by capturing specific data about the descriptive attributes of each entity and more details about the relationships. For example, a STUDENT might be represented by his or her NAME, GENDER, and AGE, but not necessarily by eye color or blood type. As you select attributes, you capture their definitions as well as information on the domain of values on which the attributes are based. For example, GENDER might take its values from the domain {Male, Female}, and the domain for AGE might be {any integer greater than 0 and less than 120}.
Similarly, the description of a relationship can be made more informative if you associate
cardinality
data with it. For example, a STUDENT might be registered for u
p to five CLASSes, yielding a one-to-five cardinality for the "is registered for" relationship. Furthermore, the relationship can be described as optional or mandatory (i.e., must each STUDENT register for at least one CLASS, or may some STUDENTs not be registered for any?).
As a model becomes more specific, you might be able to divide some entities into subtypes -- TEACHER might be either a PROFESSOR or a TEACHING ASSISTANT, for example. Also, some entities might be roles rather than actual entities, so a STUDENT might also act in the role of TEACHING ASSISTANT. In such a case, the attributes of TEACHING ASSISTANT include those of the STUDENT as well as others specific to the role, such as START DATE for the teaching assignment.
Once it's at its most specific state, you convert the data model from a logical representation to a description of the physical database. This process converts logical domains for attributes to the specific types of data that a DBMS can handle. It includes
volumetrics
(i.e., counts of expected entity instances and lengths for text-string fields) to determine size and space allocations; it also identifies attributes to be used for indexes and adds constraints, such as which fields can or cannot be null. This process continues until you've assembled enough information to create the database-description-language (DDL) statements to create the actual tables (see the figure
"Backward and Forward"
).
Why Model?
Now you know what modeling is. But why do it? You hear this most often from applications developers and others who are anxious to get on with the implementation of a database or business system and are impatient with the time spent developing and refining a data model.
From a logical perspective, you model to increase your understanding of the business problem and to identify the basic components on which the solution will be built. A good model lets you develop a database that's flexible and supports new features
as they become necessary.
Furthermore, these components are reusable. A flexible, reusable design promotes stability, and there's no need to revise the database as new applications are added. Finally, a database built from a model that accurately depicts the business is sharable across business functions, unlike one that's built for a specific function. It has been said that if people had always used proper modeling in the past, the need for data warehousing (i.e., integration of data from multiple functional applications) would never have come about.
Disciplines of data modeling, such as abstraction, generalization, and normalization, force you to explicitly evaluate design decisions that are made when moving from the abstract, logical representation to the physical database. Without such a model, developers are prone to build physical designs that incorporate existing, though not necessarily fundamental, data relationships. For example, a data record representing the university-registration rela
tionship might be designed to include five fields for CLASS information (based on the fact that students may not register for more than five classes). But over time, such a rule is likely to change (say, to a maximum of six), and in either case the database must be changed.
From a physical-design perspective, data models are a vehicle for capturing and maintaining
metadata
-- data about the data -- such as business definitions, domain information, value lists, and edit criteria. Data models can translate relationships between entities into key-based associations between tables that allow a database to automatically enforce referential integrity (e.g., not allowing a CLASS to exist without STUDENTs). Physical data models also allow organizations to apply standards (e.g., standard data names and domain definitions), thus promoting consistency across applications. Physical models can also be used to automate certain detailed and time-consuming database-administration tasks, such as volumetric calc
ulations and DDL generation, thus freeing an administrator to spend more time on performance and tuning issues.
Models have been found to be so valuable that many organizations have been developing them in reverse from existing databases that were originally built without the use of explicit design models. For more information, see the sidebar "Reverse Engineering".
You, Too, Can Model
Database modeling requires expertise in business-requirements analysis. To successfully build a model, you must interview business representatives, review processes and documentation, and make the model reflect your understanding of the environment. You review initial models with business experts and other analysts and then refine them. While much has been written about this process, it remains more in the realm of art than of science. In general, a good modeler is like a good reporter, continually asking, "What? Why? How? Where? When?"
Some notational systems exist that can help you develop
and record data models. Bachman, Chen, Martin, and other data-modeling gurus each have their own methodology and notation. Each system has its own way of representing the essential aspects of the model (e.g., entities, relationships, cardinality, optionality, subtypes, and domains). While all you need to produce these diagrams is paper and a pencil, most modelers opt for a modeling tool, frequently referred to as a CASE tool.
Modeling tools have evolved from text-based mainframe tools to graphically oriented PC-based and client/server workgroup tools that interface with a variety of DBMSes. The advent of object-oriented systems and databases has also led to new and extended forms of modeling (see the sidebar "Object Modeling vs. ER Modeling"). These methods support user-oriented semantic modeling and object-oriented analysis and design methodologies. Most modeling tools in use today are entity-relationship (ER) tools (e.g., Logic Works' ERWin). But the interest in object orientation (OO) has resulted i
n the appearance of tools designed to support object modeling as well (e.g., Rational Software's Rational Rose).
Modeling tools make extensive use of graphical interfaces and visual editing to produce model diagrams. Pop-up forms capture the necessary names and definitions. Definitions produce a data dictionary. You can produce reports to review with business users to validate and refine the definitions. If a tool can produce DDL automatically, the dictionary also includes physical design characteristics, such as volumes, domains, and indexes.
Most tools that support ER modeling also provide rule-checking to make sure the models produced are valid. These tools check for violations to the rules of normalization and flag the absence of primary keys or the inappropriate use of foreign keys. Since an analyst develops the model interactively, this feedback allows him or her to catch errors at once rather than their being perpetuated into lower-level models.
One continuing problem that modelers fa
ce is communication with business users. The notational systems used by most modeling methodologies are more technical than intuitive, and many business users are uncomfortable with reviewing any model that's more complicated than the most abstract conceptual model. Recently a tool called InfoModeler, which is based on the Object Role Modeling (ORM) methodology espoused by G. M. Nijssen and Terry Halpin, has gained attention due to its fact-based approach.
In this approach, you gather statements in English (or any other language you choose) about an application or a busi-ness. A step-by-step process then groups the facts into fact types, checks for uniqueness, and eliminates any unnecessary types. A modeler assembles a complete model, which can be either diagrammatic or textual, by adding constraints and validating the model against the original set of facts. The existence of the corroborating facts, even with a model diagram, can improve communication with business reviewers.
Plan, Then Exe
cute
Modeling a database as a part of the development process can be compared to developing a blueprint before constructing a building. A good model captures business essentials and focuses on the broad perspective rather than a narrow one. Good modelers separate those relationships and dependencies that result from current practice from those that are fundamental and unchanging. A sound data model leads to a database that is sharable, reusable, and flexible and that accurately reflects the business it supports.
Where to Find
CASEwise Systems, Inc.
Waltham, MA
Phone: 800-893-8398
Phone: 617-895-9900
Internet:
http://www.casewise.com