Force-fitting objects into a relational database just doesn't work well. The impedance problem is at the root of the incompatibilities.
Craig S. Mullins
The relational model has been king of the database hill for the past 10 to 15 years. During this period, few development projects that required a database used anything other than a relational system. But the times they are a-changin'. Object-oriented DBMS products are gaining wide acceptance for their ability to handle complex data in ways that relational products simply can't.
Defining the Paradigms
By definition, a paradigm is an example that serves as a pattern or model. A paradigm shift--a term increasingly seen in the computer trade press--refers to a fundamental change in the basic methods used to accomplish a task.
Most people connected wit
h computer systems realize that the industry changes daily. New technologies are constantly being developed, but most of these merely provide better ways of doing something you have always done. When a paradigm shift occurs, what changes is the essential core of how a task is performed--and perhaps defined.
The object-oriented model represents the latest paradigm for computer programming, after procedural languages and rule-based programming (e.g., in languages such as Prolog). Object-based languages--including Smalltalk, Eiffel, and C++--are based on manipulating objects, which encapsulate complex data structures and processes (usually called methods) for manipulating that data. To invoke a method, a message must be sent to the object in which the method is encapsulated. Because each object contains its own methods, most procedural code is eliminated. The object-oriented paradigm, while currently a hot topic in the computer world, is hardly new. The first OOPL (object-oriented programming language), S
imula-67, arrived on the scene in 1967.
Object Database Management Systems
To handle the data that OOPLs create and manipulate and to provide all the fundamental benefits of a DBMS to object-oriented applications, ODBMSes (object database management systems) were introduced. Typical benefits of an ODBMS include persistent data, data sharing, concurrent data access, and recovery control.
ODBMS products are designed to supplant relational DBMS products within an object-oriented development environment. Their architectures are designed to understand and utilize object-oriented techniques such as complex objects, abstract data types, encapsulation, and inheritance.
Object vs. Relational
What is the difference between an ODBMS and an RDBMS (relational database management system)? The primary difference is the ability of an ODBMS to support complex objects in an efficient and easy-to-manipulate form. A complex object consists of data and processes that manipulate that data. In contrast, R
DBMS products provide access to their data only in terms of rows and columns. And, other than triggers, an RDBMS can't store processing logic at the table level.
Other examples of complex objects include bill-of-materials hierarchies, CAD diagrams, and multimedia BLOBs (binary large objects). An ODBMS is ideally suited to store and manipulate these types of objects. Although some relational databases can process these types of objects, it is seldom easy or efficient to do so. Imagine using SQL Server to explode a bill-of-materials hierarchy from a fully normalized table. While technically possible, it is not truly feasible in terms of ease of use or performance.
Greater Abstraction
Abstraction is a key component of everyday life. People understand concepts such as an automobile, a light bulb, freedom, and trouble. Automobiles and light bulbs are concrete things, but you need not have them physically present to visualize them. Freedom and trouble are concepts that you can't see or touch, but y
ou understand them in the abstract.
An ODBMS raises the level of abstraction. Objects stored in an ODBMS are organized more closely to the way in which you view and use them in the real world. For example, consider the objects vehicle and automobile. All automobiles are vehicles, but not all vehicles are automobiles. In an ODBMS, you can implement an object of class Vehicle and subclass Automobile that inherits the methods and structure of Vehicle (see the figure "Inheritance"). Using an RDBMS, you would have to implement two separate tables, one for Vehicle and another for Automobile. The Vehicle table can't be used to define Automobile because an RDBMS lacks inheritance. In addition, different algorithms would have to be coded to access each subclass.
The Impedance Mismatch Problem
ODBMS products are sometimes touted as a solution to a problem encountered with RDBMS products. The problem is called impedance mismatch and refers to the difference between the declarative, set-level operation o
f relational-database query languages and the procedural, record-level operation of a typical 3GL (third-generation language). There are two components to impedance mismatch:
1. The difference between the set-at-a-time data manipulation language of the DBMS (e.g., SQL) and the record-at-a-time programming language (e.g., C or COBOL). When the declarative database language is embedded within the procedural language, the system can return multiple rows to a programming language that is not equipped to operate on sets of data (see the figure "Database Processing").
2. The difference between the typing systems that the DBMS uses and the general-purpose programming language used to develop the rest of the application. For example, many RDBMS products support date types and date arithmetic, but most programming languages don't support them. Therefore, data returned from the DBMS must be transformed into a form that the programming language understands (e.g., by converting a date to a character string).
Typically, when accessing a relational database, you must embed a SQL query within a 3GL program. Because the two languages operate at different levels and some mechanism must resolve the differences (i.e., by copying data from the database language to the programming language and back again), the system incurs extra overhead. On the other hand, you normally access data in an ODBMS by using an OOPL, so no impedance mismatch is encountered. All operations are at the record level.
Why do you need to use two languages? For one thing, the DML (data manipulation language) of the DBMS generally lacks computational completeness and can't handle the nondata manipulation components of an application. On the other hand, most programming languages lack the facility to handle persistent data other than in the form of files. They also typically lack abstract or high-level data types, constraints, and query capability. To enable the two languages to perform the tasks that each is ideally suited for, they must
be able to communicate with each other. This, in a nutshell, is the impedance mismatch problem.
Resolving the Impedance Mismatch
Several different techniques have been tried, with more or less success, to overcome the impedance mismatch:
Query Download. Using this approach, queries are developed that retrieve all objects that the process can require to be executed. The data is translated to a format that the process can read before accessing the object. After execution, the objects are copied back into the ODBMS. This approach results in very efficient program execution at the expense of significant start-up and exit overhead, limited (or no) concurrent data access, and the inability to use the ODBMS during program execution.
BLOBs. Another solution is to store objects in the ODBMS as BLOBs. Instead of identifying each field (i.e., column) and its attributes to the ODBMS, you can store the entire object in a single large field. Once again, the major benefit of this approach is efficienc
y, but the drawbacks are usually unacceptable. Because the ODBMS is unaware of the object's internal structure, querying is impossible--access requirements can't be coded against individual components (i.e., fields) of the BLOB. Also, the ability to allow concurrent access is problematic.
DBMS/Program Affinity. Defining the program's level of interaction with the DBMS to be at a field-by-field (or object-by-object) level removes the set-level versus record-level processing mismatch. However, it also reduces the effectiveness of database access, because all access is one record at a time, thereby crippling the system's ad hoc query capability.
Procedural Database Language. One of the most elegant solutions to the impedance mismatch problem is to augment the database's query language with procedural flow-of-control operations (e.g., WHILE loops and IF-ELSE constructs). Using this approach, more of the application can be written in the query language. A number of RDBMS products, such as Oracle's PL/S
QL and Sybase SQL Server's Transact-SQL, provide procedural extensions to SQL.
The coupling of program logic and data into a set-level procedural language is an attractive benefit of this approach. Likewise, performance is usually enhanced because network traffic is reduced. There are, however, some drawbacks; forcing a programmer to write the application using the database language instead of the general-purpose language limits the options and may pose difficulties. Finally, query languages often lack important computational powers and other important features, such as the ability to define and interact with the user interface.
Persistent Programming Languages. We call programming languages persistent when they can specify objects to "keep around" when the program is not executing. However, the ability to make program objects persistent still fails to address the interaction of the program with an ODBMS. Impedance mismatch will still occur because the DBMS's retrieval and data-typing scheme won'
t match those of the program.
Extended Data Types. Support for handling user-defined, extended data types is another approach. But unless both the DBMS and the programming language can define extended data types using the same methods, impedance mismatch is still inevitable.
Integration. In the end, the most workable solution is to simply remove the mismatch entirely. Integrating ODBMS products and programming languages so that they operate at the same level and use the same data model will eliminate the impedance mismatch problem. Implementing this approach, however, takes much effort. You must consider the following problems:
-- the lack of a formal, standard object model
-- an apparent compatibility conflict between encapsulation and
database query languages
-- the difficulty of providing standard interfaces and data models
given the reality of different commercial implementations
RDBMSes and Impedance Mismatch
Many RDBMS products handle impedance mismatche
s through two primary mechanisms: cursors and type translation. You can think of a cursor as a kind of pointer (see the figure "Cursors"). The programmer declares a cursor and defines a SQL SELECT statement for that cursor. An application program accessing that RDBMS can navigate, one row at a time, through the set of rows returned by the SQL statement. In essence, the program uses the cursor much like a sequential file. It opens the cursor, fetches one row at a time from the cursor, and then closes the cursor. When processing with cursors, a SQL statement can return zero, one, or many rows. Four distinct operations are available for cursors:
DECLARE. Defines the cursor, gives it a name unique to the program in which it is embedded, and assigns a SQL statement to the cursor name.
OPEN. Readies the cursor for row retrieval. OPEN reads the SQL search fields, executes the SQL statement, and builds the result table. It does not assign values to host variables, however.
FETCH. Returns data from t
he result table one row at a time and assigns the values to specified variables. If the result table is not built at cursor OPEN time, it is built fetch by fetch.
CLOSE. Releases all resources that the cursor uses.
Type translation occurs when the program attempts to retrieve a column that has been defined as a data type that the programming language doesn't support. Consider dates. For the program to access a date column, the data must be converted to a form that the programming language understands. For example, DB2 converts a date column to a PIC X(10) field in COBOL. Of course, some functionality is usually lost after this translation. The RDBMS can perform date arithmetic directly (e.g., calculating the number of days between two dates by simple subtraction), but the programming language cannot.
Object Databases Meet Their (Mis)match
ODBMS products manipulate data using the methods defined in object classes and via the programming language used to define the classes. This languag
e is usually an extension of an OOPL and therefore will operate at a record-by-record level. The ODBMS's DML (data manipulation language) has no impedance mismatch problem because it can access entire objects and is tied to the type-checking mechanism of the ODBMS.
Having the DML and the general-purpose programming language so tightly intertwined avoids the data-typing mismatch. Even in this scenario, however, access to database records is still row-at-a-time. Is this really a solution to the set-level component of the impedance mismatch problem? By moving back in time (i.e., prerelational) to record-at-a-time processing, you lose the benefits of set-level database access, such as eliminating the need for looping constructs (WHILE, FOR, and DO) for selecting, updating, deleting, or modifying multiple rows. A set-level language like SQL also makes ad hoc querying and report writing much easier.
Also, OOPLs came into use before ODBMSes were developed and widely available. Many object-oriented deve
lopment projects have used proprietary file structures or a SQL interface to a RDBMS. In this situation, the impedance mismatch problem looms as large as ever.
The Future
One thing you can expect to see in the future is that the rules that define and maintain the integrity of the database can be stored as methods in a reusable class library, along with the business rules that guide the way a company performs its day-to-day operations. Also, since the user interface, network interface, and procedural logic are increasingly being written in an OOPL, it will be necessary to couple the OOPL and the ODBMS more tightly together.
A DBMS can attempt to resolve the impedance mismatch by extending its DML so that more of the application can be written with the DML directly. Consider Sybase SQL Server and its extended SQL offering, Transact SQL. This is a product in which looping and IF...ELSE constructs can be used within SQL. Additionally, user-defined data types, rules, defaults, and triggers can be
attached to tables and columns, and stored procedures can be housed in the DBMS. Of course, Sybase is not an ODBMS: It lacks complex object support, inheritance, and many other object-oriented features.
A computationally complete DML will not, by itself, enable an entire application to be written in the ODBMS's language. Without programming constructs that control external resources, such as the user interface and hardware, the application cannot be successfully completed.
At present, ODBMS products are maturing and will inevitably eat into the huge market share enjoyed by RDBMS products. Nonetheless, RDBMSes will continue to prosper and thrive in the world of business data processing, where applications such as payroll and accounting do not generally require complex objects. Use of ODBMSes will continue to grow in those fields that require complex objects, such as CAD and manufacturing. There will always be true RDBMSes and true ODBMSes.
Successful RDBMS products will seek to incorporate
the best components of object-oriented technology without compromising the relational model. Likewise, successful ODBMS products will seek to incorporate the best features of the relational model, without compromising the benefits derived from classes, inheritance, and encapsulation. This will likely lead to a marriage of the two technologies into a hybrid technology--object relational--that builds on the strengths of both relational and object-oriented concepts.
But--and this is no small matter--until the impedance mismatch problem is resolved, inconsistencies will continue to exist between the ways in which the DBMS and the programming language handle data items.
Illustration: Inheritance
The class Vehicle can include objects of many different subclasses, each with its own special data attributes and methods. Automobile is only one subclass of Vehicle.
Illustration: Database Processing
When you access a database one record at a time (top), it takes many READ operat
ions to produce reports or result tables. If the DBMS operates at the set level, it can get the data it needs in one SELECT operation.
Illustration: Cursors
In a relational schema, a cursor is a data structure used to hold multiple rows of data returned by a SELECT statement. The application program can then read data from the cursor, accessing it a row at a time as if it were a sequential file, using the DECLARE, OPEN, FETCH, and CLOSE functions.
Craig S. Mullins is a technical researcher in database design, client/server technology, object orientation, and relational technology at Platinum Technology (Oakbrook Terrace, IL). He is also the author of DB2 Developer's Guide (Sams, 1992). You can contact him on Prodigy at WHNX44A, on CompuServe at 70410,237, or on BIX c/o "editors."