
RDBMSes Get a Make-Over


April 1997 / Features / RDBMSes Get a Make-Over

Are technologies such as DataBlades solutions or bandages for complex data management?

Jay-Louise Weldon

Relational database management systems (RDBMSes) are the lifeblood of most corporate data centers. You can use an RDBMS to retrieve data by unique key fields and by linking fields between related records. Type in "Retrieve all employees who live in New York with managers who live in Connecticut" and your RDBMS will snap-to like a military cadet.
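The New York/Connecticut request above comes down to a self-join that links each employee record to its manager's record. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in RDBMS; the employee table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical employee table; manager_id links
# each record to the record of that employee's manager.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        state      TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(emp_id)
    )
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Alvarez", "CT", None),
        (2, "Brown", "NY", 1),
        (3, "Chen", "NY", None),
        (4, "Davis", "NY", 3),
        (5, "Evans", "NJ", 1),
    ],
)


def ny_employees_with_ct_managers(conn):
    """Return names of New York employees whose manager lives in Connecticut."""
    rows = conn.execute("""
        SELECT e.name
        FROM employee AS e
        JOIN employee AS m ON e.manager_id = m.emp_id
        WHERE e.state = 'NY' AND m.state = 'CT'
        ORDER BY e.name
    """).fetchall()
    return [row[0] for row in rows]


print(ny_employees_with_ct_managers(conn))
```

Only Brown qualifies in this sample: Davis lives in New York but reports to a New York manager, and Evans reports to a Connecticut manager but lives in New Jersey.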

But the world is changing. As computers spread into wider application areas, RDBMSes can go crazy trying to handle complex data types such as images, documents, time-series inputs, or 3-D coordinates, which need special binary encodings to represent data. Data representation is only part of the challenge. Processing methods also vary among complex data types. A time series needs a start date and a calendar that specifies the intervals between observations, while processing methods for video and audio objects, for example, require play and rewind capabilities. In essence, complex data types are best represented as objects. They encapsulate the details of data, structures, and methods regarding how the object interacts with other objects.
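To make the time-series case concrete, here is a minimal sketch of such an object in Python. The class name, its fields, and the use of a single fixed interval in place of a full calendar are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class TimeSeries:
    """A time series that carries its own start date and observation interval."""
    start: date        # date of the first observation
    step: timedelta    # simplified "calendar": one fixed interval between observations
    values: List[float]

    def date_at(self, index: int) -> date:
        """Date of the observation at the given position."""
        return self.start + index * self.step

    def observations(self):
        """Pair every value with the date the calendar assigns to it."""
        return [(self.date_at(i), v) for i, v in enumerate(self.values)]


weekly_sales = TimeSeries(date(1997, 1, 1), timedelta(days=7), [10.0, 12.0, 11.0])
print(weekly_sales.observations())
```

The values alone are just a list of numbers; the start date and interval are what let the object answer questions such as "what was the value on January 15?"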

To be effective, RDBMS technology must support all traditional relational functions as well as data objects. Said another way: RDBMSes must embrace some characteristics of object-oriented database management systems (OODBMSes) to become object-relational database management systems (ORDBMSes).

The major DBMS vendors, including IBM, Informix, Oracle, and Sybase, have delivered or are about to deliver technologies that push RDBMS technology into the world of objects. But the question arises: Are these bolt-on fixes that you can trust for the long term? Or should you wait for a new generation of DBMSes built from the ground up to handle complex data types? We'll explore these questions in this article and in the one that follows. Here, we will look at the capabilities of current ORDBMS technologies; the article "How to Improve RDBMSes" will peer into the future of DBMS design.

Object Advantages

With the exception of binary codes, all data processed by computers is complex data. Normally we think of characters and numbers (either integer or real) as basic data types since most DBMSes and programming languages recognize and handle them. Using a data type known to the DBMS or language frees programmers from managing the details of storing and processing the data. For example, when using numbers declared as real (often stored as a sign, a base value, and an exponent value), a programmer can specify numeric calculations (TAX = REVENUE * RATE) and rely on the DBMS or program code to properly handle the component parts of the factors and create and store the components of the result.
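The point about real numbers can be seen in a short Python sketch. The helper name and the sample figures are assumptions; math.frexp splits a float into a fraction and a power-of-two exponent, which only approximates the sign, base value, and exponent layout described above.

```python
import math


def components(x: float):
    """Split a real number into (sign, fraction, exponent), with
    x == sign * fraction * 2 ** exponent."""
    fraction, exponent = math.frexp(x)
    sign = -1 if math.copysign(1.0, x) < 0 else 1
    return sign, abs(fraction), exponent


revenue = 1200.0
rate = 0.0625

# The programmer writes only the calculation; the language handles the parts.
tax = revenue * rate

print(tax, components(tax))
```

Nothing in the calculation itself mentions fractions or exponents, which is the convenience a built-in data type provides.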

ORDBMSes support all the traditional relational functions as well as supporting data types other than the usual characters and numbers. Thus ORDBMSes are attractive because they let applications and processing styles that have traditionally used proprietary data management techniques become an integral part of the enterprise database. This offers the possibility of consistently managing heterogeneous business objects across multiple platforms, and it promises a common interface for managing all types of data. The result: simplified user requests and application code.

As business users increasingly recognize the value of managing collections of heterogeneous data objects, ORDBMSes may become an essential part of DBMSes. The big challenge now is extending the capabilities of existing RDBMSes to become object-aware.

There are three capabilities RDBMSes must have in order to efficiently handle complex data types and objects. First, they must have storage and indexing techniques customized to each data structure. For example, methods that understand the structure of a fingerprint data type will be able to store, retrieve, and query it more efficiently than those that treat it as a binary large object (BLOB).

Second, content-based retrieval requires special methods. Image, audio, and video data will need the equivalent of text search engines that now manage documents (see the sidebar "Opening Doors to Complex Data").

Third, to deliver peak performance for search and retrieval of complex data, query optimization and the retrieval process itself must be customized to the type of data being retrieved. The cost -- in terms of system resources -- of complex queries for retrieving CAD/CAM designs is quite different from that for retrieving selected rows from a relational table. This cost difference increases in importance if the data objects you're searching for exist in a distributed computing environment.

Creating a DataBlade

While each of the DBMS vendors is tackling these issues, Informix has the best-known answer, thanks to the publicity surrounding its purchase of the Illustra ORDBMS. This technology uses DataBlades -- application-focused collections of data structures and code -- to implement complex data types and objects into an extended relational database environment.

To add a new data type to a relational database, you must create both data structures and functions. The data structures must include the external representation and the internal format of the new data type. The external representation describes how values of the new data type will be displayed and also how values of the new type will be deciphered when presented as input. The internal representation describes how these values will be maintained in memory for processing. Each new data type must be supported by code that converts its external representation to its internal representation (and vice versa) as a way of creating and presenting instances of the data object. With DataBlades, you can use any of the basic data types (integer, character, etc.) and standard constructs (arrays, sets, lists, etc.) to create new data types. You can also develop data types that are variants of preexisting ones supported by the same or a different DataBlade.
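The pairing of external and internal representations can be sketched with Python's built-in sqlite3 module, which lets an application register conversion code for a new type. This is an analogy only, not the DataBlade API: the GeoPoint type, the "lat;lon" text format, and the office table are assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoPoint:
    """Internal representation: the form used in memory for processing."""
    lat: float
    lon: float


def point_to_external(point: GeoPoint) -> str:
    """Internal -> external: text of the form 'lat;lon'."""
    return f"{point.lat};{point.lon}"


def point_from_external(raw: bytes) -> GeoPoint:
    """External -> internal: decipher the stored text into a GeoPoint."""
    lat, lon = raw.decode("ascii").split(";")
    return GeoPoint(float(lat), float(lon))


# Register the conversion code in both directions for the new type.
sqlite3.register_adapter(GeoPoint, point_to_external)
sqlite3.register_converter("geopoint", point_from_external)

# PARSE_DECLTYPES makes sqlite3 apply the converter to columns declared geopoint.
conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE office (city TEXT, location geopoint)")
conn.execute("INSERT INTO office VALUES (?, ?)", ("Armonk", GeoPoint(41.13, -73.71)))

city, location = conn.execute("SELECT city, location FROM office").fetchone()
print(city, location)
```

The application code inserts and fetches GeoPoint objects directly; the two conversion functions are the only place that knows the external format.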

A DataBlade must also include various functions for each new data type. At a minimum it must include functions to store, retrieve, display, modify, and query instances of this new type. The storage and retrieval functions can rely on standard access methods for the base data types used by the new type, or the DataBlade can include access methods that are specific to the new type. Modify functions include any operators (such as arithmetic, string, or specialized operations) that change the value of an instance of the new type. For example, to create a lag in a time series, you might modify an existing time series by shifting back the value for each period by one or more periods. Query functions include variations of the standard comparative operations (e.g., equality, less than, greater than, like) and possibly other special functions, such as "distance" for geographic data types. The DataBlade can also contain functions that support conversion from one data type to another (called casts). For example, a function might convert a text document into a standard relational table consisting of a line number and a text string for each line in the document.
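Two of the functions mentioned above, the time-series lag and the document-to-table cast, are simple enough to sketch in plain Python. The function names and the use of None for periods with no earlier observation are assumptions made for illustration.

```python
def lag(values, periods=1):
    """Modify function: shift every value back by `periods` positions.

    The series keeps its length; the first `periods` slots have no earlier
    observation to draw on, so they are filled with None.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    if periods == 0:
        return list(values)
    return [None] * min(periods, len(values)) + list(values[:-periods])


def document_to_rows(text):
    """Cast: turn a text document into (line number, text string) rows."""
    return [(number, line) for number, line in enumerate(text.splitlines(), start=1)]


print(lag([10, 12, 11], periods=1))
print(document_to_rows("Dear customer,\nYour order has shipped."))
```

The rows returned by the cast have the shape of an ordinary two-column relational table, so they could be queried with standard SQL once loaded.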

Each DataBlade also contains metadata on the resource cost of the various functions as applied to each complex data type. The Informix RDBMS engine uses this information to perform global optimization of queries that include several different types of data.
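One way an engine could use such cost metadata is to apply cheap filters before expensive ones. The sketch below illustrates that general idea only and does not describe the Informix optimizer; the Predicate class, the cost figures, and the sample rows are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Predicate:
    name: str
    cost_per_row: float            # hypothetical cost metadata supplied with the function
    test: Callable[[dict], bool]


def plan(predicates):
    """Order predicates so the cheapest is evaluated first."""
    return sorted(predicates, key=lambda p: p.cost_per_row)


def run(rows, predicates):
    """Apply predicates in planned order; return matching ids and call counts."""
    ordered = plan(predicates)
    calls = {p.name: 0 for p in ordered}
    matches = []
    for row in rows:
        for predicate in ordered:
            calls[predicate.name] += 1
            if not predicate.test(row):
                break              # row rejected; skip the costlier predicates
        else:
            matches.append(row["id"])
    return matches, calls


rows = [
    {"id": 1, "state": "NY", "image": "red"},
    {"id": 2, "state": "CT", "image": "red"},
    {"id": 3, "state": "NY", "image": "blue"},
    {"id": 4, "state": "NJ", "image": "red"},
]
predicates = [
    Predicate("image_match", 100.0, lambda r: r["image"] == "red"),
    Predicate("state_filter", 1.0, lambda r: r["state"] == "NY"),
]
matches, calls = run(rows, predicates)
print(matches, calls)
```

With the cheap relational filter applied first, the expensive image comparison runs on two rows rather than four.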

The DataBlade model supports the addition of objects as well. The DataBlade for an object would include its data structure (possibly a complex structure of other data types and even other objects) as well as the methods that implement the behavior of the object type. The model also can handle standard object-oriented constructs, such as inheritance and polymorphism. An object or data type can inherit properties or methods from other objects or types, and different objects or types can implement a function or an operator with the same name in different ways. For example, "distance" for geographic points can be different from "distance" for points in 3-D drawings.
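The "distance" example is ordinary polymorphism, shown here as a minimal Python sketch. The class names are assumptions, as is the choice of a great-circle (haversine) distance on a sphere of radius 6,371 km for geographic points.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


@dataclass
class GeoPoint:
    lat: float  # degrees
    lon: float  # degrees

    def distance(self, other: "GeoPoint") -> float:
        """Great-circle distance in kilometers (haversine formula)."""
        lat1, lat2 = math.radians(self.lat), math.radians(other.lat)
        dlat = lat2 - lat1
        dlon = math.radians(other.lon - self.lon)
        a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass
class Point3D:
    x: float
    y: float
    z: float

    def distance(self, other: "Point3D") -> float:
        """Straight-line (Euclidean) distance in drawing units."""
        return math.sqrt(
            (self.x - other.x) ** 2 + (self.y - other.y) ** 2 + (self.z - other.z) ** 2
        )


# Same function name, different implementation chosen by the type of the data.
print(GeoPoint(0.0, 0.0).distance(GeoPoint(0.0, 90.0)))
print(Point3D(0, 0, 0).distance(Point3D(3, 4, 12)))
```

A caller asks for "distance" in both cases and never needs to know which formula applies.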

Implementing DataBlades

A shared metadata repository and dynamically linked libraries of function code make the data types implemented through a DataBlade accessible to the Informix RDBMS. The RDBMS engine uses the data structures and functions defined by the DataBlade at various stages of normal processing. You can write DataBlade functions in SQL, C, or C++ using a proprietary API to the database server kernel. Informix also plans to add support for other languages, including Java. The SQL statements CREATE TYPE and CREATE FUNCTION register DataBlade data types and functions with the Informix RDBMS server. The files containing DataBlade data and code are part of the link step when building the RDBMS server.

DataBlade functions in SQL execute like macros as part of the SQL statement processing. Functions written in C are compiled, stored as an executable file, and loaded dynamically during query processing, as needed. This dynamic binding insulates application code from the function implementation and allows the implementation to change without affecting the application code.
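The insulation that dynamic binding provides can be illustrated with SQLite's user-defined functions, reached through Python's built-in sqlite3 module. This is an analogy rather than the Informix mechanism: the function names and the two "distance" implementations are assumptions, and create_function stands in for CREATE FUNCTION.

```python
import math
import sqlite3


def distance_v1(x1, y1, x2, y2):
    """First implementation: city-block distance."""
    return abs(x2 - x1) + abs(y2 - y1)


def distance_v2(x1, y1, x2, y2):
    """Replacement implementation: straight-line distance."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


conn = sqlite3.connect(":memory:")

# The application's query text is fixed; it refers to the function by name only.
QUERY = "SELECT distance(0, 0, 3, 4)"

# Register the function with the engine under the name the query uses.
conn.create_function("distance", 4, distance_v1)
before = conn.execute(QUERY).fetchone()[0]

# Swap the implementation by registering again under the same name.
conn.create_function("distance", 4, distance_v2)
after = conn.execute(QUERY).fetchone()[0]

print(before, after)
```

The query string never changes; only the code bound to the name "distance" does, which is the property the dynamic loading described above is meant to provide.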

A DataBlade can also incorporate remote procedure calls that tie an external system into the DBMS. You can use this approach to integrate data stored in heterogeneous DBMSes and file systems as well as across distributed platforms.

Impact on Developers

The extensibility provided by this model can be a boon to application developers. In-house programmers can eliminate the need to manage complex data types from application code. This yields leaner code, which can be produced more quickly and can remain unchanged as the data management details change. Third-party developers offer DataBlades in specialty areas (time series, geographic data, text processing, and so on) that in-house developers can use without the need for further programming. Or, if the application demands, in-house developers can create new data types that are variants of those provided, relying on inheritance for most of the functionality.

Informix offers a foundation DataBlade with support for over 40 different data types. The company also provides a wide variety of third-party DataBlades.

The main challenges for developers are the learning curve required to understand the object-relational framework and being able to develop code within the constraints of the API. As with all object-oriented frameworks, maximum reusability relies on making good choices for basic functions, out of which more complex functions can be built. Portability across platforms is also an issue since DataBlade access methods can be very platform-specific. And lastly, the interoperability of functions and data types among different DataBlades can be problematic, especially since there are no standards within applications as yet.

Other Approaches

Informix is not the only one providing RDBMS object-relational capabilities. IBM is following a similar architectural approach with its DB2 Extenders. Extenders allow developers to create new data types that are based on existing character or numeric types or one of DB2's large binary object types. DB2 Version 2 provided support for images, text, and video using large objects. Now IBM is working with partners to develop DB2 Extenders for other key application areas.

Sybase is taking a different approach. It's developing servers for different application areas and tying them together with its OpenServer interface so that they appear to be in a single SQL Server database.

Oracle intends to support complex data types in Oracle8 using "cartridges" within its Network Computing Architecture (NCA). Oracle defines a cartridge as "a manageable object that provides extensible functionality." Cartridges will use a language-neutral interface to interact with other objects. NCA will include a "software object bus" to provide an interconnect layer linking cartridges to clients, servers, and network services.

DataCartridges will be one type of cartridge offered within NCA. Each DataCartridge will implement a specific data type, including its structure and methods for creation, search, display, etc. DataCartridges will be less tightly integrated with the RDBMS engine than are the Informix DataBlades. Oracle's approach will probably trade performance for stability. The cartridges' indirect connection to the database server will prevent disruption of RDBMS operation due to errors in cartridge code. Thus, if the cartridge is flawed, your server doesn't crash. With Informix's DataBlades, a system-wide failure may occur. However, the software interconnect in the Oracle approach may add performance overhead.

The Future of ORDBMSes

Object-relational databases offer many advantages over traditional RDBMSes and OODBMSes. Most OODBMSes provide storage capabilities for persistent objects. As such, OODBMSes offer name- or key-based direct access to complex data structures. However, OODBMSes are unsuited to the types of content-based access required for query and analysis.

RDBMSes provide ideal platforms for content-based retrieval and analysis. The logical model underlying RDBMSes naturally supports referential integrity; the relational engines can also accommodate rule-based processing, such as triggers and alerts. RDBMSes have matured to encompass necessary performance and control features, such as query optimization, data security, and backup and recovery.

ORDBMSes combine the best of both breeds. They provide general access based on content as well as direct access based on unique identifiers. They also provide the ease of use and data independence that are the hallmarks of a traditional RDBMS. Through object extensions they can also provide the rich data types, reusability, and extensibility commonly associated with object-oriented applications.

Do ORDBMSes represent the future of data management? Or are they just a stepping stone to something else? I believe it is the latter. The variety of vendor solutions for extending relational database systems is a response to user requirements. In the future, the traditional boundaries between applications and computing platforms will be erased, or at least abstracted. This means we'll view and manage the environment as a whole so that an organization could, for the first time, control and gain value from all its business information regardless of form or location. Object-relational databases, at least as they are envisioned today, are just a first step in the evolution toward this goal.


Where to Find

Cognos
Burlington, MA
Phone:    (617) 229-6600
Internet: http://www.cognos.com

IBM
Armonk, NY
Phone:    (914) 765-1900
Internet: http://www.ibm.com

Informix Software
Menlo Park, CA
Phone:    (415) 926-6300
Internet: http://www.informix.com

Oracle
Redwood Shores, CA
Phone:    (415) 506-7000
Internet: http://www.oracle.com

Sybase
Emeryville, CA
Phone:    (510) 922-3500
Internet: http://www.sybase.com




More Ways to Extend RDBMSes

Informix, with its DataBlade technology, has competition in extending RDBMSes to handle complex data.

Company       Strategy

IBM           DB2 Extenders support text, images, and video;
              can create new data types using DB2 large
              binary object types. Architecturally similar
              to DataBlades.

Oracle        DataCartridges and a "software object bus"
              create and deploy complex data types.
              Cartridges are less tightly integrated with
              the RDBMS than are DataBlades.

Sybase        Uses OpenServer interface to make series of
              application-specific servers appear as a
              single SQL server.



Complex Data Types

Application Area                Data Type

Trend analysis (financial,      Time series
  marketing, etc.)

Computer-aided design           Design renderings, blueprints
  and manufacturing

Multimedia applications         Video, audio, images

Multidimensional analysis       2-D, 3-D coordinate systems

Geographic applications         Geophysical systems (latitude, longitude,
                                altitude)

Text processing                 Document, message

Law enforcement                 Fingerprints, pedigree

Chemistry/biology               Chemical structures (molecules, compounds)

Office automation               File systems (word processing, spreadsheets,
                                diagrams, etc.), directory structures, e-mail

Intranet/Internet               Web pages




Eight Reasons Why RDBMSes Need a Make-Over

1. Support for client/server computing

2. Support for complex data types

3. Global optimization of queries

4. Improved access to special data types

5. Support for object-oriented constructs (inheritance, polymorphism)

6. "Wrapper" integration of heterogeneous environments

7. Support for distributed computing middleware -- Distributed Computing Environment (DCE), Common Object Request Broker Architecture (CORBA), Distributed Component Object Model (DCOM) -- and management services

8. Support for Internet/intranet applications



DataBlade Aid

The Informix RDBMS can call data structures and functions within the DataBlade to handle complex data at various processing stages.


Jay-Louise Weldon (jweldon@shl.com) heads the Data Warehouse Practice within the U.S. Eastern Region of MCI Systemhouse, a global systems-integration firm.
