Video and audio objects, for example, require play and rewind capabilities. In essence, complex data types are best represented as objects. Objects encapsulate the data, its structure, and the methods that govern how the object interacts with other objects.
To be effective, RDBMS technology must support all traditional relational functions as well as data objects. Said another way: RDBMSes must embrace some characteristics of object-oriented database management systems (OODBMSes) to become object-relational database management systems (ORDBMSes).
The major DBMS vendors, including IBM, Informix, Oracle, and Sybase, have delivered or are about to deliver technologies that push RDBMS technology into the world of objects. But the question arises: Are these bolt-on fixes that you can trust for the long term? Or should you wait for a new generation of DBMSes built from the ground up to handle complex data types? We'll explore these questions in this article and in the one that follows. Here, we will look at the capabilities of current ORDBMS technologies; the article "How to Improve RDBMSes" will peer into the future of DBMS design.
Object Advantages
With the exception of binary codes, all data processed by computers is complex data. Normally we think of characters and numbers (either integer or real) as basic data types since most DBMSes and programming languages recognize and handle them. Using a data type known to the DBMS or language frees programmers from managing the details of storing and processing the data. For example, when using numbers declared as real (often stored as a sign, a base value, and an exponent value), a programmer can specify numeric calculations (TAX = REVENUE * RATE) and rely on the DBMS or program code to properly handle the component parts of the factors and create and store the components of the result.
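To make the point about hidden components concrete, here is a minimal Python sketch that splits a real number into a sign, a base value (mantissa), and an exponent, then rebuilds it. The helper names `decompose` and `recompose` and the sample figures are illustrative assumptions, not part of any DBMS; the sketch uses the binary form that Python's `math.frexp` exposes.

```python
import math


def decompose(value: float):
    """Split a float into (sign, mantissa, exponent) with
    abs(value) == mantissa * 2**exponent."""
    sign = -1 if math.copysign(1.0, value) < 0 else 1
    mantissa, exponent = math.frexp(abs(value))
    return sign, mantissa, exponent


def recompose(sign: int, mantissa: float, exponent: int) -> float:
    """Rebuild the float from its three component parts."""
    return sign * math.ldexp(mantissa, exponent)


# The programmer writes only the calculation; the parts are handled for them.
revenue = 1250.0
rate = 0.0625
tax = revenue * rate

parts = decompose(tax)
print(tax, parts, recompose(*parts))
```

The application code never touches the sign, mantissa, or exponent directly, which is the convenience a known data type provides.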
ORDBMSes support all the traditional relational functions and also support data types other than the usual characters and numbers. They are attractive because they let applications and processing styles that have traditionally used proprietary data management techniques become an integral part of the enterprise database. This offers the possibility of consistently managing heterogeneous business objects across multiple platforms, and it promises a common interface for managing all types of data. The result: simplified user requests and application code.
As business users increasingly recognize the value of managing collections of heterogeneous data objects, ORDBMSes may become an essential part of DBMSes. The big challenge now is extending the capabilities of existing RDBMSes to become object-aware.
There are three capabilities RDBMSes must have in order to efficiently handle complex data types and objects. First, they must have storage and indexing techniques customized to each data structure. For example, methods that understand the structure of a fingerprint data type will be able to store, retrieve, and query it more efficiently than those that treat it as a binary large object (BLOB).
Second, content-based retrieval requires special methods. Image, audio, and video data will need the equivalent of text search engines that now manage documents (see the sidebar "Opening Doors to Complex Data").
Third, to deliver peak performance for search and retrieval of complex data, query optimization and the retrieval process itself must be customized to the type of data being retrieved. The cost -- in terms of system resources -- of complex queries for retrieving CAD/CAM designs is quite different from that for retrieving selected rows from a relational table. This cost difference increases in importance if the data objects you're searching for exist in a distributed computing environment.
Creating a DataBlade
While each of the DBMS vendors is tackling these issues, Informix has the best-known answer, thanks to the publicity surrounding its purchase of the Illustra ORDBMS. This technology uses DataBlades -- application-focused collections of data structures and code -- to implement complex data types and objects in an extended relational database environment.
To add a new data type to a relational database, you must create both data structures and functions. The data structures must include the external representation and the internal format of the new data type. The external representation describes how values of the new data type will be displayed and also how values of the new type will be deciphered when presented as input. The internal representation describes how these values will be maintained in memory for processing. Each new data type must be supported by code that converts its external representation to its internal representation (and vice versa) as a way of creating and presenting instances of the data object. With DataBlades, you can use any of the basic data types (integer, character, etc.) and standard constructs (arrays, sets, lists, etc.) to create new data types. You can also develop data types that are variants of preexisting ones supported by the same or a different DataBlade.
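The split between external and internal representation can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the geographic point type, the function names `point_in` and `point_out`, and the choice of a packed pair of 64-bit floats as the internal format. It does not reproduce the DataBlade API.

```python
import struct

# Hypothetical internal format: two little-endian 64-bit floats
# holding latitude and longitude.
_FORMAT = "<dd"


def point_in(text: str) -> bytes:
    """Decipher the external form 'lat,lon' into the internal format."""
    lat_text, lon_text = text.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {text!r}")
    return struct.pack(_FORMAT, lat, lon)


def point_out(internal: bytes) -> str:
    """Present an internal value in its external, displayable form."""
    lat, lon = struct.unpack(_FORMAT, internal)
    return f"{lat:.4f},{lon:.4f}"


stored = point_in("40.7128, -74.0060")
print(len(stored), point_out(stored))
```

The input function also validates the value, so malformed data is rejected before it reaches storage.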
A DataBlade must also include various functions for each new data type. At a minimum it must include functions to store, retrieve, display, modify, and query instances of this new type. The storage and retrieval functions can rely on standard access methods for the base data types used by the new type, or the DataBlade can include access methods that are specific to the new type. Modify functions include any operators (such as arithmetic, string, or specialized operations) that change the value of an instance of the new type. For example, to create a lag in a time series, you might modify an existing time series by shifting back the value for each period by one or more periods. Query functions include variations of the standard comparative operations (e.g., equality, less than, greater than, like) and possibly other special functions, such as "distance" for geographic data types. The DataBlade can also contain functions that support conversion from one data type to another (called casts). For example, a function might convert a text document into a standard relational table consisting of a line number and a text string for each line in the document.
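Two of the functions described above, the time-series lag and the document-to-table cast, can be sketched in a few lines of Python. The function names `lag` and `document_to_rows`, and the use of `None` for periods that have no earlier value, are assumptions made for this illustration.

```python
def lag(series, periods=1):
    """Shift each value back by `periods`; early periods with no
    earlier value to draw on are filled with None."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    n = len(series)
    k = min(periods, n)
    return [None] * k + list(series[: n - k])


def document_to_rows(text):
    """Cast a text document to rows of (line_number, line_text)."""
    return [(number, line) for number, line in enumerate(text.splitlines(), start=1)]


quarterly_sales = [100, 110, 125, 140]
print(lag(quarterly_sales, 1))
print(document_to_rows("first line\nsecond line"))
```

The lagged series keeps the same length as the original, so the two can be compared period by period.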
Each DataBlade also contains metadata on the resource cost of the various functions as applied to each complex data type. The Informix RDBMS engine uses this information to perform global optimization of queries that include several different types of data.
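How cost metadata might guide an optimizer can be shown with a simplified Python sketch. This is a generic predicate-ordering exercise, not Informix's optimizer: the `Predicate` class, the cost and selectivity figures, and the brute-force search over orderings are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Predicate:
    name: str
    cost_per_row: float  # hypothetical resource cost of one evaluation
    selectivity: float   # fraction of rows expected to pass the test


def expected_cost(order):
    """Expected cost per input row when predicates run in this order
    and each later predicate sees only the rows that survived."""
    total, surviving = 0.0, 1.0
    for predicate in order:
        total += surviving * predicate.cost_per_row
        surviving *= predicate.selectivity
    return total


def cheapest_order(predicates):
    """Try every ordering and keep the one with the lowest expected cost."""
    return min(permutations(predicates), key=expected_cost)


region = Predicate("region_equals", cost_per_row=1.0, selectivity=0.1)
image = Predicate("image_matches", cost_per_row=500.0, selectivity=0.05)

best = cheapest_order([image, region])
print([p.name for p in best], expected_cost(best))
```

With these figures, running the cheap relational test first means the expensive image comparison is applied to only a tenth of the rows.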
The DataBlade model supports the addition of objects as well. The DataBlade for an object would include its data structure (possibly a complex structure of other data types and even other objects) as well as the methods that implement the behavior of the object type. The model also can handle standard object-oriented constructs, such as inheritance and polymorphism. An object or data type can inherit properties or methods from other objects or types, and different objects or types can implement a function or an operator with the same name in different ways. For example, "distance" for geographic points can be different from "distance" for points in 3-D drawings.
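The "distance" example can be sketched in Python as two types that share a method name but implement it differently. The class names, the great-circle (haversine) formula for geographic points, and the mean Earth radius of 6371 km are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # approximate mean radius, assumed for this sketch


@dataclass(frozen=True)
class GeoPoint:
    lat: float  # degrees
    lon: float  # degrees

    def distance(self, other: "GeoPoint") -> float:
        """Great-circle distance in kilometers (haversine formula)."""
        lat1, lat2 = math.radians(self.lat), math.radians(other.lat)
        dlat = lat2 - lat1
        dlon = math.radians(other.lon - self.lon)
        a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
        return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float

    def distance(self, other: "Point3D") -> float:
        """Straight-line (Euclidean) distance in drawing units."""
        return math.sqrt((other.x - self.x) ** 2 + (other.y - self.y) ** 2 + (other.z - self.z) ** 2)


# The caller uses the same name; each type supplies its own behavior.
pairs = [
    (GeoPoint(0.0, 0.0), GeoPoint(0.0, 90.0)),
    (Point3D(0, 0, 0), Point3D(1, 2, 2)),
]
for first, second in pairs:
    print(type(first).__name__, first.distance(second))
```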
Implementing DataBlades
A shared metadata repository and dynamically linked libraries of function code make the data types implemented through a DataBlade accessible to the Informix RDBMS. The RDBMS engine uses the data structures and functions defined by the DataBlade at various stages of normal processing. You can write DataBlade functions in SQL, C, or C++ using a proprietary API to the database server kernel. Informix also plans to add support for other languages, including Java. The SQL statements CREATE TYPE and CREATE FUNCTION register DataBlade data types and functions with the Informix RDBMS server. The files containing DataBlade data and code are part of the link step when building the RDBMS server.
DataBlade functions in SQL execute like macros as part of the SQL statement processing. Functions written in C are compiled, stored as an executable file, and loaded dynamically during query processing, as needed. This dynamic binding insulates application code from the function implementation and allows the implementation to change without affecting the application code.
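The insulation that dynamic binding provides can be illustrated with a small Python sketch. The registry, the `register` and `call` helpers, and the `summarize` function are hypothetical stand-ins; the real mechanism loads compiled code during query processing, which this sketch does not attempt to reproduce.

```python
# Hypothetical function registry: names are resolved at call time,
# so callers never hold a direct reference to an implementation.
_registry = {}


def register(name, func):
    """Bind (or rebind) a function name to an implementation."""
    _registry[name] = func


def call(name, *args):
    """Look up the current implementation and invoke it."""
    return _registry[name](*args)


def application_report(values):
    """Application code: refers to the function only by name."""
    return call("summarize", values)


register("summarize", lambda values: sum(values))
first = application_report([1, 2, 3])

# Swap the implementation; application_report is left untouched.
register("summarize", lambda values: sum(values) / len(values))
second = application_report([1, 2, 3])

print(first, second)
```

Because the lookup happens on every call, replacing the implementation takes effect immediately without editing or rebuilding the caller.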
A DataBlade can also incorporate remote procedure calls that tie an external system into the DBMS. You can use this approach to integrate data stored in heterogeneous DBMSes and file systems as well as across distributed platforms.
Impact on Developers
The extensibility provided by this model can be a boon to application developers. In-house programmers can move the management of complex data types out of application code. This yields leaner code, which can be produced more quickly and can remain unchanged as the data management details change. Third-party developers offer DataBlades in specialty areas (time series, geographic data, text processing, and so on) that in-house developers can use without the need for further programming. Or, if the application demands, in-house developers can create new data types that are variants of those provided, relying on inheritance for most of the functionality.
Informix offers a foundation DataBlade with support for over 40 different data types. The company also provides a wide variety of third-party DataBlades.
The main challenges for developers are the learning curve required to understand the object-relational framework and the need to develop code within the constraints of the API. As with all object-oriented frameworks, maximum reusability relies on making good choices for basic functions, out of which more complex functions can be built. Portability across platforms is also an issue since DataBlade access methods can be very platform-specific. And lastly, the interoperability of functions and data types among different DataBlades can be problematic, especially since there are no standards within applications as yet.
Other Approaches
Informix is not the only vendor adding object-relational capabilities to its RDBMS. IBM is following a similar architectural approach with its DB2 Extenders. Extenders allow developers to create new data types that are based on existing character or numeric types or one of DB2's large binary object types. DB2 Version 2 provided support for images, text, and video using large objects. Now IBM is working with partners to develop DB2 Extenders for other key application areas.
Sybase is taking a different approach. It's developing servers for different application areas and tying them together with its OpenServer interface so that they appear to be in a single SQL Server database.
Oracle intends to support complex data types in Oracle8 using "cartridges" within its Network Computing Architecture (NCA). Oracle defines a cartridge as "a manageable object that provides extensible functionality." Cartridges will use a language-neutral interface to interact with other objects. NCA will include a "software object bus" to provide an interconnect layer linking cartridges to clients, servers, and network services.
DataCartridges will be one type of cartridge offered within NCA. Each DataCartridge will implement a specific data type, including its structure and methods for creation, search, display, etc. DataCartridges will be less tightly integrated with the RDBMS engine than are the Informix DataBlades. Oracle's approach will probably trade performance for stability. The cartridges' indirect connection to the database server will prevent disruption of RDBMS operation due to errors in cartridge code. Thus, if the cartridge is flawed, your server doesn't crash. With Informix's DataBlades, by contrast, a flaw in DataBlade code may cause a system-wide failure. However, the software interconnect in the Oracle approach may add performance overhead.
The Future of ORDBMSes
Object-relational databases offer many advantages over traditional RDBMSes and OODBMSes. Most OODBMSes provide storage capabilities for persistent objects. As such, OODBMSes offer name- or key-based direct access to complex data structures. However, OODBMSes are unsuited to the types of content-based access required for query and analysis.
RDBMSes provide ideal platforms for content-based retrieval and analysis. The logical model underlying RDBMSes naturally supports referential integrity; the relational engines can also accommodate rule-based processing, such as triggers and alerts. RDBMSes have matured to encompass necessary performance and control features, such as query optimization, data security, and backup and recovery.
ORDBMSes combine the best of both breeds. They provide general access based on content as well as direct access based on unique identifiers. They also provide the ease of use and data independence that are the hallmarks of a traditional RDBMS. Through object extensions they can also provide the rich data types, reusability, and extensibility commonly associated with object-oriented applications.
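The two access styles can be contrasted in a short Python sketch. The in-memory `store`, its sample records, and the helper names `get_by_id` and `find_by_content` are assumptions for illustration only; a real ORDBMS would use indexes and a query language rather than a dictionary scan.

```python
# Hypothetical object store keyed by a unique identifier.
store = {
    101: {"kind": "image", "subject": "fingerprint", "size_kb": 48},
    102: {"kind": "video", "subject": "training", "size_kb": 91200},
    103: {"kind": "image", "subject": "floor plan", "size_kb": 310},
}


def get_by_id(objects, object_id):
    """Direct access: fetch one object by its unique identifier."""
    return objects[object_id]


def find_by_content(objects, predicate):
    """Content-based access: return ids of objects whose contents
    satisfy the predicate, in ascending id order."""
    return sorted(oid for oid, obj in objects.items() if predicate(obj))


print(get_by_id(store, 102)["kind"])
print(find_by_content(store, lambda obj: obj["kind"] == "image" and obj["size_kb"] < 100))
```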
Do ORDBMSes represent the future of data management? Or are they just a stepping stone to something else? I believe it is the latter. The variety of vendor solutions for extending relational database systems is a response to user requirements. In the future, the traditional boundaries between applications and computing platforms will be erased, or at least abstracted. This means we'll view and manage the environment as a whole so that an organization could, for the first time, control and gain value from all its business information regardless of form or location. Object-relational databases, at least as they are envisioned today, are just a first step in the evolution toward this goal.
Where to Find
Cognos
Burlington, MA
Phone: (617) 229-6600
Internet: http://www.cognos.com