.
Regardless of whether data flows from operational systems to a data-warehousing (DW) system, or from older operational systems to reengineered or downsized ones, the requirement is the same: A tool is needed to capture, clean, transform, and integrate data while at the same time handle the volume and complexity of the disparate data sources and targets involved. Such a tool is used with other software products and processes to form a Data Transformation Management System (DTMS). But before we can understand the DTMS's function, we must first understand how a business system works.
On the Line
Corporate line-of-business (LOB) systems manage the day-to-day business operations of an organization: o
rder entry, inventory management, shipping, invoicing, and so forth. These systems are designed with performance, rather than the business user, in mind. This is why us-ers have never held LOB decision-support systems in high regard; they don't provide the consistent, easy-to-use information needed to analyze business operations or to make business decisions.
A DW system helps solve the deficiencies of LOB-based decision-support systems. A DW's objective is to capture corporate data from operational systems, clean it, and transform it into a consistent form that's understandable and has business context for the end user.
There are several types of DWs. One is the
enterprise data warehouse
(EDW), which contains detailed (and possibly summarized) data that's captured from one or more LOB systems and integrated into a separate subject-oriented database. Data flowing into an EDW does not replace existing data; instead, it accumulates to show a historical record of business operations. This allo
ws analysis of business trends over a period of time. This style of warehouse is used for short- and long-term business planning and decision-making covering multiple business units.
An
operational data store
(ODS) presents a subject-oriented, integrated, consistent picture of current data stored in LOB databases. As data is modified in operational systems, a copy of the changed data flows into the ODS. The copy updates existing data to reflect the current status of the operational system. Updates to the ODS typically occur less than 24 hours after updates to the LOB systems are completed. Unlike an EDW, an ODS does not contain summarized or historical data. Since it's used for the day-to-day management of business operations, an ODS can be thought of as an operational data warehouse.
A
data mart
contains a subset of corporate data that's of value to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captu
red from LOB systems (independent data marts) or from an EDW (a dependent data mart). Like an EDW, a data mart is used for short- and long-term business planning and decision-making; unlike an EDW, a data mart does not provide the capability to analyze data across multiple business units of an organization.
A
Web information system
integrates data from LOB, DW, and groupware systems (e.g., Lotus Notes and Microsoft Office) with data stored on Web servers connected to a corporate intranet and the public Internet. Such a system extends the notion of a DW system to include all types of data that's of interest to a business user, making it accessible from Web-based desktop and network computers.
Building a DTMS
A DTMS acts as a data-transformation hub: The hub's center transforms and cleans data, while its spokes capture raw source data and transport the resulting transformed data to the target systems. The main components of such a system are shown
in the figure
.
Many data-transformation products have a
capture component
that directly accesses the source system using data-manipulation language statements that extract all (or a subset of) the data. But any given product can't possibly support all of the hundreds of database and file formats that exist in operational systems. With an ODS, for example, it's usually impractical to reload the complete target database each time; thus, change-data-capture methods are often used instead. (That is, the capture method operates only when a change occurs in the data, and only on the data that has changed.)
As a result, additional data-capture routines might have to be used in some situations to create flat files that can be processed by a transformation tool. If data needs to be cleansed, then appropriate procedures and products need to be identified to do the work.
A
transformation component
formats and integrates data for a target system. This might involve the restructuring of the source data,
including files, records, and fields, and the removal of data that the target system doesn't require. It may also involve enhancing the data by decoding and translating field values, adding a time stamp (if one is missing) to reflect the currency of data, data summarization, and the calculation of derived values. Many products on the market can do data transformation and integration.
Managing Complexity
The level of complexity that a DTMS has to deal with varies, depending on the type of data system being managed. Moving data between operational systems involves a high level of data transformation and a large number of interfaces (i.e., data sources and targets). An ODS/EDW combination also has a high data-transformation requirement, but it involves fewer interfaces. Data marts, on the other hand, assuming they are built directly from an EDW, have few interfaces and a lower level of data transformation.
Many DTMS products and applications stream data from a source system to a target syste
m; as records are captured, they are transformed and loaded into the target system. But as the number of sources increases, the streaming approach is inappropriate because it becomes difficult to coordinate the capture of source data and to handle error recovery. Also, streaming approaches encourage the development of point-to-point solutions where data is moved from a single source database to a single target database, which leads to data-consistency problems.
The solution to these issues is to implement a
transport component
that routes captured data into a staging area and passes it to the transformation process only when all the required data has been captured. Similarly, the transport component holds output from the transformation process in a staging area until all the data has been transformed and is error-free.
A
load component
performs any last data formatting for the target system. It might also do routing functions that deliver data to one or more target systems.
Pl
anning a Course
The main objective of a DTMS is to manage the flow of data from a source system to a target system. When choosing DTMS products, it's important to realize that the transformation power and interface management provided by a DTMS directly affect data quality and the ability of the product to adjust to support the needs of a corporate information system.
As complexity increases, DTMS tools are also required for configuration management, security and auditing, monitoring and tuning, transformation scheduling and work flow, and error recovery. To deploy high-performance, scalable business information systems, organizations must develop a sound DTMS architecture that supports not only existing LOB systems but the new multimedia data types that will appear in the future.
illustration_link (26 Kbytes)

Information systems obtain data from various sources and repackage it for different uses.
illustration_link (28 Kbytes)

A DTMS is responsible for acquiring, synchronizing, cleansing, and distributing business-critical data.
Colin White (Morgan Hill, CA) is president of DataBase Associates International. He specializes in distributed computing, data warehousing
, and the corporate use of intranets and the Web. You can reach him by sending e-mail to
cwhite@dbaint.com
.