Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesManaging Data Transformations


December 1997 / Core Technologies / Managing Data Transformations

Making corporate data useful for decisions requires a processing hub that coordinates and controls data flows.

Colin White

Organizations are generating and gathering more data than ever before. The increasing use of the Internet and corporate intranets is also adding to the mountain of available business data. But more data does not necessarily mean better information or more informed business decisions. Often data accumulates so rapidly that organizations have difficulty managing its accuracy and quali ty.

For corporate decision-making, business users need access to clean and consistent data. The best solution is to build a business information system that provides high-quality, consistent data. Its key parts are shown in the figure .

Regardless of whether data flows from operational systems to a data-warehousing (DW) system, or from older operational systems to reengineered or downsized ones, the requirement is the same: A tool is needed to capture, clean, transform, and integrate data while at the same time handle the volume and complexity of the disparate data sources and targets involved. Such a tool is used with other software products and processes to form a Data Transformation Management System (DTMS). But before we can understand the DTMS's function, we must first understand how a business system works.

On the Line

Corporate line-of-business (LOB) systems manage the day-to-day business operations of an organization: o rder entry, inventory management, shipping, invoicing, and so forth. These systems are designed with performance, rather than the business user, in mind. This is why us-ers have never held LOB decision-support systems in high regard; they don't provide the consistent, easy-to-use information needed to analyze business operations or to make business decisions.

A DW system helps solve the deficiencies of LOB-based decision-support systems. A DW's objective is to capture corporate data from operational systems, clean it, and transform it into a consistent form that's understandable and has business context for the end user.

There are several types of DWs. One is the enterprise data warehouse (EDW), which contains detailed (and possibly summarized) data that's captured from one or more LOB systems and integrated into a separate subject-oriented database. Data flowing into an EDW does not replace existing data; instead, it accumulates to show a historical record of business operations. This allo ws analysis of business trends over a period of time. This style of warehouse is used for short- and long-term business planning and decision-making covering multiple business units.

An operational data store (ODS) presents a subject-oriented, integrated, consistent picture of current data stored in LOB databases. As data is modified in operational systems, a copy of the changed data flows into the ODS. The copy updates existing data to reflect the current status of the operational system. Updates to the ODS typically occur less than 24 hours after updates to the LOB systems are completed. Unlike an EDW, an ODS does not contain summarized or historical data. Since it's used for the day-to-day management of business operations, an ODS can be thought of as an operational data warehouse.

A data mart contains a subset of corporate data that's of value to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captu red from LOB systems (independent data marts) or from an EDW (a dependent data mart). Like an EDW, a data mart is used for short- and long-term business planning and decision-making; unlike an EDW, a data mart does not provide the capability to analyze data across multiple business units of an organization.

A Web information system integrates data from LOB, DW, and groupware systems (e.g., Lotus Notes and Microsoft Office) with data stored on Web servers connected to a corporate intranet and the public Internet. Such a system extends the notion of a DW system to include all types of data that's of interest to a business user, making it accessible from Web-based desktop and network computers.

Building a DTMS

A DTMS acts as a data-transformation hub: The hub's center transforms and cleans data, while its spokes capture raw source data and transport the resulting transformed data to the target systems. The main components of such a system are shown in the figure .

Many data-transformation products have a capture component that directly accesses the source system using data-manipulation language statements that extract all (or a subset of) the data. But any given product can't possibly support all of the hundreds of database and file formats that exist in operational systems. With an ODS, for example, it's usually impractical to reload the complete target database each time; thus, change-data-capture methods are often used instead. (That is, the capture method operates only when a change occurs in the data, and only on the data that has changed.)

As a result, additional data-capture routines might have to be used in some situations to create flat files that can be processed by a transformation tool. If data needs to be cleansed, then appropriate procedures and products need to be identified to do the work.

A transformation component formats and integrates data for a target system. This might involve the restructuring of the source data, including files, records, and fields, and the removal of data that the target system doesn't require. It may also involve enhancing the data by decoding and translating field values, adding a time stamp (if one is missing) to reflect the currency of data, data summarization, and the calculation of derived values. Many products on the market can do data transformation and integration.

Managing Complexity

The level of complexity that a DTMS has to deal with varies, depending on the type of data system being managed. Moving data between operational systems involves a high level of data transformation and a large number of interfaces (i.e., data sources and targets). An ODS/EDW combination also has a high data-transformation requirement, but it involves fewer interfaces. Data marts, on the other hand, assuming they are built directly from an EDW, have few interfaces and a lower level of data transformation.

Many DTMS products and applications stream data from a source system to a target syste m; as records are captured, they are transformed and loaded into the target system. But as the number of sources increases, the streaming approach is inappropriate because it becomes difficult to coordinate the capture of source data and to handle error recovery. Also, streaming approaches encourage the development of point-to-point solutions where data is moved from a single source database to a single target database, which leads to data-consistency problems.

The solution to these issues is to implement a transport component that routes captured data into a staging area and passes it to the transformation process only when all the required data has been captured. Similarly, the transport component holds output from the transformation process in a staging area until all the data has been transformed and is error-free.

A load component performs any last data formatting for the target system. It might also do routing functions that deliver data to one or more target systems.

Pl anning a Course

The main objective of a DTMS is to manage the flow of data from a source system to a target system. When choosing DTMS products, it's important to realize that the transformation power and interface management provided by a DTMS directly affect data quality and the ability of the product to adjust to support the needs of a corporate information system.

As complexity increases, DTMS tools are also required for configuration management, security and auditing, monitoring and tuning, transformation scheduling and work flow, and error recovery. To deploy high-performance, scalable business information systems, organizations must develop a sound DTMS architecture that supports not only existing LOB systems but the new multimedia data types that will appear in the future.


Parts of a Business Information System

illustration_link (26 Kbytes)

Information systems obtain data from various sources and repackage it for different uses.


Parts of a Data Transformation Management System

illustration_link (28 Kbytes)

A DTMS is responsible for acquiring, synchronizing, cleansing, and distributing business-critical data.


Colin White (Morgan Hill, CA) is president of DataBase Associates International. He specializes in distributed computing, data warehousing , and the corporate use of intranets and the Web. You can reach him by sending e-mail to cwhite@dbaint.com .

Up to the Core Technologies section contentsGo to next article: Speech Enables the Common Desktop PCSearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network