Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesDimensions of Data


April 19 95 / Special Report / Dimensions of Data

Multidimensional analysis in a client/server world

Edmund X. Dejesus

Beth Ceja tracks statistical safety and accident information for over 17,000 employees at Southern California Edison. For two years, her tool of choice has been the multidimensional analysis package CrossTarget from Dimensional Insight (Burlington, MA). Reports come in from over 20 locations to a mainframe running DB2. Then they stage to the server-based multidimensional store that she analyzes using CrossTarget. Because such tools can rapidly "slice and dice" complex data in ways that are difficult or impossible for conventional SQL-oriented tools, they're an increasingly popular way to analyze raw numbers.

Data sources can include the following: mainframe databases, client/server relational databases, spreadsheets, data warehouses, t ext reports, flat files, and proprietary systems. Multidimensional tools can access file-based extracts, act as clients to SQL servers, or talk to specialized multidimensional servers. These OLAP (on-line analytical processing) servers and their clients are an important new flavor of client/server computing.

Flatland

Flat files, typical of legacy systems, are the most common data source. These files can be staged to a multidimensional structure on the mainframe itself, to a SQL server, or to a multidimensional server. Whatever the destination, a metadata layer (i.e., a data dictionary) is usually involved. Metadata describes the source, format, and layout of the extracted data.

Multidimensional products vary in their support for defining metadata. Some offer menu-driven or drag-and-drop interfaces to simplify the process ( see the screen ); others require considerable knowledge of source structures and transfer processes. Once in place, though, metadata remains fairly static.

Many vendors emphasize their products' ability to handle flat-file data, which ensures access to legacy data on mainframes. The flat-file format may also be the only option when transferring data between source and destination components that share no other compatible data formats.

Accessing Relational Data

It may not be necessary (or desirable) to replicate relational data to a natively multidimensional database. However, multidimensional tools that reach directly into SQL stores must create SQL statements to extract the data they analyze. As we saw with flat files, some products offer menu-driven or drag-and-drop interfaces to simplify the process. These interfaces enable non-SQL programmers to perform multidimensional analysis of SQL data. Because relational databases represent multidimensional reality the way a set of 2-D blueprints represents a house, defining metadata for access to relational data can be complex.

Relational database tabl es contain records (or rows). Each record consists of fields (or columns). In a normal relational database, one field in each record is the primary key, a unique identifier for each record. In contrast, the multidimensional data model is an n-dimensional array (sometimes called a Hypercube). Each dimension has an associated hierarchy of levels of consolidation of data. For instance, a time dimension might have a hierarchy with levels such as months, quarters, and fiscal years. Each location in a multidimensional array, corresponding to an intersection of all the dimensions, is a cell.

Variables (which are also known as measures or metrics) in a multidimensional array correspond to columns in a relational database table. Values within a table column correspond to values for that variable in a multidimensional array. Virtual variables (which are also known as off-page variables) in a multidimensional array are values that are not stored but calculated on the fly from stored variable values. An example of this might be variance, which is calculated from actual and predicted data. Multidimensional tools can manipulate and display virtual variables just as they do stored variable values.

Metadata maps the columns and rows of a relational database to the dimensions and cells of a multidimensional array. Metadata may also include rules for the consolidation of data at each level of each dimensional hierarchy. Variations on a time dimension, for example, might be

days ==> weeks

days ==> months ==> calendar years

days ==> quarters ==> fiscal years

where data consolidates at each level of these hierarchies.

Multidimensional tools can access relational database data via SQL calls, possibly using multidimensional extensions to SQL that can, for example, treat columns as subscript-accessible arrays. The advantage of this approach is that multidimensional and relational tools can share a common relational data store.

However, there are also drawbacks. A S QL store is not a Hypercube. Instead, it's a set of related tables, and slice-and-dice analysis ("show me sales by region by month by salesperson") can be slow going, even if you've staged the data from the transactional server to a dedicated analysis server. Products that access relational data in this way include Brio Technology's DataPivot (used with its Data-Prism), Cognos's PowerPlay, MicroStrategy's DSS Agent, and Trinzic's Forest & Trees.

Information Advantage has staked out a position firmly in the middle. The company's product called Decision Support Suite is an integrated set of applications to perform multidimensional analysis dynamically on a data warehouse and act as a server to a variety of front-end tools in an open three-tier client/server architecture. Decision Support Suite is therefore a multidimensional analysis engine, accessing nonmultidimensional data and providing multidimensional information to client applications. This middleware approach does not dictate the form of the origi nal data, and it allows access by a number of front-end tools.

Importing and Transforming Relational Data

Instead of reaching into SQL stores directly, some tools import relational data and transform it into a multidimensional array. It's a four-step process:

1. Establish the number of dimensions. Usually, it's the same as the number of keys in the relational database.

2. Traverse the key relations. Each logical group or collection represents a dimension in the multidimensional database. The multidimensional tool thus organizes the data into hierarchies.

3. Consolidate the relational database columns to appropriate levels. For example, data that is dimensioned by location may be consolidated into a hierarchy of geographical levels--district, state, and country.

4. Design the calculated data. Most tools let you design computed variables on the fly.

The resulting multidimensional array may differ significantly from the source relational database. Extensive calculations, consolidations, comparisons, sorts, and selections can occur automatically during the transformation. If the relational database stores current operating data and historical data, the corresponding multidimensional array might create, calculate, and store future or projected data.

Because the data dictionary defines sources of data, dimensions and variables, dimensional attributes and their hierarchies, and formulas to calculate virtual variables or handle time series, it's a valuable by-product of transformation. This enhanced view of the raw data can greatly simplify conventional client/server applications development.

Going Native

Products that implement specialized multidimensional servers include Arbor Software's Essbase, Comshare's Commander OLAP Server, Holistic Systems' Holos, IRI Software's Express, Kenan Technologies' Acumate, Pilot Software's Lightship, and Planning Science's Gentium. These servers let users explore up to gigabytes of dat a, interactively teasing out meaning and insight.

Products that construct file-based multidimensional data stores include CrossTarget and Andyne Computing's Pablo. The vendors of such products claim greater portability of the data store (e.g., for use in mobile-computing situations), because the client/server architecture is not required to access the data.

Whether they're server- or file-based, multidimensional stores work with tons of data. How do they manage it? Sparse arrays are one key optimization. An array is sparse when few cells are populated or when many cells repeat the same value. Either way, the array can be compressed to cut down on storage and boost performance. Kenan's Acumate can be configured for different sparsity tolerances, and it splits the data into header and array blocks to streamline access (see the figure " Structure of a Multidimensional Data Store ").

Arbor Software has patented a way to dynamically configure the dimensionality of the base unit of storage in Essbase to optimize its performance. This technique improves consolidation speed and memory utilization. It also cuts query response time and disk-storage requirements.

Predefining data location and format with metadata is also the key to rapid access. Some products, including Acumate, provide direct cell-level access, which allows high-speed loading, updating, and retrieving of large quantities of array data. On suitable hardware, Kenan's ALI (Array Level Interface) can perform thousands of cell accesses per second, on both sparse and dense arrays. Such speed is necessary when populating a multidimensional database with gigabytes of information, even in an overnight process. These services are accessed through a C API that allows direct data exchange between application programs and the array.

Alternative Architectures

IRI Software's Express runs with the data engine and data dictionary residing on the server, while analytical screens and queries run o n the client. A single database can be distributed among several locations.

Comshare's Commander OLAP Server includes a data refinery for data extraction, cleanup, and transport from legacy data sources. Part of the data refinery sits next to the source data on Windows NT, OS/2, VM, MVS, VMS, and Unix systems. Only the required data is extracted and moved, which cuts down on processing and improves performance.

Kenan's FocalPoint tool creates and manages high-level data links that afford "virtual" access. When you request data, FocalPoint checks to see if the data is in the multidimensional array. If this is not the case, it generates the code to access a relational database (using SQL) or a flat file (using file read commands). You don't have to know where the data is located or in what format.

FocalPoint loads data automatically at prearranged times, either statically (in batch mode, periodically) or dynamically (by user request to support on-the-fly drill-downs). The metadata defines t he multidimensional data model, the locations of external data sources, the mappings between data sources and the array, and the distribution of data between sources and the array.

Multidimensional applications typically require access to multiple sources of data. Gauging the effect of sales promotions on sales volume, for example, may require tapping into separate advertising and sales databases.

Along with client/server support, multidimensional tools should handle local data storage and manipulation, so users can analyze data subsets off-line.

Open Access

Not everyone is in favor of multidimensional servers. Users rightly worry about proprietary access, the lack of tuning and maintenance tools, and the inability to update data incrementally. Many users will thus continue to rely on conventional relational systems for transaction-oriented work and supplement these with multidimensional systems for querying, reporting, and analysis. In fact, the Gartner Group predi cts that natively multidimensional databases will fall behind products that optimize SQL-oriented databases for multidimensional analysis.

Vendors of multidimensional solutions are striving to make their products more open and accessible, embracing technologies such as ODBC (Open Database Connectivity), OLE, and DCE (Distributed Computing Environment). By publishing their own APIs, as Arbor Software has done with Essbase, vendors hope to attract third-party support and establish de facto standards in an otherwise standardless field.

Vendors increasingly stress the ability to access relational data on the back end and to display it in spreadsheets on the front end. Further, they try to serve up a variety of back-end data sources to a variety of front-end tools. Use of an existing client leverages the buyer's investment in software and the user's often considerable experience with the tool he or she knows best, so an architecture open to front-end clients is preferred.

The data dictionary i s the crucial link between conventional data stores and multidimensional analysis tools. Most of the latter can already reach through and directly access original data sources. These capabilities will probably expand in the future. Some people predict the emergence of a standard MDQL (multidimensional query language).

Complementary Solutions

There's a trend toward partnership in the multidimensional camp. Comshare has licensed the Essbase engine from Arbor as a server component of Commander EIS. Microsoft Excel is strongly linked with Essbase. And Cognos's PowerPlay and Essbase are merging into an integrated OLAP solution. The establishment of the OLAP Council in January by charter members Arbor Software, Comshare, IRI Software, and Pilot Software is expected to produce a common API to simplify access (although some vendors complain they've been excluded).

Because OLAP and OLTP (on-line transaction processing) complement each other, the ultimate solution is to let both use the same data. Because OLTP must write and access data quickly, this argues for the data to be in a relational database. The problem then becomes how multidimensional tools can best access it. Ideally, there would be some way to rapidly transfer data from relational databases to multidimensional tools.

A big boost for multidimensional tools would be "data pumps" to move data from existing relational or nonrelational databases to native multidimensional databases automatically. Such data pumps already exist to move data to relational databases from other relational databases or from nonrelational databases. However, John Faig of the Meta Group sees a need for the ability of large organizations to automatically populate multidimensional databases in an off-line batch mode.

Relational databases and multidimensional arrays are not mutually exclusive, but complementary. Many organizations will want to use both. There are several ways to configure the two systems to coexist peacefully. For example, i nformation about the relational database can be included in the multidimensional server's data dictionary. This lets the multidimensional server access the relational database with dynamic SQL statements. Alternatively, tables in the relational database can be formatted to behave like extensions to the multidimensional server's data dictionary.

Someday soon, with luck, the details of OLTP/OLAP interoperability will be worked out. Users will then be able to get on with their real job: turning data into information.


Structure of a Multidimensional Data Store

illustration_link (9 Kbytes)

ARRAY COMPOSITION
Multiway arrays are comprised of two parts: a header and a data block. The header contains the basic data dictionary, or metadata, for the array, and the data block contains the actual values that populate the arrays. It is the combination of the header and data block that allows Multiway arrays to very quickly index and access large volumes of multidimensional data.

ARRAY ACCESS AND MANIPULATION
To access data, the header is used to determine which specific cell is requested. This allows the data to then be directly accessed in a single disk I/0. Because the header contains information about the size, shape, and attributes of each array, many dictionary changes (e.g., adding and deleting positions) can be done without ever changing the data, by accessing only the header. Moreover, new arrays can be created and manipulated through the header, reducing database maintenance and overhead.

Kenan's Multiway (part of the Acumate product) splits data into header and array blocks to streamline access. (Courtesy Kenan Systems Corp.)


Multidimensional Analysis Preparation

screen_link (44 Kbytes)

To prepare for multidimensional analysis, users graphically interconnect objects representing data sources, operations, and outputs. The example here is Planning Science's Gentium.


Edmund X. DeJesus is a BYTE senior editor. He has a Ph.D. in physics and has been a professional programmer for over 15 years. You can reach him on the Internet or BIX at edejesus@bix.com .

Up to the Special Report section contentsGo to previous article: In The Pipeline: Notes 4.0Go to next article: What the Users SaySearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network