Archives
 
 
 
  Special
 
 
 
  About Us
 
 
 

Newsletter
Free E-mail Newsletter from BYTE.com

 
    
           
Visit the home page Browse the four-year online archive Download platform-neutral CPU/FPU benchmarks Find information for advertisers, authors, vendors, subscribers Request free information on products written about or advertised in BYTE Submit a press release, or scan recent announcements Talk with BYTE's staff and readers about products and technologies

ArticlesHandling Sparse Data


December 1995 / Reviews / More RAM for Win 3 Holdouts / Handling Sparse Data

Suppose you have sales data for the past 1000 time periods, for 100 salespeople selling 100 products in 100 geographical areas. You're talking about a four-dimensional database with 1 billion cells (i.e., possible intersections of dimensions).

Of course, the Topeka office just opened last summer, young Farnsworth has been selling for only a few months, and they don't handle ski boots at the Fort Lauderdale office. Such considerations make the original billion possible cells dwindle to a mere 100 million actual cells.

This is an example of sparse data : The actual number of data values is but a fraction of the theoretically possible number. And the amount of wasted space grows exponentially with the num ber of directions.

Handling sparse data is a major problem of multidimensional databases, multidimensional analysis, and on-line analytical processing (OLAP). Different vendors have different ways of dealing with sparsity. Most, including Arbor Software, use some form of data compression.

Most also use some system of metadata that keeps track of where the populated cells are in some efficient way. But you don't want to waste more time interpreting the metadata and retrieving data than you would by simply wading through those empty cells.

Essbase first divides its vast theoretical hypercube into dense dimensions and sparse dimensions. Dense dimensions have values in most cells. Sparse dimensions don't have values in most cells (as a rule of thumb, less than 20 percent of sparse-dimension cells have values). The trick is to avoid wasting space on the sparse dimensions and to allow rapid access to the data in the dense dimensions.

Arbor ha s a patented storage method to maximize performance in both dense and sparse dimensions. Combinations of dense dimensions form data blocks, packed tight with values. Essbase also creates data blocks for combinations of sparse dimensions -- if there's anything in them. It doesn't allocate data blocks for empty combinations of sparse dimensions.

In effect, the data blocks thus gerrymander the mostly empty hypercube into mostly full subhypercubes. Some of these can be quite large. For instance, the data block of your earliest 10 products sold by your first 10 salespeople in the original 10 cities produces a very dense subhypercube of 1 million values.

There's a certain amount of overhead involved in describing the details about each of these dense data blocks. But this is dwarfed by all the cells that are not represented at all. Since Essbase keeps its entire data-block index in memory, access is very fast.


Theoretical Data Hypercube

illustration_link (15 Kbytes)


Up to the Reviews section contentsGo to previous article: Data to the Nth DimensionGo to next article: The Better Virtual PCSearchSend a comment on this articleSubscribe to BYTE or BYTE on CD-ROM  
Flexible C++
Matthew Wilson
My approach to software engineering is far more pragmatic than it is theoretical--and no language better exemplifies this than C++.

more...

BYTE Digest

BYTE Digest editors every month analyze and evaluate the best articles from Information Week, EE Times, Dr. Dobb's Journal, Network Computing, Sys Admin, and dozens of other CMP publications—bringing you critical news and information about wireless communication, computer security, software development, embedded systems, and more!

Find out more

BYTE.com Store

BYTE CD-ROM
NOW, on one CD-ROM, you can instantly access more than 8 years of BYTE.
 
The Best of BYTE Volume 1: Programming Languages
The Best of BYTE
Volume 1: Programming Languages
In this issue of Best of BYTE, we bring together some of the leading programming language designers and implementors...

Copyright © 2005 CMP Media LLC, Privacy Policy, Your California Privacy rights, Terms of Service
Site comments: webmaster@byte.com
SDMG Web Sites: BYTE.com, C/C++ Users Journal, Dr. Dobb's Journal, MSDN Magazine, New Architect, SD Expo, SD Magazine, Sys Admin, The Perl Journal, UnixReview.com, Windows Developer Network