Suppose you have sales data for the past 1000 time periods, for 100 salespeople selling 100 products in 100 geographical areas. You're talking about a four-dimensional database with 1 billion cells (i.e., possible intersections of dimensions).
Of course, the Topeka office just opened last summer, young Farnsworth has been selling for only a few months, and they don't handle ski boots at the Fort Lauderdale office. Such considerations make the original billion possible cells dwindle to a mere 100 million actual cells.
This is an example of
sparse data
: The actual number of data values is but a fraction of the theoretically possible number. And the amount of wasted space grows exponentially with the num
ber of directions.
Handling sparse data is a major problem of multidimensional databases, multidimensional analysis, and on-line analytical processing (OLAP). Different vendors have different ways of dealing with sparsity. Most, including Arbor Software, use some form of data compression.
Most also use some system of
metadata
that keeps track of where the populated cells are in some efficient way. But you don't want to waste more time interpreting the metadata and retrieving data than you would by simply wading through those empty cells.
Essbase first divides its vast
theoretical hypercube
into dense dimensions and sparse dimensions.
Dense dimensions
have values in most cells.
Sparse dimensions
don't have values in most cells (as a rule of thumb, less than 20 percent of sparse-dimension cells have values). The trick is to avoid wasting space on the sparse dimensions and to allow rapid access to the data in the dense dimensions.
Arbor ha
s a patented storage method to maximize performance in both dense and sparse dimensions. Combinations of dense dimensions form data blocks, packed tight with values. Essbase also creates data blocks for combinations of sparse dimensions -- if there's anything in them. It doesn't allocate data blocks for empty combinations of sparse dimensions.
In effect, the data blocks thus gerrymander the mostly empty hypercube into mostly full subhypercubes. Some of these can be quite large. For instance, the data block of your earliest 10 products sold by your first 10 salespeople in the original 10 cities produces a very dense subhypercube of 1 million values.
There's a certain amount of overhead involved in describing the details about each of these dense data blocks. But this is dwarfed by all the cells that are not represented at all. Since Essbase keeps its entire data-block index in memory, access is very fast.
illustration_link (15 Kbytes)
