t of setting up a data warehouse keep you at the yearning stage.
Data warehousing is hot today because businesses realize that they can use information as a competitive weapon. The idea is to consolidate and aggregate data from production systems into centralized or distributed data warehouses or data marts, where users can get at it. Users can then use the data they obtain to, say, provide better customer service, do better analysis in a more timely fashion, or look at data in new ways that let them spot otherwise unknown problems or opportunities.
Two other compelling incentives for creating data warehouses and data marts are the ability to do data mining and the ability to offer the resulting warehoused data over the Internet. The good news is that a data mart can meet the needs from either end: local access to pertinent data, for a cost and an amount of effort that won't kill you.
Warehouses vs. Marts
Although word is spreading that data marts are definitely the way to go -- they're less expensive and easier to build than enterprise data warehouses -- the terminology itself can be an emotional issue (see the table "Data Mart or Data Warehouse?"). When Bill Inmon, cofounder of Prism Solutions and now head of his own data-warehousing company, Pine Cone Systems, first defined a data warehouse in 1990 as a "subject-oriented, integrated, time-variant, and nonvolatile collection of data supporting management's decisions," his vision of data warehouses was on an enterprise scale.
Today's more process-oriented definitions focus on warehousing rather than on warehouses or marts. Warehousing refers to a set of processes or an architecture that merges related data from many operational systems to provide an integrated view of data that can span multiple business divisions.
Data marts, on the other hand, tend to be subject- or department-oriented. They can be subsets of a larger warehouse (in which case they are called dependent data marts), but that's not a requirement, and many independent, "stovepipe" data marts exist. You can't even define data marts in terms of size -- they aren't necessarily smaller than corporate warehouses. It's entirely possible, for example, for business-unit analysis to require so much historical depth that the unit's data mart ends up larger than the summary-level data available in the corporate warehouse.
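To make the distinction concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It is only an illustration, not a description of any vendor's product: the table names (orders_system, billing_system, warehouse_sales, mart_east_sales), the columns, and the sample rows are all hypothetical. It merges two operational sources into one integrated table, then derives a dependent data mart as a department-level subset of that table.

```python
import sqlite3


def build_demo():
    """Build a tiny in-memory warehouse and one dependent data mart."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Two hypothetical operational systems that describe the same subject
    # (sales) with different column names.
    cur.execute("CREATE TABLE orders_system (customer TEXT, region TEXT, amount REAL)")
    cur.execute("CREATE TABLE billing_system (cust_name TEXT, area TEXT, total REAL)")
    cur.executemany(
        "INSERT INTO orders_system VALUES (?, ?, ?)",
        [("Acme", "East", 100.0), ("Globex", "West", 250.0)],
    )
    cur.executemany(
        "INSERT INTO billing_system VALUES (?, ?, ?)",
        [("Acme", "East", 40.0), ("Initech", "East", 75.0)],
    )

    # Warehousing step: merge related data from both systems into one
    # integrated table with a single, consistent set of column names.
    cur.execute(
        """
        CREATE TABLE warehouse_sales AS
        SELECT customer, region, amount, 'orders' AS source FROM orders_system
        UNION ALL
        SELECT cust_name, area, total, 'billing' FROM billing_system
        """
    )

    # Dependent data mart: a subset of the warehouse, limited to one
    # region (assumed here to stand in for a department) and summarized.
    cur.execute(
        """
        CREATE TABLE mart_east_sales AS
        SELECT customer, SUM(amount) AS total_amount
        FROM warehouse_sales
        WHERE region = 'East'
        GROUP BY customer
        """
    )
    conn.commit()
    return conn


def mart_rows(conn):
    """Return the data mart's rows, ordered by customer name."""
    query = "SELECT customer, total_amount FROM mart_east_sales ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(mart_rows(build_demo()))
```

An independent, "stovepipe" data mart would skip the warehouse_sales step and load mart_east_sales directly from the operational tables, which is quicker to build but leaves nothing to reconcile it against later.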
The data-warehouse-versus-data-mart debate is one of those issues for which there's no simple answer. Some organizations insist that data marts are subsets of data warehouses and that you can't (and shouldn't) have any data marts until you pay your dues and create an enterprise data warehouse (and data model). Others say that, in today's fiercely competitive environment, you'd be crazy to embark on a multiyear warehousing project as a prerequisite to deploying any data marts. They believe it makes more sense to simply align a data-mart pilot project with an organization's core competency -- the one that accounts for 90 percent of the firm's profits -- and go for the more immediate results.
And given the relatively modest price tags of data marts, doesn't it make sense to view your first data mart as a throwaway? Fred Brooks, in his classic The Mythical Man-Month, says that "where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time." In other words, you should actually plan to throw one away -- chances are you will anyhow.
Thus, a data mart is similar to a data warehouse that contains all of an organization's data, but it's more limited in scope. It typically focuses on the needs of a specific business unit or function and is less expensive and faster to implement than an enterprise-wide data warehouse. On the other hand, although data marts are less expensive, easier to start (especially given all the 30- to 90-day quick-start bundles), and often offer better performance than a gigantic data warehouse, they're often harder to scale up.
The Three M's: Models, Methodologies, & Metadata
Still, there's a powerful incentive to get things right the first time. Where do you start?
For simple data marts with few data sources, you might want to consider your primary relational database vendor, especially if the source data resides in an IBM, Informix, Oracle, or Sybase database already. Each of those vendors offers data-mart bundles for less than $100,000, scaling from Windows NT up to massively parallel processing (MPP) systems (see the table "Representative Vendors with Data-Mart Packages").
However, the majority of data-warehouse and data-mart projects start with meetings that attempt to define scope, products, platforms, time lines, and budgets all at once. Sometimes it helps to have a methodology to follow. Most vendors offer some sort of methodology, perhaps in the guise of industry-specific templates or expertise, or their own best practices, but a few generic methodologies have also emerged.
Some methodologies, like Earl Hadden's, have even been licensed by data-mart vendors. Hadden's Data Warehouse Framework, based on John Zachman's framework for systems architecture, consists of a matrix with rows for owners, architects, designers, and builders, and columns for data, process, location, organization, event, and business driver. The Hadden Data Warehouse Method consists of three basic stages: architect, implement, and operate/enhance. Another well-known methodology is Prism's (http://www.prismsolutions.com) Iterations.
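One way to picture the framework matrix described above is as a grid of cells keyed by role and concern. The Python sketch below is an assumption-laden illustration, not part of Hadden's published method: the row, column, and stage names come from the description above, but the function names and the idea of attaching free-text notes to each cell are hypothetical.

```python
from itertools import product

# Rows and columns as named in the description of the framework.
ROWS = ("owner", "architect", "designer", "builder")
COLUMNS = ("data", "process", "location", "organization", "event", "business driver")

# The three basic stages of the method, in order.
STAGES = ("architect", "implement", "operate/enhance")


def empty_framework():
    """Return a matrix with one empty list of notes per (row, column) cell."""
    return {cell: [] for cell in product(ROWS, COLUMNS)}


def add_note(framework, row, column, note):
    """Attach a planning note to one cell (hypothetical usage)."""
    if (row, column) not in framework:
        raise KeyError(f"unknown cell: {row!r}, {column!r}")
    framework[(row, column)].append(note)


def unfilled_cells(framework):
    """List the cells that nobody has addressed yet."""
    return [cell for cell, notes in framework.items() if not notes]


if __name__ == "__main__":
    fw = empty_framework()
    add_note(fw, "owner", "business driver", "Improve customer service response time")
    print(len(fw), "cells,", len(unfilled_cells(fw)), "still empty")
```

Walking the list of unfilled cells during the first stage is one simple way to check that a planning meeting has covered every role and concern before anyone picks a product or platform.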