, as I describe it, directly on relational data or whether or not you're required to have a multidimensional database. A lot of people have said, "Well, who cares about that, we care more about the results, and whether it's a multidimensional database or going directly against a relational database is secondary." And that's basically true, that should be secondary.
But it's not irrelevant, because there are a lot of issues in loading, managing, and dealing with data and all things being even close to equal, why not keep the data in the relational database, which everybody knows how to deal with, manage, operate, maintain, and enhance? So if you could do analysis directly against a relatio
nal database without having to permanently manage and store data in another database, that is all to the good. And in fact, our view is you can do much of what you need to do directly against relational databases, which is generally called ROLAP. Now that may involve a three-tier architecture, or it may not involve a three-tier architecture. Some people do it by literally generating SQL queries from a client, feed it to a relational database and get back the answer, displaying the results in a multidimensional format. Other people have a middle tier, a server if you will, that generates the queries and caches the data so that the data doesn't go directly to the client. It passes through the cache, which lets you deal with much larger amounts of data, do computations on the server, and get other advantages in a server centric architecture. With the three-tier approach, the server can process more data than can be downloaded to a typical PC. Our view is with that three-tier server centric architecture, you can
in fact do much of what you need to do against a relational database. There are, however, a few places where you do fall into trouble.
BYTE:
We can talk about the trouble spots later, but first, could you explain how your product Oracle Express fits into your multidimensional analysis strategy?
Stamen:
Oracle Express has the three-tier architecture and it can generate relational queries on the fly, and in that middle server tier can formulate multidimensional arrays for the purpose of serving them up to the client. So in that sense, it behaves purely like a three-tier ROLAP system. But we also give the option, if you want, of storing those multidimensional arrays in the Express Engine itself so you don't have to go get the data from the relational database every time you want to work on it. And when you store it in those multidimensional arrays, whether the data comes from multidimensional arrays and is stored there permanently or whether the data
gets generated on the fly through SQL access to the relational database, is totally transparent to the user. It's at a low level -- the server takes care of it automatically -- and you get to pick and choose what data goes where. So our view is we give you the best of both worlds of ROLAP and MOLAP, and you can mix and match them. So even within the same analysis, some of the data could be in the relational database and some in the multidimensional.
BYTE:
When would you want your data in one versus the other?
Stamen:
If you're dealing with lots of data -- and I'm talking about tens or hundreds of gigabytes of data -- it's really not very realistic to have to take that data and keep updating multidimensional databases with it. That's a pretty expensive and time consuming proposition. So what you'd like to do is go up against that data directly. On the other hand, if I'm doing things like complex multidimensional models, or what-ifs, what-if meanin
g simulations, projections, forecasts and so on, then I'm dealing with very complicated calculations that cannot be expressed in SQL. Then it's not realistic to do most of my work against a relational database. You need to do it in some kind of a multidimensional cache. So our view is it's the old 80/20 rule. For 80 percent of the data, you never have to leave the relational database, just get it when you need it. For the 20 percent of the data that's generally higher level aggregations of data, that you're using to do a lot of your modeling and what if and so on, rather than keep going back to the relational data and generating that data on the fly each time, stage it in a multidimensional cache.
BYTE:
So what is the advantage of going with your solution, which could be a multidimensional database, versus some other company's multidimensional database that would work with SQL databases, including Oracle.
Stamen:
You mean, some competitors like Ar
bor? Arbor does not let you do analysis on relational data. It only lets you do analysis on the data in its multidimensional database. It lets you drill down, and see the detailed data in a relational database. But you can't analyze it without reading it into and creating an Arbor database. We let you, on-the-fly, actually analyze the data without having to read it in and permanently store it in an Express database.
BYTE:
What would be some things on the fly that would be appropriate for that kind of operation?
Stamen:
Almost any kind of data navigation, data drill down, share calculations, percent change, variance, trend reporting, graphing; these are common multidimensional activities. So when you're talking about trend reporting, you're talking about `what were the sales for the last six months', and forecasting it out for the next six months. Or comparative trend reporting: `how is product
x
trending relative to my other three best selling pro
ducts. Is the growth faster, slower, is it following the same seasonality patterns'?
BYTE:
What kinds of operations would become troublesome, that is, they are not appropriate for direct reporting off the SQL database and require you to use the multidimensional cache?
Stamen:
Trying to run a complicated financial model with four or five dimensions of data, lots of equations, lots of hierarchies, simultaneity, how interest expense depends on how much money I have borrowed, how much money I have borrowed depends on what my shortfall is in cash, and what the shortfall is in cash depends on how much money I pay in interest. See the loop? So when you're doing that, you generally want the data stored in multidimensional arrays and operated on multidimensional arrays. Our view is there are a lot of people debating doing
A
versus doing
B
and we're saying you really need to do both. Others say you need to do
A
because they really
only do
A
, or you need
B
because they only do
B
. That's not the user's point of view, that's the vendor's point of view.
BYTE:
One of the touted benefits of the so-called ROLAP solution is the integration developers get from using tools from one company. What are the areas of integration that you can leverage at Oracle that will benefit your customers?
Stamen:
We have access to a little bit lower level APIs that will be a little faster which can be used inside Oracle. And we work with the developers, so we know the most optimal techniques. We're also working, for example, to develop common metadata
[
ed. note -- metadata refers to data that describes other data
]
that would be used for all Oracle applications. Looking at a tighter level of integration, we've got what I call
short term deliverables
and
long term deliverables
. Short term, we're going to be using the Oracle system
management tools and the Oracle network tools. For example, we will use SQL*Net as our transport protocol. The benefit is that if you're an Oracle shop, we're using the same protocols that all your other applications are using to go against the Oracle databases. It makes it easy for system management, system security, user identification, and so on. That's in communicating between the client and the server. If you're an Oracle shop, everybody is just on SQL*Net whether you're going against Express or going against Oracle or going against both. Also in the short term, in the next six to nine months, in the same way you can manage your Oracle database engine and start it up and bring it down and so on, you can do the same with your Express server.
Longer term, we're going to be offering one engine that integrates multidimensional data with relational data in the same engine. So you could think if it very much like a data blade for multidimensional data in the Oracle engine itself. And now instead of havi
ng just SQL APIs to it, you'll have both a SQL and a multidimensional API. And the multidimensional API will be the same Express API that we have today, meaning everything you run today will still run, except when it actually accesses data, it will get it via the Oracle engine. So the difference is at a very low level. Remember when I said before that when we want to get at data, we kind of say do we have the multidimensional data in the Express Storage manager or do we need to generate SQL to get it, we'll just go right to the Oracle database manager and get it in both cases.
BYTE:
What about performance, where customers may worry about performance on mission critical data?
Stamen:
Well, if you look at databases today, most people have a separate database for their warehouse data. You very rarely will do any kind of serious analysis directly against the OLTP transactional database. You'll almost always have a data warehouse database. Performance
will be terrific because we're actually going to have a multidimensional access method right in the Oracle database. You'll be able to get at all the data via SQL and look at it as if it's relational, or you can get it all the data via multidimensional techniques and look at it as if it's multidimensional, but it's the same data. The benefits of this approach are you have one data manager and one set of DBA utilities and one data dictionary. There are some performance benefits but it's mainly a support benefit, there's much less support overhead.
BYTE:
When you say longterm, how far out is that?
Stamen:
The timeframe is on the order of two years.
BYTE:
What about in the future, such as the possibility of building aggregates ahead of time for a user, based on the types of analyses the user has performed in the past?
Stamen:
The Oracle database is actually getting some of those capabili
ties in some warehouse products we're building. We'll work right on top of that, so we've got that intelligence already built into to our ROLAP capability. We always look for the highest level of aggregation of the data. So when this warehouse product hits -- and it should be hitting probably end of year, in that timeframe -- it will actually monitor different levels of aggregation, and update the metadata as different levels of aggregation are formed. And when we do our relational reach through, we look for the highest level of aggregation we can find to do it with because therefore it reduces processing.
By highest level of aggregation, I mean, let's say I have some data by zip code, some data by metro region, some data by state, some by country. And in a relational database, they are probably in different tables: here's my zip code data, here's my metro data, here's my state data and so on. When a user gives a query and they want to see something like the trendline by geography, if they limit their ge
ographies to metro regions, we'll check if there is any stored by metro region. If so we use it; if not, we'll go to the zip codes and aggregate it on the fly to the metro region level. We won't restrict what you can do. If you don't already have it aggregated by metro region, we'll aggregate it on the fly for you.
BYTE:
What kinds of things should people be thinking about as they plan out their multidimensional database strategy? What are the gotchas?
Stamen:
What is the application, who are the users, and what do they want to do? People often forget that kind of thing. Very often people come from the bottom up instead of the top down. So instead of saying, "Oh here's the data we have and we ought to put it in a warehouse, and how do we manage the warehouse," etc., I think we ought to start with "Who are my users and what do they need to do and how do I serve them?" And then I think you'll find something very quickly. You'll find that a lot of u
sers need analysis, but don't necessarily themselves do a lot of analysis. What I mean by that is part of their application is doing a lot of calculations, the trends and projections, the what-ifs and simulations, and so on. They wouldn't be able to define those on their own, because they're not analysts. The applications, in effect, need expert system capability to do the analysis on their behalf.
Let's take a simple example. If you were selling me a lease for an automobile, you're doing a lot of what-ifs. What if I put this money down and give this as a trade-in? What if I want to keep it for four years instead of three years? It's a simple example, but you don't expect a sales guy to need to know how to do all the mathematical calculations. You'd expect them to have an application in which they point and click, fill in the blanks, and the application does that analysis for them. The largest class of users of warehouses are in fact what we call analysis consumers as opposed to analysts. So first thing
I'd do is really understand what are the applications you have, understand who the analysis consumers are, what data you need to serve their interests and what kind of analyses are you going to have to do behind the scenes in these applications in order to let them do their jobs in the best way. I think that will generate a lot of the requirements.
BYTE:
When you look at the market and your customers out there, one of the huge benefits of databases -- especially for service oriented companies -- is the ability to use the data properly as a competitive advantage, to get much closer to a customer and start anticipating their needs. How much do you feel your users are using this kind of multidimensional analysis today versus what they could potentially be doing?
Stamen:
We do have many customers who are doing exactly that. They are literally helping to define the new product that they are introducing to the market, how to best introduce that, how to
promote that, how to price it, how to distribute it, how to anticipate market trends. All of that is being done today. But in terms of the overall potential, current use is still a very small percentage. Some industries are a little more mature than others. The consumer products industry is very much using this technology now, and now financial services, banking and insurance, and durable goods manufacturers are getting very interested. This is just now happening, in terms of what I really think is an explosion. It's one of those things where it can go on for some number of years where people are trying to experiment, you've got the pioneers, and all of a sudden a critical mass develops and then everyone starts doing it and it explodes. And I think we're right at the cusp of that explosion. It's started. Probably about a year ago, it started. We're on that dramatic growth curve right now. You see that in all the interest in data warehousing. It's how to use your data to run your business better. That's the
whole gist of this: fact-based decision making. Data warehousing is just approaching it from the database on up. And I'm saying you need to do that, but you also need to approach it from the business problems on down. And you should do both simultaneously to do it right.
Note: For More information on Oracle's OLAP Tools, see http://www.oracle.com
.