
Real-Time Queries in the Enterprise


February 1998 / Core Technologies / Real-Time Queries in the Enterprise

New forms of persistent queries are necessary to handle live business data as it speeds through a company.

Dale Skeen

Increasing the speed and accuracy of business execution translates into a significant competitive advantage. Corporations are thus transforming themselves into real-time enterprises, where they can immediately communicate, analyze, and act on business information as it occurs.

Last month, I explored event communications services (ECSes) and how they enable the scalable, real-time communication of business events. Together, the ECS and a new type of real-time/decision-support service (RT/DSS) form a Web-like infrastructure of real-time information. This infrastructure allows for rapid decision making, and it enables new business-automation services that respond automatically to specific business events. In this article, I explore these RT/DSSes.

Decision-Support Services

The advent of relational databases enabled general-purpose DSSes. With these tools, you could simply ask decision-support queries instead of having to program them. Similarly, the advent of event-driven technology enables new decision-support tools that can consume and analyze events almost instantaneously. With these tools you can ask powerful, real-time queries, using a high-level query language, instead of having to program them.

General-purpose tools that provide real-time query processing, referred to as RT/DSSes, are becoming commercially available; Vitria's Martini product is one example. Now, the manager of a large shipping hub for an express package-shipping company can request "to monitor shipment volume and changes thereto as new packages are picked up and existing packages rerouted." Unlike database queries, real-time queries typically are long-lived: a query may live from hours to months, depending on the information being monitored. Also, an RT/DSS must support concurrent query processing, because it's common to have thousands of real-time queries active all at once.

RT/DSS query processing is fundamentally different from traditional query processing. Rather than optimizing the bulk evaluation of a single on-line query across a large number of records, an RT/DSS optimizes the incremental evaluation of a single event (i.e., a single data change, say, a transaction) against a large number of continuous queries. Hence, an RT/DSS has to solve two hard query-optimization problems not found in traditional query processing.

The first problem is incremental query optimization, consisting of algorithms for optimizing the ongoing evaluation of a single query (e.g., track the sales of red shirts sold in Copenhagen). The second problem is multiquery optimization, consisting of algorithms for optimizing the simultaneous, incremental evaluation of multiple queries (e.g., separately track, for each market in Europe, the sales of all red shirts).

To illustrate just how an RT/DSS works, consider a real-time query. Using the package-shipping example, you monitor package information that changes on a real-time basis, particularly as shipments are delayed or rerouted. At every destination city, it is important to monitor package volume and weight to properly allocate delivery equipment. Consider the real-time query: "Monitor the total weight, per destination city, of all large, priority packages."

This is expressed in SQL as:

select city.name, sum(package.weight)
from package /*real-time*/, city /*stored*/
where package.weight > 100
  and package.service = 'priority'
  and package.zip = city.zip
group by city.name

Although it's simply expressed, this query is subtly complex. It joins real-time, dynamically changing shipping-information events (the packages en route) with stored information (cities) that resides in a traditional database system. It contains a number of query constraints (priority service and weight over 100 pounds). It requires the grouping of this information by city and also the incremental computation of weight.

Real-time query optimization consists of three steps, as shown in the figure "Anatomy of an RT/DSS." The first step is to build a discrimination network that evaluates the query constraints. When a real-time event is received, the RT/DSS evaluates the event information against the discrimination network to efficiently identify all real-time queries whose constraints match the new information.

Note again that a potentially large number of constraints from a large number of concurrent queries might be tested. Hence, the discrimination network must be optimized so that constraints are tested in an order that quickly identifies matching queries and discards those that don't match.
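As a rough illustration of the idea, here is a minimal Python sketch of a discrimination network. It is not Vitria's design: the class name, the query IDs, and the choice to hash equality tests and check a single weight threshold afterward are all assumptions made for this example.

```python
from collections import defaultdict


class DiscriminationNetwork:
    """Toy network: equality tests are hashed, range tests are checked after."""

    def __init__(self):
        # (field, value) -> list of (query_id, min_weight); hypothetical layout
        self._by_equality = defaultdict(list)

    def add_query(self, query_id, field, value, min_weight):
        """Register a query of the form: field = value and weight > min_weight."""
        self._by_equality[(field, value)].append((query_id, min_weight))

    def match(self, event):
        """Return the IDs of all registered queries whose constraints match."""
        matched = []
        for field, value in event.items():
            # One hash lookup discards every query with a different equality test.
            for query_id, min_weight in self._by_equality.get((field, value), ()):
                if event["weight"] > min_weight:
                    matched.append(query_id)
        return sorted(matched)


net = DiscriminationNetwork()
net.add_query("q_heavy_priority", "service", "priority", 100)
net.add_query("q_any_priority", "service", "priority", 0)
net.add_query("q_heavy_economy", "service", "economy", 100)

matches = net.match({"service": "priority", "weight": 150, "zip": "60601"})
```

The point of the structure is that the cost of handling an event depends on the number of queries sharing its equality values, not on the total number of registered queries.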

Step two is to derive incremental algorithms for computing the final query result. Once a query's constraints have been matched against the incoming event, the query's result needs to be incrementally evaluated. Continuing with the shipping example, if a package is rerouted to a new destination city, its weight must be subtracted from the old destination city's total and added to the total for the new destination city.
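The rerouting case can be sketched in a few lines of Python. This is an illustrative sketch only; the class name, the package IDs, and the assumption that each event carries a package's current destination city and weight are invented for the example.

```python
from collections import defaultdict


class CityWeightTotals:
    """Maintains sum(weight) per destination city without rescanning packages."""

    def __init__(self):
        self.totals = defaultdict(int)
        self._current = {}  # package_id -> (city, weight) last seen

    def on_event(self, package_id, city, weight):
        """Apply one package event as a delta against the running totals."""
        previous = self._current.get(package_id)
        if previous is not None:
            old_city, old_weight = previous
            # Rerouted or updated: remove the old contribution first.
            self.totals[old_city] -= old_weight
        self.totals[city] += weight
        self._current[package_id] = (city, weight)


tracker = CityWeightTotals()
tracker.on_event("p1", "Chicago", 150)
tracker.on_event("p2", "Chicago", 120)
tracker.on_event("p1", "Denver", 150)  # p1 is rerouted from Chicago to Denver
```

Each event touches at most two totals, so the work per event stays constant no matter how many packages are en route.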

Step three is to prefetch and precompute, whenever possible, query expressions involving stored data and to cache the results in memory. This is done because directly accessing a database on each receipt of a real-time event is expensive, and, at high data rates, a database system simply can't keep up. Prefetching and precomputation let the RT/DSS overcome this bottleneck, speeding up both constraint matching and incremental result computations.
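A minimal sketch of the prefetching step follows, using Python's built-in sqlite3 module as a stand-in for the stored database. The table layout, the sample rows, and the helper names are assumptions for illustration; the article does not describe how a real RT/DSS loads or refreshes its cache.

```python
import sqlite3


def load_city_cache(conn):
    """Read the stored city table once and keep it in memory as zip -> name."""
    rows = conn.execute("SELECT zip, name FROM city")
    return {zip_code: name for zip_code, name in rows}


# Stand-in for the stored database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (zip TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?)",
    [("60601", "Chicago"), ("80202", "Denver")],
)

zip_to_city = load_city_cache(conn)


def city_for(event):
    """Resolve the join package.zip = city.zip from memory, not the database."""
    return zip_to_city.get(event["zip"])
```

With the join side held in memory, each incoming event costs a dictionary lookup rather than a database round trip. A real system would also need some way to refresh the cache when the stored data changes, which this sketch omits.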

Scaling

An RT/DSS must be scalable in two dimensions: the ability to increase the number of concurrent queries, and the ability to distribute the results of processing a query to an increasing number of consuming applications and users. An RT/DSS uses the underlying ECS to both receive the events that drive real-time query processing and deliver the results of those queries to interested consumers.

For the package-shipping example, the RT/DSS processes a large number of queries (evaluating the number of heavy packages being shipped from various cities) and distributes the results (to all destination cities). Hence, as an RT/DSS computes the query results, it simply publishes the results on the underlying ECS.

A beneficial side effect of using an ECS is that this enables real-time queries to be composed. That is, the results of one query can be the input to another query, as shown in the figure "Building Complex Queries." Such daisy chaining of results permits the building of complex query results incrementally by leveraging the results of simpler queries.
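Query composition can be sketched with a tiny in-process publish/subscribe bus standing in for the ECS. Everything here is hypothetical: the EventBus class, the subject names, and the 250-pound limit are invented for the example and are not part of any real ECS API.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for an ECS: handlers subscribe to named subjects."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, subject, handler):
        self._subscribers[subject].append(handler)

    def publish(self, subject, message):
        for handler in list(self._subscribers[subject]):
            handler(message)


bus = EventBus()
city_totals = defaultdict(int)
overloaded = set()


def per_city_weight(package):
    """First query: running weight per city, republished as a new event."""
    city = package["city"]
    city_totals[city] += package["weight"]
    bus.publish("city.weight", {"city": city, "total": city_totals[city]})


def overloaded_cities(update, limit=250):
    """Second query: consumes the first query's results, not raw packages."""
    if update["total"] > limit:
        overloaded.add(update["city"])
    else:
        overloaded.discard(update["city"])


bus.subscribe("package", per_city_weight)
bus.subscribe("city.weight", overloaded_cities)

bus.publish("package", {"city": "Chicago", "weight": 150})
bus.publish("package", {"city": "Chicago", "weight": 120})
bus.publish("package", {"city": "Denver", "weight": 110})
```

The second query never sees a package event. It is built entirely on the published output of the first, which is the daisy chaining described above.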

To scale the number of concurrent real-time queries, an RT/DSS uses a federated architecture. If more processing power is required, you simply deploy new RT/DSS servers and rebalance the concurrent queries among the new servers.
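The article does not say how queries are rebalanced. As one simple possibility, the sketch below spreads queries round-robin across the available servers; the function name, server names, and query IDs are assumptions. A production system would more likely use a scheme that limits how many running queries have to move.

```python
def rebalance(query_ids, servers):
    """Assign queries to servers round-robin; returns server -> list of queries."""
    if not servers:
        raise ValueError("at least one server is required")
    assignment = {server: [] for server in servers}
    for index, query_id in enumerate(sorted(query_ids)):
        assignment[servers[index % len(servers)]].append(query_id)
    return assignment


queries = [f"q{n}" for n in range(6)]

before = rebalance(queries, ["rtdss-1", "rtdss-2"])
after = rebalance(queries, ["rtdss-1", "rtdss-2", "rtdss-3"])  # one server added
```

Adding the third server drops the load on each server from three queries to two.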

Real-Time Reaction

One of the most practical results of an RT/DSS is that it enables business automation. For package shipping, an application with the proper event hooks could immediately respond to too many heavy packages delayed at Chicago by rerouting an aircraft there. Also, if the problem persists, the program could notify a supervisor to investigate, so that the company could change certain shipping routes.
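A reaction rule of this kind might look like the following Python sketch. The function name, the action strings, and the thresholds are hypothetical; the sketch only shows the shape of "act immediately, escalate if the condition persists."

```python
def make_delay_monitor(limit, escalate_after):
    """Build a handler that reacts to counts of delayed heavy packages.

    limit: count above which the handler reacts (assumed threshold).
    escalate_after: consecutive over-limit updates before a supervisor is told.
    """
    consecutive = 0

    def on_update(delayed_heavy_count):
        nonlocal consecutive
        if delayed_heavy_count <= limit:
            consecutive = 0  # problem cleared, reset the persistence counter
            return []
        consecutive += 1
        actions = ["reroute_aircraft"]
        if consecutive >= escalate_after:
            actions.append("notify_supervisor")
        return actions

    return on_update


chicago_monitor = make_delay_monitor(limit=10, escalate_after=3)
responses = [chicago_monitor(count) for count in (5, 12, 15, 11, 3)]
```

In a real deployment the handler would subscribe to a query's published results and invoke actual services instead of returning action names.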

In either case, responses to problems are immediate, rather than taking days or weeks for the problem to be discovered, much less resolved. By using Internet-based, distributed object standards, these new RT/DSS technologies can change the speed of business computing.


Anatomy of an RT/DSS

Real-time query optimization uses three mechanisms (green) to expedite processing.


Building Complex Queries

Because an ongoing query's results can be sent to other servers, complex queries can be built out of them.


Dr. Dale Skeen ( skeen@vitria.com ) is CTO and cofounder of Vitria Technology, Inc.
