into real-time enterprises, where they quickly gather and immediately respond to changes in information.
A real-time enterprise requires new applications and application services designed to immediately process changes in information. These services are termed business event-driven, where a business event is any significant change in business data or conditions. Such services require new technologies and capabilities, such as event communications services (ECSes) that can provide asynchronous, near-real-time communication of business events to all interested parties. ECSes foster the development of event-enabled applications and databases, real-time decision support services, and business automation.
In this article, I will describe the ECS. It forms the basic infrastructure that other event services rely on. A subsequent article will discuss the other services.
Event Communications
The world is becoming event-enabled. For example,
database systems support triggers, mechanisms that report changes to database records. Most packaged applications now offer external interfaces that communicate changes to business data. Common Object Request Broker Architecture (CORBA) and Distributed Component Object Model (DCOM) explicitly support business events.
Once generated, the ECS manages the events. It uses a publish and subscribe (P&S) mechanism, where an application publishes events and consumers subscribe to events of interest. The ECS ensures that each published event reaches each subscriber, asynchronously and with minimal delay.
While ECSes have existed for several years, a new generation, such as Velociti, offers better performance and scalability. Based on Internet and distributed object standards (e.g., CORBA and IIOP), they can implement near-real-time event transmission throughout a large enterprise. These new ECSes use three mechanisms to achieve these capabilities. First, they use scalable and reliable multicast protocols
built on the Internet-standard IP-multicast protocol. Second, they use a federated architecture -- similar to the Web's -- where many servers are deployed and the load is seamlessly partitioned among them. Finally, caching functions help balance the network load.
Scalable, Reliable Multicast
Multicast protocols are an efficient method for disseminating real-time information. In contrast to unicast protocols such as TCP that send a message to every event consumer, multicast protocols send a message once, regardless of the number of consumers. For example, using a multicast protocol, an event publisher performs one send to transmit a message to 1000 subscribers. A unicast protocol must send 1000 messages.
An IP-based standard for multicast was defined in the late 1980s (RFC1112) and today is supported in hardware in most commercial routers. By itself, however, this standard can't tackle business-critical event communication because it is not "reliable": It does not autom
atically detect and retransmit lost messages. Business-critical event communications require a more reliable protocol layered on top of IP-multicast.
Achieving reliability in a scalable manner is a difficult challenge. Consider the "obvious" approach of sending acknowledgment (ACK) responses for each successfully received message. For 1000 subscribers, this results in 1000 ACKs for each multicast message. The resulting traffic would flood the network, creating what is known as the ACK implosion.
A better approach returns responses only when errors occur. These negative acknowledgments (NACKs) leverage on the reliability of today's networks, but it can still result in an implosion problem if large numbers of recipients happen to lose the same message. Scalable, reliable multicast protocols must therefore solve the ACK/NACK implosion problem.
In the early 1990s, researchers JoMei Chang (cofounder and CEO of Vitria Technology) and Nick Maxemchuk invented the first reliable multicast protocol to
solve the ACK/NACK implosion problem. This pioneered the field of reliable multicast for the next 15 years. Velociti, a product of Vitria, implements a reliable multicast protocol (VRMP) that achieves scalability and reliability through four key mechanisms.
First, participants cache received messages for some small period of time. The interval varies among participants. Second, participants dynamically estimate the distance, measured as communications latency, to other participants through the use of periodic status messages.
Third, on detecting a lost message -- an observed gap in the packet sequence numbers -- participants use a NACK-based protocol with random delays and exponential back-off. Instead of immediately sending a NACK, the recipient chooses a random delay proportional to the sender's distance and then waits to multicast the NACK. If during the wait, the recipient hears the same NACK from another recipient, it doubles the wait time. If the recipient receives the m
issing message during the wait, it cancels the NACK request.
Finally, VRMP uses what's called distance-based repair. On receiving a NACK, a participant checks whether its cache contains the message. If it does, it waits for a random interval proportional to the recipient's estimated distance. If during the wait the participant hears a "retransmission" response for the same NACK, it does nothing. Otherwise, it remulticasts the message at the end of the delay, as shown in the figure
"Reliable Multicast Protocol in Action."
Why does VRMP work so well? It uses standard IP-multicast, requiring no special assistance from network routers or special network services. In practice, VRMP adjusts quickly and automatically to network topology changes. It uses statistical protocols with exponential back-off, similar to robust Ethernet CSMA/CD protocols, to solve the NACK implosion problem. Finally, network errors can cause duplicate NACK responses or slow the resending of lost messages, bu
t they don't impair the protocol's reliability.
Federation, Caching, and Replicas
Multicast protocols provide scalability in terms of the number of information consumers. A federated architecture provides scalability through a number of independent information streams, known as channels. Velociti implements a Web-like notion of federated architecture, which lets the channels be located on any available Velociti server. This lets the communications load be partitioned across any number of servers and expanded as needed. Web-like naming ensures that all consumers, subject to security authorization, can access channels globally on any server. The result is unlimited scalability with no architectural constraints.
Each additional Velociti server can support newly created channels or caches or replicas of existing channels. (A replica is a persistent form of cache and provides a fault-tolerant "store-and-forward" capability.) Caches and replicas allow optimized message flows across a wide varie
ty of WAN topologies, as shown in the figure
"Multilevel Caching Over a WAN."
Replicas can support the recovery requirements of intermittent consumers or long-duration consumer failures.
Toward a Real-Time Enterprise
A new set of ECSes is taking event communications to new levels of performance and scalability. Federated architectures let information channels be partitioned across many servers with no architectural constraints. Reliable multicast protocols such as VRMP let each information channel scale to an unlimited number of subscribers. Because you can flexibly configure caches and replicas to use unicast or multicast protocols, event communications can be tuned to varied heterogeneous network configurations and diverse subscriber environments. The resulting rich flow of events, with minimum latency, lets today's enterprise make decisions in real time.
illustration_link (24 Kbytes)

Periodic status messages let every participant assess the latency or distance to each other.
illustration_link (23 Kbytes)

Caches and replicas help tune event communications to diverse subscriber networks.
Dr. Dale Skeen (
skeen@vitria.com
) i
s CTO and cofounder of Vitria Technology, Inc.