When client/server applications grow to accommodate hundreds or thousands of clients, today's operating systems break down. Here's how TP monitors come to the rescue.
Jim Gray and Jeri Edwards
In a simple client/server system, many clients issue requests and one server responds. But when you scale up from 50 clients to 500 or more, this model breaks most operating systems. TP (transaction processing) monitors solve this scale-up problem by modifying the simple request-response flow using techniques that have evolved over the last 30 years. They also introduce tools for designing, configuring, managing, and operating client/server systems. We'll first focus on how TP monitors tackle the fundamental upsizing problem and then look at some of their configuration and management capabilities.
Dedicating one server process per client is the obvious way to build a client/server application. In this model, client applications run concurrently; if one malfunctions, others are not affected. But the server-process-per-client design has two severe scalability problems. First, there's the percentage problem. At 10 clients, each client gets 10 percent of the server; however, at 100 clients, each gets just 1 percent of the server. The shared server and its data become precious resources as the number of clients rises above 30 or so.
Then there's the polynomial explosion problem (see the figure "
Process per Client: XYZ Connections
"). Each client typically wants to open several applications. Each application wants to open several files. So X clients opening Y applications each with Z open files results in XY processes and XYZ connections and open files. Before long, this adds up to thousands of processes, tens of thousands of connections, and a crashed operating system.
The obvious solution to these problems was to go from a server process per client to a server process per server (see the figure "
Process per Server: X+Z Connections
"). Many companies implemented an efficient operating system within the operating system--folding the entire application within one operating-system process. This TP operating system implemented a private thread library and built a private file system based on the host operating system's raw file interface. Typical of this approach was IBM's original CICS, which could support hundreds of clients on a half-MIPS, 100-KB server running DOS/360. The idea was reincarnated in the 1980s by Novell's NetWare and by Sybase's SQL Server.
Sybase multithreaded a single Unix process and added stored procedures, thus getting a three times speedup and a 10 times scale-up advantage over the process-per-client servers from Oracle, Ingres, and Informix. Today, database vendors have copied the Sybase design and offer multithreaded servers w
ith stored procedures. Novell's NetWare file server, introduced in 1982, quickly evolved into a database and applications server. By using inexpensive threads and emphasizing the performance of the simple requests, NetWare was able to support many clients with a relatively modest server.
The process-per-server design addresses the percentage problem by offering efficient services. Novell is justly proud of NetWare's ability to service a client's disk request in under a thousand instructions, which is 10 to 100 times better than such general-purpose operating systems as OS/2, Unix, and Windows NT. The process-per-server design solves the polynomial explosion problem by having only one server process. There are only X client connections to the server; Y applications at the server collectively open only YZ files. These are manageable numbers.
Applications Server Partitions
CICS, NetWare, and Sybase have been extraordinarily successful, but there are problems with the process-
per-server design. It does not scale to shared-memory symmetric multiprocessors because the single operating-system process uses a single processor. Other processors just sit idle. If that single operating-system process faults or waits in any way, the entire server stalls. Even worse, the process-per-server design does not scale to clusters of servers.
Beyond these scalability problems, the process-per-server model has a manageability problem. The design creates a monolithic process that collapses all applications into one address space. A bug in any application can crash the server. Changing any application can impact all others.
These scaling and management problems obviously suggest the idea of a process-per-application-server partition (see the figure "
Process per Partition: X+1 Connections per Server
"). The idea is to specialize a process or processor to service a particular application function. You scale the system by adding servers for each application. If an application
saturates a single server, you partition the application data and dedicate a server to each partition.
The process-per-application-server-partition technique is widely used to scale up CICS, NetWare, Sybase, and Oracle applications. The difficulty is that it reintroduces the polynomial explosion problem. The clients must connect to each application-server partition, log on to it, and maintain a connection with it. The client code needs to route requests to the appropriate partition.
It is not easy to partition most applications. A particular request may touch many partitions. There are often central files or resources that all partitions or applications (e.g., the customer list, the price list, and the bindery) use. Partitioning such resources is not possible, so they must be replicated or managed by a shared server. Nonetheless, process-per-application partition is the most widely used scalability technique today.
All the solutions described so far involve two kinds of processes: client
s or servers. These are generically called two-ball designs. All the two-ball designs expect the client to find the servers and route requests to the appropriate server. Each server authenticates the client and manages the connection to the client.
Routers: A More Scalable Design
The three-ball model introduces a router function (see the figure "
Three-Ball Model: Routers Have X+A Connections
"). The client connects to a router, and the router brokers client requests to servers. The client is authenticated once and sends all its requests through a single connection to its router. This design scales by adding more routers as the number of clients grows.
Routers typically create and manage pools of application-server processes. All members of a process pool provide identical services. A pool can be distributed across the several nodes of a cluster; the routers balance the load. Each application can have a separate server pool. The router can run different po
ols (applications) at different priorities to optimize response time for simple requests. If a server fails, the router redirects the request to another member of the pool. This arrangement provides load-balancing and transparent server fail-over for clients.
IBM's IMS, built in 1970, was the first three-ball system. It had a single router process. With time, Tandem (Pathway, 1979), Digital Equipment (ACMS, 1981 and RTR, 1987), AT&T (Tuxedo, 1985 and Topend, 1991), and Transarc (Encina, 1993) generalized the ideas to provide many additional features.
The process-per-client model had the virtue of implementation simplicity, and each client benefitted by having its own server process. However, the design did not scale up because of the percentage problem and the polynomial explosion problem. The two-ball model collapsed all the applications together, thereby solving these two problems but creating other scalability problems.
The three-ball model multiplexes the several clients down to a few
server processes. This solves the polynomial explosion problem, but the percentage problem remains an issue. The three-ball model uses the operating system to provide server processes. The benefit is that the router and applications can use symmetric multiprocessors and clusters. They can scale up to large systems with high throughput. If the router is designed carefully and the operating system dispatches well, the three-ball model can compete with the uniprocessor two-ball systems.
The three-ball model allows the applications designer to use either a process per server CPU, a process per application, or a process per client, which is the most interesting case. By dynamically connecting the client to a server on an as-needed basis, the three-ball router increases the duty cycle on each server. This solves the polynomial explosion problem while still allowing the application to have simple interactions with the client.
Initially, classic CICS (on the mainframes) was implemented as a two-ball mo
del. When CICS was reimplemented on Unix to be portable, it was implemented as a three-ball system built above Transarc's Encina toolkit. Now it is fair to say that all the popular TP monitors, such as IMS, CICS, ACMS, Pathway, Encina, Topend, and Tuxedo, are three-ball systems.
TP Monitors and ACID Transactions
Originally, TP monitor meant teleprocessing monitor--a program that multiplexed many terminals (clients) to a single central server. Over time, TP monitors took on more than just multiplexing and routing functions, and TP came to mean transaction processing. What's a transaction? A transaction is a set of actions that obeys the four so-called ACID properties--atomic, consistent, isolated, and durable (see
"The Four ACID Properties"
).
The ISO/TP standard (ISO0026) defines how to make transactions atomic, and the X/Open Distributed Transaction Processing standard defines a system structure and API that lets servers participate in atomic transact
ions. The transaction tracking system in NetWare, the resource manager interfaces in TP systems, and the transaction mechanisms of many SQL products can be used to build ACID applications.
TP monitors go well beyond a database system's narrow view of ACID applications. A TP monitor treats each subsystem (i.e., database manager, queue manager, and message transport) as an ACID resource manager. The TP monitor coordinates transactions among them. For example, a TP system will assure that when a database gets updated, an output message is delivered and an entry is made in the work queue, either all these actions will occur (exactly once) or none will.
Beyond ACID, TP monitors configure and manage client/server interactions. They help applications designers build and test their code. They help systems administrators install, configure, and tune the system and help the operator with repetitive tasks. They also manage server pools. Finally, they connect clients to servers and provide efficient system
services to applications.
Queued, Conversational, and Work-Flow Models
Most TP monitors have migrated from a two-ball to a three-ball model in which the client performs data capture and local data processing and then sends a request to a router. The router brokers the client request to one or more server processes. Each server in turn executes the request and responds. Typically, the server manages a file system, database, or BBS shared among several clients. This design has evolved in three major directions: queued requests, conversational transactions, and work flow.
Queued TP is convenient for applications where some clients produce data while others process or consume it. E-mail, job dispatching, EDI (Electronic Data Interchange), print spooling, and batch report generation are typical examples of queued TP. TP monitors include a subsystem that manages transactional queues. The router inserts a client's request into a queue for later processing by other applications. T
he TP monitor may manage a pool of applications servers to process the queue. Conversely, the TP monitor may attach a queue to each client and inform the client when messages appear in its queue. Messaging applications are examples of queued transactions.
Simple transactions are one-message-in, one-message-out client/server interactions, much like a simple RPC (remote procedure call). Conversational transactions require the client and server to exchange several messages as a single ACID unit. These relationships are sometimes not a simple request-response but rather small requests answered by a sequence of responses (e.g., a large database selection) or a large request (e.g., sending a file to a server).
The router acts as an intermediary between the client and server for conversational transactions. Conversational transactions often invoke multiple servers and maintain client context between interactions. Menu and forms-processing systems are so common that TP systems have scripting tools to qu
ickly define menus and forms and the flows among them. The current menu state is part of the client context. Applications designers can attach server invocations and procedural logic to each menu or form. In these cases, the TP monitor (router) manages the client context and controls the conversation with a work-flow language.
Work flow is the natural combination of conversational and queued transactions. In its simplest form, a work flow is a sequence of ACID transactions following a work-flow script. For example, the script for a person-to-person E-mail message is compose-deliver-receive. Typical scripts are quite complex. Work-flow systems capture and manage individual flows. A client may advance a particular work flow by performing a next step in the script. The systems designer defines work-flow scripts as part of the application design. Administrative tools report and administer the current work-in-process.
Writing Applications in a TP System
TP systems vary enormous
ly, but the general programming style is to define a set of services that the server will provide. Each service has a message interface. To implement a system, you define client programs that generate these messages and server programs that service them.
In the two-ball model, your server program runs inside the TP monitor as, for example, a Sybase Transact SQL, Oracle PL/SQL, or CICS program. In the three-ball model, a C, C++, or COBOL program runs in a standard process using the TP system library to get and send messages.
Service programs are invoked with a message either from a queue or directly from a client. The service program executes the application logic and responds to the client with a response message.
TP systems generally provide tools to automate construction of forms-oriented clients and of simple applications servers that can serve as templates for customization. Other tools come from independent tool vendors, such as Intersolv, Magna, Texas Instruments, and Dynasty.
System Configuration, Management, and Operation
TP monitors allow you to build enormous client/server systems. Before three-ball TP monitors appeared on Unix systems, it was hard to scale up to (never mind manage) 300 clients. Today, most vendors are reporting TPC-A and TPC-C benchmarks demonstrating thousands of clients attached to a single server and tens of thousands attached to server clusters.
Now that you can build such systems, how do you manage them? TP monitors have evolved a broad suite of tools to configure and manage servers, as well as large populations of clients.
The TP monitor thinks of each database and application as a resource manager. These resource managers register with the TP monitor, which, in turn, informs them of system checkpoints and shutdowns and provides an overall operations interface to coordinate orderly system start-up and shutdown. Database systems, transactional queue managers, and remote TP systems all appear to be resource managers.
The TP monitor views each application as a collection of service programs running on one or more servers and provides tools to package these services relative to servers. Typically, short-running or high-priority services are packaged together, and batch or low-priority work is packaged in separate server processes.
After packaging the services, an administrator assigns security attributes to them. Aspects of security typically include role, workstation, and time, so, for example, clerks might be allowed to make payments from in-house workstations during business hours. The TP monitor provides a convenient way to define users and roles and to specify the security attributes of each service. It authenticates clients and checks their authority on each request, rejecting those that violate the security policy.
Each service has a desired response time. Some services are long-running reports or minibatch transactions, and others are simple requests. TP monitors manage the size and priority of
server pools so that each service meets its response-time goals. The administrator can control how many processes or threads are available to each service. Server pools can be spread over multiple nodes of a cluster, and this number can be dynamic, growing as loads do. When the number of requests for service exceeds the maximum size of the server pool, requests are queued until the next server becomes available.
Systems involving thousands of clients and hundreds of services have lots of moving parts. Change is constant, and TP monitors manage it on the fly. They can, for example, install a service by creating a server pool for it. Even more interesting, they can upgrade an existing service in place by installing the new version, creating new servers that use it, and gradually killing off old servers as they complete their tasks. Of course, the new version must use the same request-reply interface as the old one.
TP systems mask failures in a number of ways. At the most basic level, they use the
ACID transaction mechanism to define the scope of failure. If a server fails, the TP monitor backs out and restarts the transaction that was in progress. If a node fails, it migrates server pools at that node to other nodes. When the failed node restarts, the TP system's transaction log governs restart and recovery of the node's resource managers.
Innovative TP Techniques
Modern database systems can maintain multiple replicas of a database. When one replica is updated, the updates are cross-posted to the other replicas. TP monitors can complement database replication in two ways: First, they can submit transactions to multiple sites so that update transactions are applied to each replica, thus avoiding the need to cross-post database updates.
More typical, TP systems use database replicas in a fallback scheme--leaving the data replication to the underlying database system. If a primary database site fails, the router sends the transactions to the fallback replica of the d
atabase. This hides server failures from clients--giving the illusion of instant fail-over. Because the router uses ACID transactions to cover both messages and database updates, each transaction will be processed exactly once. Without this router function, clients must explicitly switch servers when a primary database server fails and probably won't preserve exactly once semantics during the switchover.
RPC and request broker technologies are the bread and butter of TP systems. The emerging crop of commercial ORBs (object request brokers), whose performance can sometimes be measured not in transactions per second but rather in seconds per transaction, would do well to embrace the three-ball model and the techniques pioneered by TP systems. Meanwhile, the venerable TP monitors, CICS, IMS, ACMS, Pathway, Tuxedo, Encina, and Topend, will continue to evolve as high-performance message routers and server-pool managers.
Transactions provide a simple model of success or failure. A
transaction either commits (i.e., all its actions happen), or it
aborts (i.e., all its actions are undone). This all-or-nothing
quality makes for a simple programming model.
The ACID properties describe the key features of transactions:
Atomic All or nothing, either all the actions
happen or none do.
Consistent The transaction as a whole is a correct
transformation of the database.
Isolated Each transaction runs as though there
are no concurrent transactions.
Durable The effects of committed transactions
survive failures.
Database and TP systems automatically provide these ACID properties.
They use locks, logs, multiversions, two-phase-commit, on-line dumps,
and other techniques to provide this simple failure model. All the
programmer need do is write consistent programs and bracket th
em with BEGIN and COMMIT. If anything goes wrong the programmer can call
ROLLBACK--much like the quit function in a text editor. This
simplicity is especially important in client/server computing and
distributed databases, where the transaction may have done work at
many nodes. COMMIT makes all the changes at all the nodes durable;
ROLLBACK undoes all the changes.
illustration_link (30 Kbytes)
Each client can use many server processes, and each server process can use many files. The resulting polynomial explosion can rapidly overwhelm an operating system.
illustration_link (23 Kbytes)
Using threads, a single-server process drastically reduce
s the number of connections and open files required to support a given number of clients.
illustration_link (18 Kbytes)
With partitioned data, the application can scale across multiple servers. But it's tricky to partition the data, and clients have to maintain connections to multiple servers.
illustration_link (42 Kbytes)
Routers manage pools of application-server processes and broker client requests to servers. This design scales by adding more routers.
Jim Gray is a specialist in database and transaction-processing systems. He has worked for IBM, Tandem, and Digital on projects including System R, SQL/DS,
DB2, IMS-Fast Path, Encompass, NonStopSQL, Pathway, TMF, Rdb, DBI, and ACMS. He is editor of The Benchmark Handbook for Database and Transaction Processing Systems (Morgan Kaufmann, 1993) and coauthor of Transaction Processing Concepts and Techniques (Morgan Kaufmann, 1992). Jeri Edwards is director of transaction processing and client/server software development at Tandem Computers and one of the authors of Essential Client/Server Survival Guide (Van Nostrand Reinhold, 1994). You can reach them on the Internet or BIX at
Gray@crl.com
or
Edwards_Jeri@Tandem.com
, respectively.