The threading model you choose when writing a WinSock service has a direct impact on how the service performs on a particular network. Here, in ascending order of complexity, are five common threading models designed to handle service networks of varying sizes.
Single thread, single client at a time is the simplest of all threading models. A service has a single loop in which it accepts (via the accept() API) an incoming client and then services it immediately in that thread. New clients must wait until the current client is serviced.
This model is easy to implement and has low resource requirements; it uses only a single thread and no more than two socket handles at a time. However, it cannot support more than one client at a time, which makes it inappropriate for all but the most basic services.
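As a rough illustration of the loop described above, here is a minimal sketch in C. It assumes the caller has already called WSAStartup() and created, bound, and put the listening socket into the listening state. The names serve_one_at_a_time() and service_client() are hypothetical, and the per-client work (echoing bytes back) is only a stand-in for real service logic. The small non-Windows shim at the top is an assumption added so the sketch also builds outside Windows.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
/* Assumption: POSIX shim so the sketch also builds outside Windows. */
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define INVALID_SOCKET (-1)
#define closesocket close
#endif

/* Hypothetical per-client work: echo bytes back until the peer
 * closes its side of the connection or an error occurs. */
static void service_client(SOCKET client)
{
    char buf[512];
    int n;

    while ((n = (int)recv(client, buf, sizeof buf, 0)) > 0) {
        int sent = 0;
        while (sent < n) {
            int k = (int)send(client, buf + sent, n - sent, 0);
            if (k <= 0)
                return;
            sent += k;
        }
    }
}

/* Accept and service clients strictly one at a time.
 * max_clients < 0 means "loop until accept() fails".
 * Returns the number of clients that were serviced. */
int serve_one_at_a_time(SOCKET listener, int max_clients)
{
    int served = 0;

    while (max_clients < 0 || served < max_clients) {
        SOCKET client = accept(listener, NULL, NULL);
        if (client == INVALID_SOCKET)
            break;
        service_client(client);  /* other clients wait in the backlog */
        closesocket(client);     /* at most two sockets are open at once */
        served++;
    }
    return served;
}
```

While service_client() runs, no other connection is accepted, which is exactly the limitation noted above.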
With the single thread, multiple clients with select() model, the service still uses only one thread, but it can handle multiple clients simultaneously by multiplexing among them with the Windows Sockets select() API. A single loop calls select() repeatedly to poll the listening socket and all the connected sockets. When select() indicates that one of the sockets is ready, the service determines which socket that is. If it's the listening socket, the service calls accept() to take the new connection. If it's a connected socket, the service calls send() or recv(), as appropriate, to send data to, or receive data from, the client.
This model creates a powerful service, but performance can suffer because every network I/O call passes through the select() API. This is acceptable when CPU use is not an issue but presents a problem if the service requires high performance.
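One pass of such a select() loop might look like the sketch below. It is a simplified illustration, not a complete service. The Service structure, the poll_once() function, and the MAX_CLIENTS limit are hypothetical names chosen for this example. The sketch watches sockets for readability only and simply echoes data back, and it assumes the listening socket was set up by the caller. The non-Windows shim is an assumption added so the code also builds outside Windows.

```c
#ifdef _WIN32
#include <winsock2.h>
#else
/* Assumption: POSIX shim so the sketch also builds outside Windows. */
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int SOCKET;
#define INVALID_SOCKET (-1)
#define closesocket close
#endif

/* Assumption: kept below WinSock's default FD_SETSIZE of 64. */
#define MAX_CLIENTS 32

typedef struct {
    SOCKET listener;              /* already bound and listening */
    SOCKET clients[MAX_CLIENTS];  /* currently connected sockets */
    int    count;
} Service;

/* One pass of the loop: wait up to timeout_ms for a ready socket,
 * then echo on ready clients and accept on a ready listener.
 * Returns the number of sockets handled, 0 on timeout, or a
 * negative value if select() fails. */
int poll_once(Service *s, long timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    SOCKET maxfd = s->listener;
    int i, ready, handled = 0;

    FD_ZERO(&readable);
    FD_SET(s->listener, &readable);
    for (i = 0; i < s->count; i++) {
        FD_SET(s->clients[i], &readable);
        if (s->clients[i] > maxfd)
            maxfd = s->clients[i];
    }
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* WinSock ignores the first argument; POSIX needs highest fd + 1. */
    ready = select((int)maxfd + 1, &readable, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;

    /* Connected sockets: read what arrived and echo it back. */
    for (i = 0; i < s->count; ) {
        SOCKET c = s->clients[i];
        char buf[512];
        int n;

        if (!FD_ISSET(c, &readable)) { i++; continue; }
        handled++;
        n = (int)recv(c, buf, sizeof buf, 0);
        if (n > 0 && send(c, buf, n, 0) == n) { i++; continue; }
        /* Peer closed, error, or short send: drop the client.
         * A real service would queue the unsent remainder instead. */
        closesocket(c);
        s->clients[i] = s->clients[--s->count];
    }

    /* Listening socket: take the new connection. */
    if (FD_ISSET(s->listener, &readable)) {
        SOCKET c = accept(s->listener, NULL, NULL);
        if (c != INVALID_SOCKET) {
            if (s->count < MAX_CLIENTS)
                s->clients[s->count++] = c;
            else
                closesocket(c);
        }
        handled++;
    }
    return handled;
}
```

A service would call poll_once() in a loop for its lifetime. Note that the fd_set is rebuilt and scanned on every pass, which is one source of the per-call overhead mentioned above.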
One thread per client is probably the most commonly used model because it is reasonably simple to implement and is the fastest model for installations that have fewer than approximately 40 clients. The service sits in a loop calling accept() to take incoming connections. When a connection arrives, the service calls CreateThread() to spawn a thread that is responsible for handling the client for the duration of the connection. Using a separate thread for each client has the advantage of reducing complexity, because each code path needs to perform only a single operation: The main thread accepts clients, and the child threads service them.
Programmers who have developed Unix sockets services (daemons in Unix terminology) will recognize this model as being similar to the single-process-per-client model that's often used on that operating system. In fact, it's possible to use a single process per client in Windows NT, but because processes make higher demands on resources than threads do, we do not recommend using it.
The downside of this model is that it does not scale well to large numbers of clients, for two reasons: the demands that each thread places on resources and, more importantly, the time the system spends context switching among numerous threads. Each switch between two threads costs CPU cycles that do no useful work, and if a process has hundreds of threads all competing for one CPU, the system spends a large percentage of its time switching among them.
The worker threads with synchronous I/O model improves on the scalability of the one-thread-per-client model but increases complexity and slows performance when run with a small number of clients. The service uses a primary thread to accept incoming connections and dispatch tasks to worker threads. The primary thread typically uses select() to learn when sockets are ready for service and then notifies one of the worker threads that it has a job to perform. The worker thread wakes up, services the request, and then waits for more work.
There are a number of ways in which to break down the work between the primary thread and a worker thread. The primary thread can simply indicate to the worker that data is available on a socket; the worker then calls recv() to get the data and processes it. Alternatively, the primary thread can do the recv() and take a first look at the data so that it can tell the worker thread what action to perform for the client.
The most powerful and flexible model--and also the most complicated one--is worker threads with asynchronous I/O. The key feature of this model is that socket handles are native file handles in Windows NT. As a result, the Win32 APIs ReadFile() and WriteFile() can be used on connected sockets, and services can exploit the asynchronous, or overlapped, ability of these APIs. In asynchronous I/O, the application makes the initial request, and the system informs the application that its request is pending, meaning that the system is still working on it. This allows a single thread to start several I/O requests and then wait for one of them to complete.
By leveraging asynchronous I/O, a single service thread can simultaneously support several clients, but without the CPU overhead of the select() call. In addition, the service threads can handle more than socket I/O, because the Win32 I/O mechanisms are integrated into the rest of the system. For example, a thread can wait for a semaphore or for I/O completion on seven socket handles.
Because this model lets a single thread support a number of clients, it scales well to hundreds, and even thousands, of simultaneously connected clients. It also performs well even for a small number of clients and provides the service developer with a flexible way of handling threads.
However, this model also introduces considerable complexity. For example, it raises the question of how many worker threads the service should use. If it uses too many, the system will thrash away, doing too many context switches between threads. If it uses too few, one or more CPUs may sit idle, waiting for work.
In general, it is advisable to use at least as many threads as the system has processors. More threads should be added if any existing ones spend a significant amount of time waiting for operations such as disk or network I/O to complete. However, the service should limit the total number of threads to no more than 20 or 30, depending on how the service uses the threads.
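The guidelines above can be turned into a small sizing helper. The sketch below is one possible reading, not a prescribed formula: the function name suggest_worker_threads(), the wait_fraction parameter, and the scaling rule (processors divided by the fraction of time a thread is actually runnable) are assumptions made for illustration. Only the lower bound of one thread per processor and the upper bound of about 30 threads come from the text.

```c
/* Upper bound taken from the "20 or 30" guideline in the text. */
#define MAX_WORKER_THREADS 30

/* Suggest a worker-thread count.
 *   cpu_count     - number of processors in the system
 *   wait_fraction - assumed estimate (0.0 to 0.9) of the share of
 *                   time a worker spends blocked on disk or network I/O
 * The scaling rule is an assumption: add threads so that roughly
 * cpu_count of them are runnable at any moment. */
int suggest_worker_threads(int cpu_count, double wait_fraction)
{
    int n;

    if (cpu_count < 1)
        cpu_count = 1;
    if (wait_fraction < 0.0)
        wait_fraction = 0.0;
    if (wait_fraction > 0.9)
        wait_fraction = 0.9;

    /* Round to the nearest whole thread. */
    n = (int)(cpu_count / (1.0 - wait_fraction) + 0.5);

    /* Apply the cap first, then the floor, so a machine with more
     * processors than the cap still gets one thread per processor
     * (assumption: the floor wins when the two guidelines conflict). */
    if (n > MAX_WORKER_THREADS)
        n = MAX_WORKER_THREADS;
    if (n < cpu_count)
        n = cpu_count;
    return n;
}
```

For example, a four-processor machine whose workers are blocked about half the time would get eight threads under this rule. On Windows, the processor count could be read from the dwNumberOfProcessors field filled in by GetSystemInfo(). The wait fraction would have to be measured or estimated for the particular service.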