Windows NT Threads


November 1995 / Core Technologies / Windows NT Threads

To truly reap the rewards of a multiprocessor NT system, you have to use threads

Shashi Prasad

Multithreading (MT) is becoming increasingly attractive for applications; it offers one of the best choices for harnessing the power of SMP (symmetric multiprocessing) machines. In my article "Weaving a Thread" (October BYTE), I discussed multiprocessing and MT on Solaris and Windows NT. In this article, I'll take a closer look at the Win32 interface in Windows NT for developing MT applications.

Processes and Threads

A process in NT is a running instance of an application; it has its own virtual address space and owns system resources, such as memory, windows, and open files. When a process is created by a call to CreateProcess, an initial thread is automatically built for the process. You create additional threads by calling the function CreateThread.

The newly created thread starts executing the routine specified by the lpStartAddr parameter, and this routine can take the optional argument lpThreadParm. The thread-routine argument is generally a dynamically allocated variable or a global variable, because it must remain valid for as long as the new thread uses it. Each thread in NT has its own user and kernel stack, and the size of the stack for the newly created thread can be specified in cbStack.

Threads in NT have 32 different priority levels. The dispatcher -- the module responsible for thread scheduling -- uses a preemptive priority scheduler. In Windows NT, the highest-priority thread that is ready to run is always the one scheduled. Threads can change their priority by calling the function SetThreadPriority.

NT threads can be suspended and resumed by other threads in the process via calls to SuspendThread and ResumeThread, respectively. You can also create a thread in suspended state, which means it doesn't start execution until the creating thread calls ResumeThread.

A thread can terminate in one of the following ways: It can return from the initial routine; it can call the function ExitThread to terminate itself; or it can be terminated by some other thread in the process that calls TerminateThread. When a thread terminates, the thread object becomes signaled -- all other threads waiting for the thread to terminate are notified. A waiting thread can determine the exit status of a terminated thread with the function GetExitCodeThread.

Each thread has a unique identifier that can be retrieved by calling the function GetCurrentThreadId (this identifier is also returned in the lpIDThread argument during thread creation). However, several Win32 functions require an object's handle, which, for a thread, is separate from its ID. The handle to the thread object can be retrieved by calling the function GetCurrentThread (strictly, this is a pseudohandle that is meaningful only within the calling thread). The handle is also returned by the function CreateThread. For example, a thread that wants to change its own priority can pass the handle returned by GetCurrentThread to SetThreadPriority.

Thread Synchronization

In a multithreaded program, all threads within the process run in a single address space. Threads allow easy data sharing; however, safeguards against corruption of the shared data are required. All access to shared resources must be protected by mutual exclusion.

In NT, mutexes are used to serialize critical sections of code. A critical section is defined as a segment of code in which a thread accesses shared, modifiable data, and where state changes happen over several instructions. Hence, only one thread can be executing that section at a given time. Access must be serialized by some form of locking mechanism.

Before entering the critical section, the calling thread acquires the mutex lock by calling the WaitForSingleObject function. If the lock is held by some other thread, the calling thread is suspended until the thread holding the mutex lock releases it by calling ReleaseMutex.

Critical-section objects are similar to mutexes but can be used only by threads of a single process. EnterCriticalSection is used to acquire ownership of a critical section, and LeaveCriticalSection releases ownership. This is one of the fastest mechanisms for mutual exclusion; only a few instructions are executed when there is no contention for the critical section. (If contention occurs, a kernel synchronization object is automatically used.)

In a multithreaded application, it's common to divide work among multiple threads. In such cases, one thread might wait for another thread to reach a particular state before proceeding. NT provides event objects for thread synchronization. One thread can call WaitForSingleObject, thus blocking its execution until a certain condition is satisfied. The other thread, after satisfying the condition, can notify the waiting thread by calling SetEvent.

Semaphore objects are similar to mutexes, except there is no ownership associated with semaphores. Additionally, semaphores have resource counts, which allow multiple threads to acquire a semaphore at the same time.

Finally, NT provides atomic memory operations for integer variables. The functions InterlockedIncrement and InterlockedDecrement increment and decrement a variable, respectively, while the function InterlockedExchange atomically stores a new value in a variable and returns the value it held before.

Threads and Performance

Once you've grasped the basic concepts of NT threads, you need to consider the performance and scalability of threaded applications. A thread in NT can normally be in one of the following states at any given time: waiting for a specified event to occur (it cannot run); ready to run and waiting for an available processor; or running on a processor.

Threads in the ready or running state can take advantage of the CPUs (presuming they are running on a multiprocessor system). Excessive interthread synchronization can cause too many threads to be in the waiting state, and the creation of too many threads can cause multiple threads to be in the ready state. The number of threads in the running state can never be more than the number of processors. When the number of threads in the ready state is much higher than the number of running threads, the kernel spends a lot of time doing thread-context switching.

As an illustration, consider the various threading models used in the design of a multithreaded TCP/IP server. (This assumes you're familiar with Windows socket APIs on Windows NT.)

The first model is single-threaded. The main thread does an accept call on the socket and handles the client request. The disadvantage of this model is that while the server is processing a client request, all other requests are being queued.

The second model is also single-threaded: The main thread does a select call on all the connected sockets. The select call indicates which connected sockets have data available (i.e., they are waiting to be serviced). Now multiple clients can be serviced concurrently, but -- as in the previous model -- this does not exploit the power of multiple CPUs.

In the third model, the main thread creates a thread for each client. This model is extremely easy to program, but it does not scale well for a high number of active clients. Creating multiple threads takes advantage of multiple processors but uses excessive system resources and causes scheduling overhead. The performance of the system degrades under "burst" traffic. As the number of ready threads increases, the system spends lots of time context-switching threads in and out of the running state.

Finally, in the fourth model, a pool of worker threads is created to handle client requests. The main thread does a select on all the connected sockets; each new request gets passed to one of the worker threads. The number of worker threads should be slightly greater than the number of processors, because some of the worker threads might become blocked.

This model uses fewer system resources than the third model, but there's a built-in context switch on every transaction between the main thread and the worker threads. The context switch might not be a problem for longer transactions, but the overhead could be high for short transactions. Also, unless the main thread does some rotation on the results of the select call, this model does not have built-in fairness (i.e., an active client may block other, less active ones).

I/O-Completion Ports

To overcome the limitations of these four models, the engineers of NT 3.5 created a mechanism called I/O-completion ports. These ports are designed to handle asynchronous or overlapped I/O. CreateIoCompletionPort associates a port with a collection of file handles, and the port acts as a synchronization point. When a pending I/O operation on any of the file handles completes, an I/O-completion packet is then queued to that particular port. A number of worker threads can manage I/O for clients by calling GetQueuedCompletionStatus to wait on the I/O-completion port.

I/O-completion ports have built-in concurrency control. The kernel tries to keep the number of runnable threads associated with a port from exceeding the port's concurrency value (which is specified when the port is created). When a thread calls GetQueuedCompletionStatus, the call returns when a completion packet is available. When one of the threads associated with a completion port is blocked, the kernel selects another thread waiting on the completion port to run. Thus, the system isn't deluged with runnable threads.

Threads that block on a completion port are awakened in last-in/first-out (LIFO) order, while I/O requests are serviced in first-in/first-out (FIFO) order. Running threads -- after completing a transaction -- can pick up the next request without causing any context switch. I/O-completion ports work efficiently under all loads; their performance does not suffer under heavy traffic.

If my sample TCP/IP server were implemented using I/O-completion ports, the main thread would create an I/O-completion port along with a pool of worker threads to wait on the port. This model is the most efficient; it does not suffer from context-switching overhead (as the fourth model would). The thread that reads the transaction services it. Fairness is built into the completion-port model, since I/O requests are satisfied in FIFO order.

The Common Thread

MT on an SMP machine can provide optimal performance and scalability if the applications are designed correctly. You should not be surprised to see poorly designed applications run slower on an SMP machine than they do on a uniprocessor machine.

Windows NT is a good environment for developing multithreaded applications, but it's important to remember that the OS alone is not responsible for performance and scalability. Understanding such features as I/O-completion ports and overlapped I/O is key to building scalable multithreaded applications on Windows NT.


Synchronizing Threads

Mutex -- Serializes access to shared data.

Critical-section object -- Faster than a mutex; cannot be shared across processes.

Event object -- Used to signal occurrence of an event.

Semaphore -- Controls multithreaded access to a shared but limited resource.

Interlock call -- Provides atomic access to integer variables.



Shashi Prasad is vice president [...]. You can reach him at shaship@anstec.com or on BIX c/o "editors."
