or office and interactive jobs unleashed during nonprime time for scientific and mathematical computing jobs.
PVM is a flexible message-passing environment that allows the programmer to build applications based on the multiple program, multiple data (MPMD) paradigm. An application consists of several functional components that run in parallel. Processes of a PVM application cooperate with each other by sending and receiving messages. Though each PC may "belong" to only one virtual machine, each virtual machine can execute several PVM applications simultaneously.
WPVM supports dynamic process creation and virtual-machine management. It offers basic point-to-point, as well as one-to-many, communications. Like the original PVM, the WPVM programming model supports the concept of a group. A process belonging to a specific group can synchronize with, and broadcast messages to, all the other group members. A process can also belong to several different groups.
Daemons on the Loose
WPVM consists of a programming library and a daemon that runs on each PC. The daemon is responsible for communication and process management. It has an error-message log window and a graphical console that enables you to execute the original PVM console commands. WPVM supplies routines that enable a Windows process to enroll in and leave the virtual machine, to configure the virtual machine by adding and deleting hosts, to spawn and kill processes, and to get information about the actual configuration and the running processes. A WPVM process can ask the virtual machine to notify it of special events: tasks exiting, deletion or failure of a host, and addition of a host.
The communications library of WPVM offers routines that allow a process to send messages to other PVM/WPVM processes, either one-to-one or one-to-many (i.e., multicast). Processes can send messages either directly (via TCP) or through the WPVM daemon. (Note that WPVM hosts can interact with Unix hosts running PVM.)
A process can specify one of two coding methods to transmit the data: raw or via external data representation (XDR). If a process specifies raw coding, this implies that the receiving process has an identical data-format architecture. On the other hand, XDR coding translates the data into a kind of universal format. The receiving end will properly decode the XDR-format data; this allows, for example, systems with different types of processors to be able to correctly exchange data.
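The difference between the two coding methods can be sketched in a few lines of Python. This is an illustration only, not WPVM's own packing code: the function names are hypothetical, and the sketch relies on the fact that XDR represents integers as 4-byte big-endian values and doubles as 8-byte IEEE 754 big-endian values.

```python
import struct

def encode_raw(ints, doubles):
    """Pack values in the sender's native byte order, with no conversion."""
    fmt = "=%di%dd" % (len(ints), len(doubles))
    return struct.pack(fmt, *ints, *doubles)

def encode_xdr(ints, doubles):
    """Pack values XDR-style: big-endian 32-bit ints and IEEE 754 doubles."""
    fmt = ">%di%dd" % (len(ints), len(doubles))
    return struct.pack(fmt, *ints, *doubles)

def decode_xdr(data, n_ints, n_doubles):
    """Unpack an XDR-style buffer, whatever the receiver's own byte order."""
    fmt = ">%di%dd" % (n_ints, n_doubles)
    values = struct.unpack(fmt, data)
    return list(values[:n_ints]), list(values[n_ints:])

# The XDR form is the same on every host, so any receiver can decode it.
wire = encode_xdr([1, 2], [1.5])
ints, doubles = decode_xdr(wire, 2, 1)

# A raw little-endian integer misread by a big-endian receiver
# comes out as a different number.
misread = struct.unpack(">i", struct.pack("<i", 1))[0]
```

Here `misread` is 16777216 rather than 1, which is the kind of error raw coding risks between unlike processors. Raw coding skips the conversion step, so it is the cheaper choice when sender and receiver are known to share a data format.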
Processes in WPVM also understand the concept of groups. Groups simplify the development of libraries and the collective synchronization and communication between a set of processes. There are primitives to let a WPVM process join or leave a named group, synchronize processes in the same group, and broadcast messages to group members. A group daemon that is unique to the entire virtual machine implements group functionality.
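To make the group primitives concrete, here is a minimal in-memory sketch of the bookkeeping a single group server might do. It is not the WPVM group daemon: the class name, method names, and task IDs are hypothetical, and real message delivery is replaced by returning the list of recipients.

```python
class GroupServer:
    """In-memory stand-in for a single, machine-wide group daemon."""

    def __init__(self):
        self.members = {}   # group name -> task IDs in join order
        self.arrived = {}   # group name -> task IDs waiting at the barrier

    def join(self, group, tid):
        members = self.members.setdefault(group, [])
        if tid not in members:
            members.append(tid)

    def leave(self, group, tid):
        members = self.members.get(group, [])
        if tid in members:
            members.remove(tid)
        self.arrived.get(group, set()).discard(tid)

    def broadcast(self, group, sender, message):
        """Return (recipient, message) pairs for every member but the sender."""
        return [(tid, message)
                for tid in self.members.get(group, []) if tid != sender]

    def barrier(self, group, tid, count):
        """Record an arrival; release everyone once `count` tasks have arrived."""
        waiting = self.arrived.setdefault(group, set())
        waiting.add(tid)
        if len(waiting) < count:
            return []           # caller keeps waiting
        released = sorted(waiting)
        waiting.clear()
        return released

server = GroupServer()
for tid in (10, 11, 12):
    server.join("workers", tid)
server.join("io", 10)           # a task may belong to several groups
```

Keeping all membership state in one place is what makes a single group server simple: every join, leave, and barrier arrival is seen in one consistent order.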
Design Philosophy
PVM offers the user a kernel of functionally complete primitives on top of which programmers can add higher abstraction layers. For example, the portion of WPVM that implements groups is layered on top of the core WPVM routines.
Communication between different WPVM daemons and between daemons and user processes is via UDP/TCP sockets. In particular, WPVM uses Winsock (Windows Sockets), the BSD sockets variant available on 16- and 32-bit Windows platforms.
A host table, describing the configuration of the actual virtual machine, is maintained on each host system. These tables are issued by the master WPVM daemon and kept synchronized across the virtual machine.
Each daemon maintains a task table of all the tasks under its management. Because WPVM uses UDP sockets to communicate between daemons, there's the possibility that packets can be lost, duplicated, or delivered out of order. Consequently, WPVM incorporates its own acknowledgment and retry mechanism. (We chose UDP sockets as the communication mechanism between remote daemons for scalability's sake.)
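The article does not spell out WPVM's acknowledgment protocol, so the following is only a sketch of the general idea, using a simple stop-and-wait scheme with sequence numbers. The classes and the scripted channel are hypothetical; the channel stands in for UDP and drops packets according to a fixed script so the run is repeatable.

```python
class ScriptedChannel:
    """Stand-in for UDP: each call reports whether the next packet survives."""

    def __init__(self, outcomes):
        self.outcomes = list(outcomes)   # True = delivered, False = lost

    def deliver(self):
        return self.outcomes.pop(0) if self.outcomes else True

class Receiver:
    def __init__(self):
        self.expected = 0      # next sequence number not yet delivered
        self.delivered = []

    def on_packet(self, seq, payload):
        if seq == self.expected:         # new packet: accept it
            self.delivered.append(payload)
            self.expected += 1
        return seq                       # duplicates are re-acknowledged

def send_reliably(messages, channel, receiver, max_retries=10):
    """Send each message until its acknowledgment arrives; count transmissions."""
    transmissions = 0
    for seq, payload in enumerate(messages):
        for _ in range(max_retries):
            transmissions += 1
            if not channel.deliver():    # data packet lost: retry
                continue
            ack = receiver.on_packet(seq, payload)
            if not channel.deliver():    # acknowledgment lost: retry
                continue
            if ack == seq:
                break
        else:
            raise TimeoutError("no acknowledgment for packet %d" % seq)
    return transmissions

# Lose the first data packet and the first acknowledgment.
channel = ScriptedChannel([False, True, False, True, True])
receiver = Receiver()
sent = send_reliably(["a", "b"], channel, receiver)
```

In this run the first message is transmitted three times, yet the receiver delivers it only once, because the sequence number lets it recognize the retransmission as a duplicate. Giving up after `max_retries` is also the natural point at which a daemon would conclude that its peer is unreachable.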
A WPVM daemon shuts down when it loses contact with the master, is deleted from the current virtual machine, or is killed. Before dying, the slave daemon kills any tasks running in its host and informs the other daemons listed in the host table.
Provision for Dynamic Environments
PC clusters are a very unstable environment because users can reboot or switch off their machines at any time. We anticipate this will happen often: Users will probably kill the WPVM daemon when they want full control of their machines.
This issue is of paramount importance; it implies that fault-detection and recovery mechanisms are vital for the success of a WPVM application. Consequently, WPVM daemons use time-outs while they are communicating with each other. If a daemon times out when it's trying to communicate with another daemon, it assumes the peer is down and terminates any outstanding operations with that peer. A daemon can also notify tasks that have asked to be told of that event. A WPVM daemon is able to recover from the loss of any remote daemon (except for the master daemon, which acts as a kind of central coordinator).
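The time-out behavior described above might be sketched as follows. This is an assumption-laden illustration, not WPVM's daemon code: the class, its methods, the host names, and the 5-second time-out are all hypothetical, and the clock is passed in so the example runs without waiting.

```python
class PeerMonitor:
    """Tracks when each peer daemon was last heard from."""

    def __init__(self, timeout, clock):
        self.timeout = timeout    # seconds of silence before a peer is "down"
        self.clock = clock        # callable returning the current time
        self.last_heard = {}      # peer -> time of last contact
        self.pending = {}         # peer -> outstanding operations
        self.watchers = {}        # peer -> callbacks of interested tasks

    def add_peer(self, peer):
        self.last_heard[peer] = self.clock()
        self.pending[peer] = []
        self.watchers[peer] = []

    def heard_from(self, peer):
        if peer in self.last_heard:
            self.last_heard[peer] = self.clock()

    def start_operation(self, peer, operation):
        self.pending[peer].append(operation)

    def watch(self, peer, callback):
        self.watchers[peer].append(callback)

    def check(self):
        """Declare silent peers down; return {peer: cancelled operations}."""
        now = self.clock()
        cancelled = {}
        for peer in list(self.last_heard):
            if now - self.last_heard[peer] > self.timeout:
                del self.last_heard[peer]
                cancelled[peer] = self.pending.pop(peer)
                for callback in self.watchers.pop(peer):
                    callback(peer)       # notify interested tasks
        return cancelled

now = [0.0]                              # fake clock for the example
monitor = PeerMonitor(5.0, lambda: now[0])
monitor.add_peer("hostA")
monitor.add_peer("hostB")
monitor.start_operation("hostA", "spawn")
failures = []
monitor.watch("hostA", failures.append)
now[0] = 3.0
monitor.heard_from("hostB")              # hostB is still talking
now[0] = 6.0
down = monitor.check()                   # hostA has been silent for 6 s
```

After the final call, `down` lists hostA together with its cancelled `spawn` operation, the watcher has been told about hostA, and hostB remains in the table because it was heard from 3 seconds earlier.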
The Slowest Machine Hurts the Most
The illustration shows the performance results for WPVM running atop a TCP/IP stack on Windows 95. We used a cluster of Pentium-based systems, each with 16 MB of RAM, connected through a 10-Mbps Ethernet network. It also shows the same tests running in PVM on a collection of Unix systems. The Unix cluster consisted of a Sparc 10 and a Sparc 5 running SunOS and two Alpha-based stations running OSF. Each system had 64 MB of main memory. For tests, we used two of the Numerical Aerodynamic Simulation (NAS) benchmarks developed at NASA Ames Research Center.
-- Quick Sort: This benchmark sorts N keys in parallel. It tests both integer computation speed and communication performance. In our test there are N = 2^20 keys in the range [0, 2048].
-- Embarrassingly Parallel: In this benchmark, two-dimensional statistics are accumulated from a large number of Gaussian pseudorandom numbers, which are generated according to a particular scheme well-suited for parallel computation. This problem requires almost no communication.
You can see that workstation or PC clusters running PVM are not suitable for problems like the Quick Sort benchmark. Communication-intensive applications do not speed up in these environments because communication becomes a bottleneck. However, in applications with a high ratio of computation to communication, like the Embarrassingly Parallel benchmark, we can expect very promising results.
Note that with PVM for Unix, when we used two Alpha workstations, we observed a reduction in program execution time. However, when we used all four Unix machines (two Sparcs and two Alphas), we recorded a slower execution time than with the sequential version. The explanation is quite simple: The Sparc 5 is considerably slower than an Alpha workstation. Consequently, the slowest machine determines the total execution time of the parallel version. This shows that the benchmark does not have any load-balancing capabilities, which makes its performance highly dependent on the slowest machine.
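A little arithmetic shows why. The sketch below models a job that is split among machines with no communication cost; the run ends when the last machine finishes. The relative speeds (5 for a fast machine, 1 for a slow one) and the work size are hypothetical numbers chosen for easy arithmetic, not measurements from our tests.

```python
def parallel_time(total_work, speeds, shares=None):
    """Time until the last machine finishes its share of the work."""
    if shares is None:                        # default: equal split
        shares = [1.0 / len(speeds)] * len(speeds)
    return max(total_work * share / speed
               for share, speed in zip(shares, speeds))

def balanced_shares(speeds):
    """Give each machine work in proportion to its speed."""
    total = sum(speeds)
    return [speed / total for speed in speeds]

WORK = 20.0
FAST, SLOW = 5.0, 1.0                         # hypothetical relative speeds

sequential = parallel_time(WORK, [FAST])                   # 4.0
two_fast = parallel_time(WORK, [FAST, FAST])               # 2.0
four_equal = parallel_time(WORK, [FAST, FAST, SLOW, SLOW]) # 5.0
four_balanced = parallel_time(
    WORK, [FAST, FAST, SLOW, SLOW],
    balanced_shares([FAST, FAST, SLOW, SLOW]))             # about 1.67
```

With an equal split, the four-machine run takes 5.0 time units, longer than the 4.0 units of a single fast machine, because each slow machine needs 5.0 units for its quarter of the work. Dividing the work in proportion to speed brings the same four machines down to about 1.67 units, which is the gain that load balancing would provide in this simplified model.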
Cost-Effective Parallel Computing
WPVM is a good teaching tool for parallel programming because it allows students to use the network of PCs they're likely to have in their laboratories already. Considering the millions of dollars invested in personal computers, WPVM also represents a cost-effective solution for performing scientific parallel computing chores. Given that PVM is already a widely used system for parallel computing, the adoption of WPVM by Windows programmers will broaden PVM's support.
Acknowledgment
The author acknowledges the support of João Gabriel Silva, a professor in the Informatics Engineering Department at the University of Coimbra in Portugal.
WHERE TO FIND
WPVM is available on the Web at http://student.dei.uc.pt/wpvm.