use the market is too small. Parallel computers have stayed stuck in research laboratories, which normally write and maintain their own software anyway. That's why parallel computers have failed to really penetrate the industrial sector, where users expect a reasonable amount of service and technical support.
Europort, a European Commission-funded project that started in 1994, addresses this vicious circle by porting widely used industrial-design applications to a variety of parallel computers. The results of the first two project phases, Europort 1 and Europort 2, which ended late last year, are encouraging. They have shown that the performance increases more than justify the cost and effort of porting.
This parallelization initiative has given a much needed boost to the European software industry. "Although U.S. firms may lead in building parallel hardware, they are failing to exploit it with effective software," says Dr. Stephen Brindle of Smith System Engineering, one of the managers of the Europort project. "Thanks to Europort, we in Europe are now ahead in creating commercial parallel applications."
Europort's reference platforms include symmetrical multiprocessor systems, distributed-memory (massively parallel) multicomputers, and networks of RISC workstations. The 30 Europort consortia ambitiously aimed to port 38 industrial applications to at least two of the hardware reference platforms. Though the majority of these applications fall within the traditional high-performance computing provinces of mechanical engineering, fluid dynamics, and computational chemistry, the more unusual applications included cartoon animation, simulation of traffic flow, and radiation therapy simulation.
All Europort projects worked with the message-passing programming paradigm, which describes the cooperation of different processors without shared memory by exchange of messages on some sort of channel (e.g., FDDI, transputer links, Ethernet). These message-passing processes were implemented via the portable parallel code libraries PVM, PARMACS, and MPI.
PVM (for Parallel Virtual Machine), a public-domain library of message-passing primitives, enables a heterogeneous network of workstations to simulate a single parallel computer. There are implementations of PVM for most flavors of Unix, for OS/2, and now for Windows, too (see "Parallel Computing Windows Style" May '96 BYTE). PARMACS (for PARallel MACroS) was originally implemented as FORTRAN 77 macros and is now, in version 6.0, also available to C and C++ programs. The Message Passing Interface (MPI) is emerging as a standard that's being promoted by the main U.S. vendors of parallel hardware. These libraries include node process creation and synchronous and asynchronous communications. They enable programs to communicate between different types of processors, which may even be running different operating systems. By providing a hardware-independent message-passing interface, such libraries allow the same piece of parallelized code to run on each of the Europort reference architectures.
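To make the message-passing idea concrete, here is a minimal sketch in Python. It does not use PVM, PARMACS, or MPI; two standard-library queues stand in for the communication channel, and the names `worker` and `run_exchange` are invented for this illustration. The point is only that the two sides cooperate purely by sending and receiving messages.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives, then send back their sum."""
    total = 0
    while True:
        msg = inbox.get()        # blocking receive
        if msg is None:          # sentinel: no more work
            break
        total += msg
    outbox.put(total)            # send the result back


def run_exchange(values):
    """Send each value to the worker as a message and return its reply."""
    to_worker = queue.Queue()    # hypothetical channel: parent -> worker
    from_worker = queue.Queue()  # hypothetical channel: worker -> parent
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for v in values:
        to_worker.put(v)         # send
    to_worker.put(None)
    result = from_worker.get()   # blocking receive of the reply
    t.join()
    return result


if __name__ == "__main__":
    print(run_exchange([1, 2, 3, 4]))
```

Python threads do share an address space, so this is an analogy rather than a distributed program: the sketch simply restricts itself to the two queues, which is the discipline a message-passing library enforces across machines.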
A major attraction of these API libraries is that programmers can call them from the standard programming languages used in the industrial computing world, notably FORTRAN 77, FORTRAN 90, C, and C++, so that no unfamiliar languages need be introduced. (The exception was the LP2-Erlang consortium, which ported the parallel functional language Erlang, used to program and simulate large telecommunications networks.)
No Magic Bullet
Although researchers in parallel programming have searched for one for years, there is no "magic bullet" procedure for porting serial applications to a parallel platform. Each Europort consortium had to study the structure of serial algorithms and data sets and then deduce efficient ways to decompose and run them on multiple processors.
The Europort programmers generally agreed that the adoption of generic message passing was a good idea -- it saved coding time compared to using the hardware vendors' proprietary libraries. Interestingly, many of the consortia found that even on the shared-memory computing platforms, message-passing implementations of their code exhibited much better scaling behavior than native memory-sharing versions.
The difficulties of parallelization within the Europort consortia varied as widely as the application domains involved. The MaxHom team, for example, worked on 3-D modeling software for protein design and was able to adopt a coarse-grained parallel implementation with relatively small changes to the serial code and very limited interprocessor communication. This kind of coarse-grained single-program, multiple-data (SPMD) approach proved to be successful for other consortia, too, and it was often implemented using a master-slave paradigm where a single process spawns child processes onto separate processors at run time. Each child process performs part of the calculation and then passes its result back to the parent for consolidation into the final result. This technique makes initializing and launching the application much simpler, as it involves loading only one node, and all output takes place through a single node.
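The parent-and-children pattern just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any consortium's code: threads stand in for processes spawned onto separate processors, a queue stands in for the message channel, and the names `child` and `master`, along with the sum-of-squares workload, are made up for the example.

```python
import queue
import threading


def child(rank, chunk, results):
    """Compute one part of the calculation and pass it back to the parent."""
    partial = sum(x * x for x in chunk)
    results.put((rank, partial))


def master(data, n_children):
    """Spawn children, hand each a slice of the data, consolidate the results."""
    results = queue.Queue()
    # Deal the data out round-robin so every child gets a similar share.
    chunks = [data[i::n_children] for i in range(n_children)]
    children = [
        threading.Thread(target=child, args=(rank, chunks[rank], results))
        for rank in range(n_children)
    ]
    for c in children:
        c.start()
    # One reply is expected per child; arrival order does not matter for a sum.
    partials = [results.get() for _ in children]
    for c in children:
        c.join()
    return sum(partial for _, partial in partials)


if __name__ == "__main__":
    print(master(list(range(10)), 3))
```

Only `master` is launched by the user and only `master` reports the answer, which mirrors the single point of loading and output that the consortia found convenient.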
Step-by-Step Approach
The Free University of Amsterdam used this technique to parallelize its ADF (Amsterdam Density Functional) program, which is used by chemists for predicting molecular structure. The first prototype of the program loaded each node with the same code and the same data, so that the nodes all duplicated each other's work and the program remained a serial one. The programmers then gradually introduced more parallelism and reduced the serial part by partitioning the data onto the different CPUs.
This step-by-step approach meant there was always a running version of the code. In the words of the programmers' report, "this not only made debugging much easier, but also the determination of the most time-consuming serial parts that still remained to be parallelized." Another advantage is that the structure of the parallel program is virtually identical to the structure of the serial program. Because of these similarities, it is much easier to maintain and extend the functionality of the parallel and serial versions simultaneously.
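The two stages of such an incremental port can be sketched as follows. This is a toy in Python, not ADF code: the summation workload and the names `block_range`, `replicated_step`, `partitioned_step`, and `run` are assumptions made for the illustration, and the "nodes" are simulated by a plain loop.

```python
def block_range(n, rank, size):
    """Return the half-open index range [lo, hi) owned by `rank` of `size` nodes."""
    base, extra = divmod(n, size)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def replicated_step(data, rank, size):
    """Stage 1: every node holds all the data and repeats the full serial work."""
    return sum(data)


def partitioned_step(data, rank, size):
    """Stage 2: each node works only on the block of data it owns."""
    lo, hi = block_range(len(data), rank, size)
    return sum(data[lo:hi])


def run(step, data, size):
    """Simulate `size` nodes by calling the same step function once per rank."""
    return [step(data, rank, size) for rank in range(size)]


if __name__ == "__main__":
    data = list(range(10))
    print(run(replicated_step, data, 3))   # every node reproduces the serial answer
    print(run(partitioned_step, data, 3))  # partial results that add up to it
```

Because both stages share one driver and one program structure, each can be checked against the serial answer before the next piece of work is partitioned, which is the debugging advantage the programmers' report describes.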
In some cases the effort of parallelization even caused the programmers to examine and improve the algorithms employed in the original code. For example, the LINPARC2-Traffic consortium ported a traffic-flow simulation application and found that its version of the algorithm showed poor convergence behavior when parallelized. The programmers therefore rewrote the main algorithm during parallelization, and applying the rewrite to the serial code, too, sped it up by a factor of four to six.
Break-Even Point
All Europort consortia achieved worthwhile performance improvements in their code through parallelization. However, according to Clemens-August Thole of GMD/SCAI, the managers of Europort 1, "The final judgment on the real economic impact of Europort can only be made two years from now." A cost/benefit analysis performed by Europort 2 management indicates that all the consortia expect to have recouped their costs by 2001, while most will be showing a profit by 1998 or 1999. A lucky few, like the owners of Animo and PAM-Crash, were already in profit before Europort was completed.
The primary benefit for Europort's code owners is that the improved performance of the code makes it easier to sell. For the code users there are several benefits. Faster code may translate directly into time and money savings, as with Animo's fivefold reduction in rendering time. Alternatively, users can achieve more precise results by performing more calculations in the same time, as is the case with RAPT, the RAdiotherapy Treatment Planning System that uses Monte Carlo methods.
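Why more calculations mean more precision is easy to show for Monte Carlo methods in general: the statistical error of an estimate shrinks roughly as one over the square root of the number of samples, so four processors sampling for the same wall-clock time should about halve the error. The Python sketch below uses the textbook estimate of pi as a stand-in; it is not RAPT's algorithm, the function names are invented for the example, and the "workers" run one after another here rather than in parallel.

```python
import math
import random


def count_hits(n_samples, seed):
    """One worker's share: count random points that land inside the unit quarter-circle."""
    rng = random.Random(seed)    # independent, reproducible stream per worker
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers, samples_per_worker):
    """Pool the hit counts from all workers into one estimate of pi."""
    hits = sum(count_hits(samples_per_worker, seed) for seed in range(n_workers))
    total = n_workers * samples_per_worker
    return 4.0 * hits / total


def standard_error(total_samples):
    """Theoretical standard error of the pi estimate for a given sample count."""
    p = math.pi / 4.0            # probability that a point is a hit
    return 4.0 * math.sqrt(p * (1.0 - p) / total_samples)


if __name__ == "__main__":
    print(estimate_pi(1, 10000), standard_error(10000))
    print(estimate_pi(4, 10000), standard_error(40000))
```

Each worker needs only a seed and a sample count and returns a single number, so the communication cost is negligible compared with the sampling work.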
In some cases the time saved by parallelizing software will make economically feasible a procedure considered impossible on serial computers. This is the case, for example, with PEPSE's program for analyzing lightning strikes. Perhaps most important of all in an industrial context is that parallelization enables corporations to exploit existing computer resources more effectively -- for example, using the office network during the night to perform long rendering jobs.
In many respects the biggest lesson to be learned from Europort is that sometimes it takes an outside initiative to give a kick to a market locked into a vicious circle. This lesson is sufficiently compelling that ARPA, the U.S. defense research organization, now plans to emulate Europort in the States. ARPA will adopt Europort's three-part consortium structure: owners of the application code, industrial end users, and a parallel processing expert. Most participants believe this approach was crucial to the projects' success since it kept the efforts focused on usability and commercial viability rather than on the theoretical dogmas that have plagued the parallel computer community.
Where to Find
GMD/SCAI
St. Augustin, Germany
Phone: +49 2241 14 2330
E-mail: europort@gmd.de
Internet: http://www.gmd.de/SCAI/europort/