
A Closer Look at Two Supercomputers


February 1995 / Features / The Grand Challenges

Current supercomputers can be roughly divided into two categories: vector machines and massively parallel machines. The key distinction lies in how each uses its processors: almost all vector supercomputers can be purchased with multiple processors, but only parallel supercomputers depend on using many processors at once to deal with a single problem.

Parallel machines rarely provide enough performance to handle a grand-challenge application using only one processor at a time. Vector machines, on the other hand, are almost exclusively used as a group of independent processors that share resources. A very small percentage of the applications currently running on vector machines use more than one processor at once.

A Vector Supercomputer: The Cray C90

The C90 comprises a family of related machines, the most powerful of which, the C916, can have between eight and 16 processors. It has a clock cycle of 4.2 nanoseconds; the 15-ns memory is implemented in BiCMOS. A C916 system can have as much as 8 GB of memory.

During each clock cycle, two operands can be loaded from memory, and one can be stored for each pipeline. But due to the latency of the memory subsystem, memory operations must be scheduled properly to achieve maximum throughput. (For applications that require more memory, Cray offers an alternate line called the M90; these systems have lower floating-point performance but can support several times more memory.)

The maximum I/O bandwidth of the C90 is 13.6 GBps; it's handled by a variety of networks. The system can be connected to a solid-state disk (i.e., a large RAM drive) that stores up to 32 GB and supports access at the full I/O bandwidth. Physically, the machine takes up 48 square feet, and the Freon cooling unit requires another 50 square feet. The system can require more than 300 kilowatts of electrical power to run.

The core of the C90's floating-point performance, which peaks at about 1 GFLOPS per processor, comes from the vector processors. It's up to the programmer and the compiler to see that those processors are used effectively. Over the past few decades, scientific programmers have become used to programming for vector supercomputers and have learned how to write efficient code for them. Although it is rare to have code achieve a sustained throughput of anything close to 1 GFLOPS, a lot of real-world applications achieve hundreds of MFLOPS.
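As a rough check on these figures, the short Python sketch below converts the 4.2-ns clock cycle into a clock rate and works out how many floating-point results per cycle a peak of about 1 GFLOPS per processor implies. The variable names are illustrative, and the 1-GFLOPS figure is taken as exact here even though the text gives it only approximately.

```python
# Figures quoted in the article; the peak rate is stated as "about 1 GFLOPS".
CLOCK_NS = 4.2               # clock cycle in nanoseconds
PEAK_GFLOPS_PER_CPU = 1.0    # approximate peak per processor
MAX_CPUS = 16                # largest C916 configuration

clock_hz = 1.0 / (CLOCK_NS * 1e-9)                       # roughly 238 MHz
flops_per_cycle = PEAK_GFLOPS_PER_CPU * 1e9 / clock_hz   # roughly 4 per cycle
system_peak_gflops = PEAK_GFLOPS_PER_CPU * MAX_CPUS      # roughly 16 GFLOPS

print(f"clock rate:       {clock_hz / 1e6:.0f} MHz")
print(f"flops per cycle:  {flops_per_cycle:.1f}")
print(f"16-CPU peak:      {system_peak_gflops:.0f} GFLOPS")
```

Under these assumptions each processor must complete about four floating-point results every clock cycle to reach its peak, which is why keeping the vector units busy matters so much.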

The C90's operating system is UNICOS, a Unix variant. The system comes with highly tuned compilers for various languages (including C and FORTRAN 77). Cray has also built a variety of tools for measuring the performance of an application and discovering inefficiencies or hot spots that need to be optimized.

A Parallel Supercomputer: Intel's Paragon

The Paragon is a descendant of earlier Intel machines. Intel began building parallel hypercube systems during the mid-1980s and then moved to a two-dimensional mesh with its Touchstone Delta.

The Paragon is similar in design to the Delta, but it uses faster, 50-MHz i860/XP processors with built-in support for network communications. Each processor can have up to 128 MB. Routing communications between processors through the mesh is handled by separate network chips; the bisection bandwidth ranges from less than 1 GBps all the way up to several GBps, depending on the machine's configuration.

I/O is performed through a HiPPI (High-Performance Parallel Interface) that supports up to 100 MBps. For comparison, the Cray C90 supplies not only a HiPPI but also a variety of other interfaces that can support as much as 1.8 GBps per channel.

I/O performance is often an Achilles' heel for parallel machines. This is particularly true of a system like the Paragon, which can be configured to provide much higher theoretical CPU performance than even the biggest Cray vector machine.
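To give a feel for the gap, the sketch below estimates how long it would take to move a 32-GB data set (a size borrowed from the capacity of the C90's solid-state disk) over a single 100-MBps HiPPI channel and at the C90's 13.6-GBps maximum I/O bandwidth. It assumes decimal units, one channel, and no protocol overhead, so the numbers are best-case estimates rather than measurements.

```python
def transfer_seconds(num_bytes, bytes_per_second):
    """Best-case transfer time, ignoring latency and protocol overhead."""
    return num_bytes / bytes_per_second

DATASET_BYTES = 32e9   # 32 GB, an illustrative size (decimal units assumed)
HIPPI_BPS = 100e6      # one HiPPI channel at 100 MBps
C90_IO_BPS = 13.6e9    # C90 maximum aggregate I/O bandwidth

hippi_s = transfer_seconds(DATASET_BYTES, HIPPI_BPS)   # about 320 seconds
c90_s = transfer_seconds(DATASET_BYTES, C90_IO_BPS)    # about 2.4 seconds

print(f"single HiPPI channel: {hippi_s:.0f} s")
print(f"C90 full bandwidth:   {c90_s:.2f} s")
```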

For some applications, the Paragon attains extremely high performance. For instance, a 3680-processor Paragon achieves 143 GFLOPS on the LINPACK benchmark, as compared to just 13.7 GFLOPS for a 16-processor C90.
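Dividing these LINPACK results by the processor counts shows how differently the two machines reach their totals. The sketch below uses only the figures quoted above; the function name is illustrative.

```python
def mflops_per_processor(total_gflops, processors):
    """Average sustained rate per processor, in MFLOPS."""
    return total_gflops * 1000.0 / processors

paragon = mflops_per_processor(143.0, 3680)   # roughly 39 MFLOPS each
c90 = mflops_per_processor(13.7, 16)          # roughly 856 MFLOPS each

print(f"Paragon: {paragon:.1f} MFLOPS per processor")
print(f"C90:     {c90:.1f} MFLOPS per processor")
print(f"ratio:   {c90 / paragon:.0f}x")
```

On this benchmark each C90 processor delivers roughly 22 times the throughput of a Paragon node, so the Paragon's overall lead comes entirely from using thousands of processors on one problem.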

But achieving such performance on a massively parallel machine is difficult. At present, virtually every application that executes efficiently on massively parallel systems is hand-coded; the programmer directly specifies the data that is to be communicated between nodes using message-passing primitives. Intel provides libraries of tuned communications routines and tools to aid in performance monitoring and debugging on the Paragon, but the process is far from painless.
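The hand-coded style described above can be illustrated with a minimal Python sketch in which threads stand in for nodes and queues stand in for the network. The send and recv helpers are hypothetical and are not Intel's library calls; the point is only that the programmer decides which data each node owns and spells out every message.

```python
import queue
import threading

def run_nodes(data, num_nodes):
    """Sum `data` across simulated nodes using explicit messages."""
    mailboxes = [queue.Queue() for _ in range(num_nodes)]  # one inbox per node
    result = {}

    def send(dest, payload):   # hypothetical message-passing primitive
        mailboxes[dest].put(payload)

    def recv(rank):            # blocks until a message arrives for `rank`
        return mailboxes[rank].get()

    def node(rank):
        # The programmer chooses the data distribution by hand.
        chunk = data[rank::num_nodes]
        partial = sum(chunk)
        if rank == 0:
            # Node 0 collects one partial sum from every other node.
            total = partial
            for _ in range(num_nodes - 1):
                total += recv(0)
            result["total"] = total
        else:
            send(0, partial)

    threads = [threading.Thread(target=node, args=(r,)) for r in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]

print(run_nodes(list(range(100)), 4))
```

Even in this toy reduction, the distribution of data and the pattern of communication are written out explicitly, which is what makes tuning real applications on thousands of nodes laborious.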


[Photo: Cray C916]

