DISTRIBUTED-MEMORY ARCHITECTURE
A computer network is one form of distributed-memory machine. A better approach packs up to 1000 processors, each with its own local memory, into a single box with hardware for fast communication. Key issues include bandwidth, latency, topology, the network interface, and the overlap of communication with computation.
Distributed memory is the most popular approach. Thinking Machines' CM5 Connection Machine has a network of SPARC processors that can theoretically yield 100-plus MFLOPS performance. Others include the Cray T3D, with DEC Alpha processors, and the Intel Paragon, with Intel i860 processors.
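The interplay of latency, bandwidth, and communication/computation overlap can be sketched with a simple cost model. This is only an illustration: the function names and the link numbers (100-microsecond latency, 10 MB/s) are hypothetical and do not describe any machine named above.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message: fixed startup cost plus size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def step_time(compute_s, comm_s, overlap):
    """Time for one step in which a node both computes and communicates.

    With overlap, the slower of the two activities hides the other;
    without it, the node waits for each in turn.
    """
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


# Hypothetical link parameters, chosen only for illustration.
LATENCY = 100e-6      # seconds of startup cost per message
BANDWIDTH = 10e6      # bytes per second

small = transfer_time(8, LATENCY, BANDWIDTH)          # latency dominates
large = transfer_time(1_000_000, LATENCY, BANDWIDTH)  # bandwidth dominates

serial = step_time(0.2, large, overlap=False)
overlapped = step_time(0.2, large, overlap=True)
```

In this model a small message costs almost exactly the latency, so many tiny messages are expensive, while a large message is limited by bandwidth. Overlapping a 0.2-second computation with the large transfer hides the communication time entirely.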
VLIW MACHINES
VLIW (very long instruction word) machines have many functional units (e.g., floating-point adders and multipliers). Where superscalar chips have a couple of these units, VLIW machines have dozens. Each instruction can have up to 1024 bits, with many small subfields that tell a unit what to do. It's up to the compiler to keep all the units busy. Multiflow Computer built and marketed a major VLIW design in 1988. The company developed many interesting compiler techniques, but it went out of business in 1991.
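To show what "keeping the units busy" means for the compiler, here is a minimal sketch of a greedy scheduler that packs independent operations into long instruction words. It is not Multiflow's technique; the `Op` record, the `schedule` function, and the unit counts are hypothetical, and the sketch assumes every register is written only once and every operation takes one cycle.

```python
from collections import namedtuple

# One operation: which kind of unit it needs, the register it writes,
# and the registers it reads. (Hypothetical format for illustration.)
Op = namedtuple("Op", "name unit dest srcs")


def schedule(ops, units):
    """Pack ops (given in program order) into long instruction words.

    units maps a unit kind to the number of such units per word.
    Assumes each register is written once and each op takes one cycle.
    """
    words = []     # each word is a list of ops issued together
    ready_at = {}  # register -> index of first word that may read it
    for op in ops:
        # An op cannot issue before all of its inputs have been produced.
        w = max((ready_at.get(s, 0) for s in op.srcs), default=0)
        while True:
            if w == len(words):
                words.append([])
            used = sum(1 for o in words[w] if o.unit == op.unit)
            if used < units[op.unit]:
                break  # a free unit of the right kind exists in this word
            w += 1
        words[w].append(op)
        ready_at[op.dest] = w + 1
    return words


program = [
    Op("t1 = a + b", "add", "t1", ("a", "b")),
    Op("t2 = c + d", "add", "t2", ("c", "d")),
    Op("t3 = e * f", "mul", "t3", ("e", "f")),
    Op("t4 = t1 * t2", "mul", "t4", ("t1", "t2")),
    Op("t5 = t3 + t4", "add", "t5", ("t3", "t4")),
]

wide = schedule(program, {"add": 2, "mul": 1})
narrow = schedule(program, {"add": 1, "mul": 1})
```

With two adders and one multiplier, the first three operations share a single word and the program needs three words. With only one adder it needs four, which is the kind of difference the compiler is responsible for exploiting.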
SHARED-MEMORY ARCHITECTURE
Attaching all processors and memory to a shared bus creates a single address space. Memory location 1000 on each CPU refers to the same piece of storage. Programmers don't need to send data between CPUs.
Caching is required for performance, but the caches must be intelligent for programs to work correctly. Consider the following example: Processor A reads a memory location and caches the data; processor B then writes a new value to that same location, leaving A's cached copy out of date. "Snoopy" caches, which monitor the bus for writes by other processors, keep A from using the outdated value.
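The processor A and processor B scenario can be sketched in a few lines. This toy model assumes a write-through, write-invalidate scheme, which is one common way to build snoopy caches and not necessarily what any particular machine uses; the `Bus` and `SnoopyCache` classes are hypothetical names.

```python
class Bus:
    """Shared bus: every write is visible to all attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, writer, addr, value):
        self.memory[addr] = value          # write through to main memory
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)    # other caches observe the write


class SnoopyCache:
    """Per-processor cache that invalidates lines written by others."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: fetch from shared memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)         # drop the now-stale copy


memory = {1000: 1}
bus = Bus(memory)
cache_a = SnoopyCache(bus)
cache_b = SnoopyCache(bus)

first = cache_a.read(1000)      # A caches the old value
cache_b.write(1000, 2)          # B writes; A's copy is invalidated
stale_gone = 1000 not in cache_a.lines
second = cache_a.read(1000)     # A misses and refetches the new value
```

Without the `snoop_write` call, processor A would keep returning 1 from its cache after B's write.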
Adding processors means you eventually run out of bandwidth, so shared memory has lost popularity except for speedy desktop machines. Newer operating systems--including OS/2 and Windows NT--are being extended to support a few processors. Most shared-memory machines to date (e.g., the Sequent) run Unix.
DATA-PARALLEL ARCHITECTURE
This unusual kind of machine has many small, limited processors that work together in lockstep. A central unit broadcasts each command to all of them, and they execute it together, each on its own data. The best-known data-parallel machine is the 1987-vintage CM2 from Thinking Machines, with up to 65,536 (64K) processors.
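Lockstep execution can be sketched as a control unit that sends one command at a time to every processor, each of which applies it to its own local value. The class and function names here are hypothetical, and a Python loop only imitates what the hardware does simultaneously.

```python
import operator


class ProcessingElement:
    """A small processor holding one local data value."""

    def __init__(self, value):
        self.value = value

    def execute(self, op, operand):
        # Apply the broadcast command to this element's own data.
        self.value = op(self.value, operand)


def broadcast(elements, op, operand):
    """Central unit: send the same command to every element."""
    for pe in elements:  # real hardware does this in one step
        pe.execute(op, operand)


# Eight elements stand in for the thousands in a real machine.
elements = [ProcessingElement(v) for v in range(8)]

broadcast(elements, operator.mul, 2)   # every element doubles its value
broadcast(elements, operator.add, 1)   # then every element adds one

result = [pe.value for pe in elements]  # [1, 3, 5, 7, 9, 11, 13, 15]
```

Two broadcast commands update every value, no matter how many elements there are, which is why this style suits problems that apply the same operation across large arrays.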
Figure: Distributed-Memory Architecture
Figure: Very Simple VLIW
Figure: Shared-Memory Architecture
Figure: Data-Parallel Architecture