It divides and conquers rendering by drawing the objects first, then combining them
Dick Pountain
PixelFlow is a new chip architecture for generating photo-realistic images in real time. Created by researchers at the University of North Carolina, PixelFlow is aimed squarely at the very top end of the graphics market, presently dominated by such exotic hardware as Silicon Graphics' RealityEngine2 (see "Damn the Torpedoes!," November 1993 BYTE). To compete in this arena, a system must offer full color (with fog and transparency effects), high resolution, photo-realistic Phong shading, antialiasing, and bump- and texture-mapping, all while delivering up to 60 video frames per second.
The customers who can afford this level of realism right now include Hollywood studios chasing another Jurassic Park-class blockbuster film, and military departments. The latter are looking into PixelFlow for use in sophisticated VR combat simulators (research on PixelFlow has been partly supported by the U.S. Defense Advanced Research Projects Agency).
PixelFlow shares some of the design decisions that went into the RealityEngine2. First, it employs massive parallelism, since there's no way to get sufficient bandwidth from a single processor. Second, it separates the floating-point-intensive geometry processing from the rendering process, which uses high-performance integer calculations. The split makes sense because geometry processing works on a database of real numbers -- the 3-D coordinates that describe objects -- using matrix math. These calculations transform the data into a current screen view composed of triangles. Rendering, by contrast, works on bit maps at the pixel level, coloring in the triangles to give the illusion of a scene with lighting and shadows.
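The geometry half of that split can be illustrated with a minimal sketch. It assumes a row-major 4-by-4 matrix and a simple mapping from the [-1, 1] range to pixel coordinates; the function names are hypothetical and are not taken from PixelFlow or the RealityEngine2.

```python
def transform_vertex(matrix, vertex):
    """Multiply a row-major 4x4 matrix by a 3-D point in homogeneous form."""
    x, y, z = vertex
    column = (x, y, z, 1.0)
    return tuple(sum(row[i] * column[i] for i in range(4)) for row in matrix)


def to_screen(clip, width, height):
    """Perspective-divide, then map the [-1, 1] range to integer pixels."""
    x, y, z, w = clip
    px = int((x / w + 1.0) * 0.5 * (width - 1))
    py = int((1.0 - y / w) * 0.5 * (height - 1))
    return px, py, z / w


# Identity matrix used as a stand-in for a real view/projection matrix.
IDENTITY = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

clip = transform_vertex(IDENTITY, (0, 0, 0.5))
screen = to_screen(clip, 1280, 1024)
```

Everything up to `to_screen` is floating-point work; what comes out is integer pixel positions plus a depth value, which is the kind of data the rendering stage consumes.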
Where the PixelFlow architecture differs from its competitors is that it's fully scalable in both the geometry and the rendering dimensions. It splits the rendering process itself into separate rasterizing and shading steps (so-called "deferred shading") that are implemented on separate boards. A PixelFlow system consists of a backplane into which you can fit the appropriate mixture of rendering and shading boards to achieve the performance you require (see the figure "PixelFlow Hardware").
Subdivision vs. Composition
Key to PixelFlow's scalability is the way it distributes tasks among its parallel processors, using a technique called image composition. The more obvious way to distribute graphics tasks is through screen subdivision, where you divide up the screen into a number of nonoverlapping regions and assign a different renderer to each region. This approach scores on conceptual simplicity but complicates implementation. Most of the objects in any scene will cross region boundaries, and they'll also move around the screen as the viewpoint changes. Hence, a global routing network is needed so that any geometry engine can deliver objects to any renderer (see section (a) of the figure "Imaging Architectures Compared"). This network needs a very high bandwidth and, crucially, the bandwidth required rises with the rendering rate. Having to sort the drawing primitives by screen region also complicates software implementation.
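The sorting burden that screen subdivision imposes can be sketched as follows. This is an illustration only: it assumes square regions, integer pixel coordinates, and a conservative bounding-box overlap test, and the function names are hypothetical.

```python
def regions_touched(triangle, region_size):
    """Return the (column, row) regions overlapped by a triangle's bounding box."""
    xs = [vertex[0] for vertex in triangle]
    ys = [vertex[1] for vertex in triangle]
    first_col, last_col = min(xs) // region_size, max(xs) // region_size
    first_row, last_row = min(ys) // region_size, max(ys) // region_size
    return {
        (col, row)
        for col in range(first_col, last_col + 1)
        for row in range(first_row, last_row + 1)
    }


def sort_by_region(triangles, region_size):
    """Bucket triangle indices by every region they may need to be drawn in."""
    buckets = {}
    for index, triangle in enumerate(triangles):
        for region in regions_touched(triangle, region_size):
            buckets.setdefault(region, []).append(index)
    return buckets


small = ((10, 10), (20, 10), (10, 20))          # stays inside one region
straddler = ((100, 100), (200, 100), (100, 200))  # crosses region boundaries
buckets = sort_by_region([small, straddler], 128)
```

A triangle that straddles a boundary has to be delivered to every renderer whose region it touches, and the buckets must be rebuilt whenever the viewpoint moves, which is where the routing traffic comes from.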
In image composition, by contrast, each graphics processor works over the whole screen area but renders only some of the primitives (see section (b) of "Imaging Architectures Compared"); the partial images so produced are then composited (hence the name) into a single picture. This does away with the global switching network completely, as each geometry engine always serves the same renderer. Instead, you need a compositing network to combine the partial images; however, as this carries only local traffic from each renderer to its neighbor, its necessary bandwidth (though still high) remains constant as you add more renderers. Therefore, you have scalability.
Composition-network bandwidth depends on the screen size and frame rate. Say we want a 1280- by 1024-pixel image that's 48 bits deep, displayed at a 30-Hz frame rate. This amounts to a base bandwidth of about 1.9 Gbps (1280 x 1024 x 48 x 30 bits per second). In PixelFlow, however, antialiasing requires each pixel to be super-sampled four times over, and the deferred shading algorithm multiplies the required bandwidth further by a factor of 2 or 3. The architecture that handles these operations is based on a 256-bit-wide image-composition network running at 132 MHz, for a total of 33.8 Gbps. The very name PixelFlow stems from the fact that a stream of pixels flows in one direction along this compositing data path, with each rendering board adding its own contribution to the image as it passes through.
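The bandwidth arithmetic can be checked with a few lines of code. This is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes the super-sampling and deferred-shading factors simply multiply the base figure.

```python
def base_bandwidth_bps(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to move one unsampled image stream."""
    return width * height * bits_per_pixel * frames_per_second


def network_bandwidth_bps(bus_width_bits, clock_hz):
    """Peak bits per second of a bus that moves one word per clock."""
    return bus_width_bits * clock_hz


base = base_bandwidth_bps(1280, 1024, 48, 30)     # 1,887,436,800
network = network_bandwidth_bps(256, 132_000_000)  # 33,792,000,000

# Assumed combination: 4x super-sampling times a deferred-shading factor of 2 or 3.
demand_low = base * 4 * 2
demand_high = base * 4 * 3
headroom = network - demand_high
```

Multiplying the stated figures directly gives roughly 1.9 Gbps for the base stream and between about 15 and 23 Gbps once both multipliers are applied, which fits within the 33.8 Gbps that the composition network provides.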
The compositing port on each board contains hardware comparators that implement a z-buffering scheme, so only those pixels that should be visible (that is, those pixels that have a smaller z coordinate and so are in front of any other pixel) pass over the network to the next renderer. This comparison hardware works in parallel, so that a board can be rendering the next frame while the current one is passing over the network.
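The z-comparison can be sketched in software as a chain of merges, one per renderer. This is an illustration only, not the hardware design: it assumes each pixel is a `(z, payload)` pair, that smaller z means nearer, and that empty pixels carry an infinite depth.

```python
FAR = float("inf")  # depth assigned to pixels nothing has been drawn into


def composite(upstream, local):
    """Merge two partial images pixel by pixel, keeping the nearer sample."""
    return [a if a[0] <= b[0] else b for a, b in zip(upstream, local)]


def run_pipeline(partial_images):
    """Pass a pixel stream past each renderer in turn, as on a one-way bus."""
    stream = [(FAR, None)] * len(partial_images[0])
    for image in partial_images:
        stream = composite(stream, image)
    return stream


renderer_a = [(5, "red"), (FAR, None)]
renderer_b = [(3, "blue"), (7, "green")]
final = run_pipeline([renderer_a, renderer_b])
```

Each stage only needs the stream from its upstream neighbor and its own partial image, which is why adding another renderer lengthens the chain without increasing the traffic on any one link.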
"Deferred shading" implies that the renderers don't generate final RGB values. They generate an intermediate pixel representation that encodes attributes such as intrinsic color, direction of surface normals, and texture coordinates. These intermediate values flow into the shaders (all located downstream from the renderers) where they are converted to RGB using shading and texturing algorithms. The last station on the network is the frame buffer, where the final composited image accumulates.
PixelFlow Hardware
At the heart of the PixelFlow architecture is a processing array of enhanced memory chips (EMCs), which you can think of either as a single-instruction multiple-data parallel computer or, equivalently, as an intelligent memory. This EMC, for which first silicon is due this quarter, is the latest in a family of UNC designs called Pixel-Planes. It's a 16 by 16 array of cells, each of which consists of an 8-bit ALU, a linear expression evaluator, 2048 bits of local memory to store pixel data, and an 8-bit memory bus. The linear expression evaluators compute the value of linear expressions of the form Ax+By+C for each cell's x,y coordinates. The ALUs perform arithmetic (including 16-bit multiplies) and logic operations on the cell's pixel data, in parallel across the whole array.
Programming involves broadcasting the parameters A, B, and C for each of the three corners of a triangle to the whole array, along with an ALU micro-instruction specifying an operation. All the cells compute in parallel whether their own pixel lies within this triangle (A1x+B1y+C1<0, etc.) and perform the requested operation if the pixel does.
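The inside test can be simulated sequentially to show what each cell computes. This sketch makes several assumptions that are not from the article: the coefficients are derived from vertex pairs, their sign is normalized so that the interior evaluates negative, and pixels lying exactly on an edge count as outside. The loops stand in for work the array does in parallel.

```python
def edge_coefficients(p, q, opposite):
    """Return (A, B, C) for the line through p and q, signed so that the
    side containing `opposite` satisfies Ax + By + C < 0."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    if a * opposite[0] + b * opposite[1] + c > 0:
        a, b, c = -a, -b, -c
    return a, b, c


def rasterize(triangle, size=16):
    """Evaluate all three edge expressions at every cell of a size x size
    array and return a mask of the cells that lie inside the triangle."""
    v0, v1, v2 = triangle
    edges = [
        edge_coefficients(v0, v1, v2),
        edge_coefficients(v1, v2, v0),
        edge_coefficients(v2, v0, v1),
    ]
    return [
        [all(a * x + b * y + c < 0 for a, b, c in edges) for x in range(size)]
        for y in range(size)
    ]


mask = rasterize(((0, 0), (15, 0), (0, 15)))
covered = sum(sum(row) for row in mask)
```

Only the nine coefficients need to be broadcast; every cell already knows its own x,y position, so the per-triangle cost does not depend on how many pixels the triangle covers.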
This EMC array is equally useful for rendering or shading operations, so all PixelFlow boards contain the same core components: a floating-point RISC geometry processor and an array of EMC chips. The EMC array is dual-ported. One port connects to the composition network. The second port is used differently, according to the type of board: On a renderer board it's unused; on a shader board it connects to eight banks of RAM used for storing texture maps; on a frame-buffer board it connects to the frame buffer itself. Since shading and frame-buffer boards perform no geometry calculations, their underused RISC processors can be redeployed for image-processing operations, such as warping, and for procedural texturing (running RenderMan, for example).
While it would be nice to provide enough EMCs to render the whole screen in parallel, this would currently be prohibitively expensive, as you'd need 5120 (80 by 64) EMCs per board. Real PixelFlow systems will use just 64 EMCs per board and render the screen as a time sequence of 128- by 128-pixel regions. This image sequencing, and that associated with antialiasing sampling, is performed by a custom control ASIC.
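The chip counts and the region sequence follow from the 16 by 16 cell array. The sketch below assumes a 1280- by 1024-pixel screen and a simple row-by-row visiting order; the function names are hypothetical and the real ASIC's ordering is not described here.

```python
EMC_SIDE = 16  # each EMC covers a 16 x 16 block of pixels


def emcs_for_full_screen(width, height):
    """Number of EMCs needed to cover every pixel at once."""
    return (width // EMC_SIDE) * (height // EMC_SIDE)


def pixels_per_pass(emc_count):
    """Pixels that a board with `emc_count` EMCs can hold at one time."""
    return emc_count * EMC_SIDE * EMC_SIDE


def region_origins(width, height, region=128):
    """Top-left corners of the regions, visited row by row (assumed order)."""
    return [
        (x, y)
        for y in range(0, height, region)
        for x in range(0, width, region)
    ]


full_screen = emcs_for_full_screen(1280, 1024)
per_pass = pixels_per_pass(64)
origins = region_origins(1280, 1024)
```

Under these assumptions, 64 EMCs hold exactly one 128- by 128-pixel region, and a 1280- by 1024-pixel frame takes 80 such regions in sequence.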
PixelFlow is now poised to enter the commercial arena. Division Inc. (Bristol, U.K.), a firm specializing in virtual reality systems, has formed a partnership with UNC to develop a PixelFlow board set, and Hewlett-Packard has invested in this project with an option to use the system in future graphics workstations. Consequently, Division has chosen HP's own PA-7200 (250+ SPECfp92) as the geometry processor (to be replaced when available by the PA-8000). The performance target -- for a system with 128 PixelFlow boards -- is 100,000,000 Phong-shaded, antialiased, textured triangles per second.
Imaging Architectures Compared: Screen-subdivision designs require a high-bandwidth network to relay image data; the image-composition design relays only the data for the object it has rendered and composited.
PixelFlow Hardware: The PixelFlow chip uses a common set of hardware to assemble an image. To handle a larger image simply means adding more boards.
Dick Pountain is a BYTE contributing editor based in London. You can reach him on the Internet or BIX at dickp@bix.com.