Memory chips are getting faster and more capable but not less expensive. BYTE sorts out the new types of RAM--what each is good for, and which ones you're likely to see on your next PC.
Peter Wayner
Until recently, computer users could proudly wear T-shirts proclaiming, "Whoever dies with the most RAM wins." Complacent in their swivel chairs, they knew that they understood the place of memory in the computer world order.
Now, however, the realm of computer memory is becoming fragmented as RAM producers develop increasingly ingenious methods of squeezing ever more performance out of the chips. Some of their experiments involve on-board caches, specialized CPU-to-RAM interfaces, and even pure processing power directly on the memory chips themselves. Soon a desktop system will be judged on not only how much memory it has but also what architectural enhancements that memory offers to the system.
The force behind these changes has been increasingly apparent over the last decade: RAM-chip speeds have not kept up with the increasing speeds of CPUs. As chip manufacturers found ways to add more transistors to each chip, the CPU architects could easily increase speed by adding more arithmetic units to each chip. Twice as many arithmetic units could let a chip run almost twice as fast. On RAM chips, though, more transistors per chip simply meant that the chips could hold more data; they couldn't access it appreciably faster. While faster chips (e.g., static RAM, or SRAM) were available, they were much more expensive and had lower capacity per chip because they required several more transistors to store each bit.
Working around the slow speed of RAM, CPU architects added a small amount of faster cache memory and created separate memory banks that would alternate serving data to the CPU. These techniques, though, are reaching maturity. Intel's P6 includes primary and secondary caches on the main chip die. Many systems augment this by placing a tertiary cache between the processor and the main memory; however, the gain that each new level of cache produces is substantially smaller, as absolute gains in access speed are offset somewhat by the need to look in yet one more place (see "Is Cache Losing Its Cachet?").
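The shrinking payoff from each extra cache level can be sketched with a simple average-access-time model. Every latency and hit rate below is an illustrative assumption, not a measurement of the P6 or any other real system; the point is only the shape of the curve.

```python
def average_access_time(levels, memory_latency):
    """Average cycles per access for a cache hierarchy.

    levels: list of (lookup_cycles, hit_rate), fastest level first.
    Every access that reaches a level pays that level's lookup cost,
    which is the "one more place to look" penalty.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for lookup_cycles, hit_rate in levels:
        total += reach * lookup_cycles
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency

# Assumed figures, chosen only for illustration.
MEMORY = 20        # cycles to reach main memory
L1 = (1, 0.90)     # (lookup cycles, hit rate)
L2 = (3, 0.70)
L3 = (8, 0.50)

no_cache = average_access_time([], MEMORY)
one_level = average_access_time([L1], MEMORY)
two_levels = average_access_time([L1, L2], MEMORY)
three_levels = average_access_time([L1, L2, L3], MEMORY)

for name, cycles in [("none", no_cache), ("L1", one_level),
                     ("L1+L2", two_levels), ("L1+L2+L3", three_levels)]:
    print(f"{name:10s} {cycles:6.2f} cycles per access")
```

Under these assumed numbers, the first cache saves far more cycles than the second, and the third saves almost nothing.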
This places more pressure on RAM designers to deliver information faster. The latest designs incorporate small quantities of extra logic that organizes the flow of data off the chip. These chips--known by names like FPM (fast page mode), EDORAM (extended data out RAM), or burst-mode RAM--offer faster data flow when the data is requested in sequential order.
More exotic alternatives are emerging in the graphics arena, where innovation is more common because the standard unit is the video card, not the SIMM. This lets card manufacturers experiment with and adopt different technologies and still produce a board that works with all software. These card designers are exploring the use of FPM, EDORAM, and burst-mode RAM, and they're also investigating technologies like Mitsubishi Electronics' 3D RAM and Samsung's WRAM (Window RAM), both of which include more circuitry on the RAM chip designed to speed up common video operations. (For more on graphics RAM, see "Faster Graphics Cards on the Horizon," April BYTE.)
Systems designers are beginning to explore a greater range of nonvolatile memory. Flash memory is an electrically alterable memory that maintains its state until it is erased with a relatively large voltage. Originally developed by Intel, flash RAM has seen wide use in PCMCIA PC Card memory for laptop computers. (For an update, see "Flash Memory Looks Bright.") FRAM (ferroelectric RAM), a newer competitor in the nonvolatile arena, uses iron in the fabrication process, an echo of the old ferrite-core memory used in early mainframes. FRAM also maintains its memory after the power stops. Both these memory products should find increasing use in highly portable products like PDAs (personal digital assistants).
In the farthest regions of the intellectual frontier, memory designers are experimenting with memory that can do many computations directly on the chip (see "Smart Memory"). They expect this will offer dramatic performance improvements on problems that can be solved in parallel. Placing the intelligence on the chip saves the time of moving the data on and off the chip.
The RAM Shift
The computer memory industry doesn't jump on innovations quickly, because it produces chips in such high volumes. No memory manufacturer can begin to build new formats until the demand is there, and the demand is not certain until system enhancements become essential.
By the time this article appears, the computer industry will be engaged in a large-scale adoption of EDORAM for main memory. In fact, this year promises to be the one in which EDORAM begins to dominate the marketplace. Many system integrators and RAM manufacturers agree that the price differential that EDORAM commands will evaporate by early 1996.
The technology behind EDORAM is a simple extension of the trend that made FPM RAM the standard form of DRAM available. When you read an element in a DRAM array, you charge electrical lines to first select a row and then a column. The lines do not stabilize immediately, however, and the length of the delay is what prevents the RAM location from being read instantaneously.
FPM RAM returns data faster because it assumes that the next data requested will be in the next column of the same row. In many cases this happens, and there's no need to wait for the row delay. This process stops working reliably, however, if the CPU demands data too quickly. The lines do not stay stable long enough for the CPU to read off the answer. This usually begins to happen in CPUs running at speeds faster than 33 MHz.
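A toy timing model makes the page-mode saving concrete. The delays and the array width below are assumed round numbers, not figures for any particular chip; the model charges the row delay only when a request lands in a different row from the previous one.

```python
ROW_DELAY_NS = 30       # assumed time to select a new row
COLUMN_DELAY_NS = 40    # assumed time to select a column
COLUMNS_PER_ROW = 1024  # assumed width of the DRAM array

def total_access_time(addresses, page_mode):
    """Total nanoseconds to read the given addresses in order."""
    open_row = None
    total = 0
    for address in addresses:
        row = address // COLUMNS_PER_ROW
        # Without page mode every access pays the row delay;
        # with it, only a change of row does.
        if not page_mode or row != open_row:
            total += ROW_DELAY_NS
            open_row = row
        total += COLUMN_DELAY_NS
    return total

sequential = list(range(8))                          # same row
scattered = [i * COLUMNS_PER_ROW for i in range(8)]  # new row each time

plain_sequential = total_access_time(sequential, page_mode=False)
fpm_sequential = total_access_time(sequential, page_mode=True)
fpm_scattered = total_access_time(scattered, page_mode=True)

print(plain_sequential, fpm_sequential, fpm_scattered)
```

In this sketch the sequential reads finish sooner in page mode, while the scattered reads gain nothing, which matches the article's point that the benefit depends on the next request falling in the same row.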
To solve this problem, EDORAM adds a set of latches, or secondary memory cells, at the output. These sense the data being fed for output, store it, and keep it available long enough for the signals to reach the CPU reliably. These chips should be stable at system bus speeds up to 50 MHz.
You can add even more circuitry and let EDORAM offer information at still higher speeds. An approach known as burst EDORAM assumes that the CPU wants the next, say, four addresses and begins fetching them automatically. This technique can reliably supply systems with a bus clock speed of up to 66 MHz.
The Race Is On
CPUs are getting faster, however, and they demand ever faster memory. Many memory makers are investigating two solutions: synchronous RAM, in which the CPU and RAM are locked together by the same clock; and cache RAM, which gains speed by adding to the chip a small amount of fast SRAM that acts as a cache to the DRAM. Both are good choices for systems that run faster than 66 MHz.
The synchronous solution is a cleaner replacement for the old interfaces between chips. Normally, memory chips answer requests. SDRAM (synchronous DRAM) feeds off the same clock cycle as the CPU, anticipating the CPU's demands and staying in step. Some devices even have a pipelined architecture, in which a stage can fetch an address while other stages present the data for output. Many people predict that 1996 will be the year of SDRAM, because 66-MHz or higher CPUs will be common by then and will need SDRAM. Until then, however, SDRAM will command a 20 percent to 50 percent premium over commodity DRAM. Also, the price of these faster systems must cover the added cost of the different logic chips needed to drive the SDRAM.
Another way to speed memory access is by adding an on-chip cache. This approach, often called CDRAM (cached DRAM) or EDRAM (enhanced DRAM), succeeds because it places an SRAM cache on the same chip as the DRAM. CDRAM comes from Mitsubishi (Sunnyvale, CA) and is second-sourced from Samsung, and EDRAM comes from Ramtron International (Colorado Springs, CO). In both cases, this cache can respond more quickly to requests from the CPU if it already holds the right information.
The chips also gain speed because the caches are able to fetch data from the slow DRAM in large blocks using the internal buses. Mitsubishi's CDRAM, for instance, features a 16-Kb cache with 128-bit lines on both its 4- and 16-Mb chips. When data is requested, the slow DRAM sends the entire 128-bit block to the fast SRAM. If the next address requested is in this block, as often happens, then the chip is ready. Picking the right sizes for the caches and buses is still an art, and practice varies widely. Ramtron, for instance, chooses to use a 2048-bit-wide bus to fill an 8-Kb SRAM cache on its 4-Mb DRAM.
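The effect of moving a whole block into the on-chip cache can be sketched as follows. The 16-Kb cache size and 128-bit line come from the Mitsubishi figures above; the 16-bit word size and the direct-mapped placement are assumptions made only to keep the model small, and the real chip's organization may differ.

```python
LINE_BITS = 128                          # from the article
CACHE_BITS = 16 * 1024                   # from the article: 16-Kb SRAM cache
NUM_LINES = CACHE_BITS // LINE_BITS      # 128 lines
WORD_BITS = 16                           # assumption
WORDS_PER_LINE = LINE_BITS // WORD_BITS  # 8 words per line

def count_hits(word_addresses):
    """Count requests answered from the SRAM cache.

    Assumes a direct-mapped cache: each DRAM block has exactly one
    slot it can occupy. On a miss the whole block is copied over.
    """
    tags = [None] * NUM_LINES
    hits = 0
    for address in word_addresses:
        block = address // WORDS_PER_LINE
        slot = block % NUM_LINES
        if tags[slot] == block:
            hits += 1
        else:
            tags[slot] = block  # slow DRAM sends the entire block to SRAM
    return hits

# 64 consecutive words versus 64 words that each start a new block.
sequential_hits = count_hits(range(64))
strided_hits = count_hits(range(0, 64 * WORDS_PER_LINE, WORDS_PER_LINE))

print(sequential_hits, strided_hits)
```

With consecutive addresses, only the first word of each block misses; with a stride of one full block, every request misses and the cache contributes nothing.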
Some systems designers are happy with cached memory chips. Ocean Information Systems (Covina, CA) manufactures 486 and Pentium motherboards that use Ramtron's EDRAM as the main system memory. The cache on the chips allows all the memory to operate at cache memory speeds. This makes an enormous difference when the CPU requests information that isn't in the L2 (Level 2) cache--something that happens more frequently with multitasking OSes and bloated programs. Barnett Fischer, Ocean's director of R&D, says, "A 100-MHz Pentium runs at only 8 MHz if it misses the L2 cache." That is why a 33-MHz 486 system using EDRAM can switch among a number of tasks much faster than a 100-MHz Pentium with a standard DRAM. The Pentium will still be substantially faster on single-task benchmarks that don't leap outside the L2 cache, but it will crawl to an 8-MHz halt when the cache starts missing.
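Fischer's figure can be turned into a rough estimate of how quickly misses erode performance. The sketch below assumes, as a deliberate simplification, that every cycle involves one memory access and that a hit proceeds at the full 100 MHz while a miss proceeds at 8 MHz; the miss fractions are hypothetical.

```python
def effective_mhz(miss_fraction, hit_mhz=100.0, miss_mhz=8.0):
    """Blend the hit and miss rates by time spent, not by count."""
    time_per_access = (1.0 - miss_fraction) / hit_mhz + miss_fraction / miss_mhz
    return 1.0 / time_per_access

for miss_fraction in (0.0, 0.05, 0.10, 0.25, 1.0):
    print(f"{miss_fraction:5.0%} misses -> {effective_mhz(miss_fraction):6.1f} MHz")
```

Because each miss takes so much longer than a hit, even a modest miss fraction pulls the effective rate well below the rated clock in this model.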
RAM Packaging
As EDORAM and faster products begin to permeate the market, they will be seen primarily as 72-pin SIMMs, currently the standard commodity configuration for the PC world, although that may not be the best long-term solution. All RAM manufacturers are investigating faster and denser packaging, including direct mounting, wafer laminating, and other ways of packing the wafers closer to each other (see "More Memory in Less Space"). Many manufacturers are continuing to examine packages that might offer a more stable and faster bus. For instance, SDRAM requires that the RAM and the CPU share clocking information, and more precise packaging may better serve this need.
One of the best-known alternative formats for RAM is RDRAM (Rambus DRAM) from Rambus (Mountain View, CA). This bundles better and smaller packaging with more-stable lines and faster signaling logic. The chips are close together, and the leads are designed to be short, precise, and manufactured to much tighter tolerances than are standard printed circuit boards. This minimizes the extra capacitance that can cause the signals to travel at unpredictable speeds.
The system is also strongly synchronized to a clock that regulates exactly when the information will be available on the bus. The transfer happens every 2 ns on both the odd and the even edge of the clock cycle. This synchronization is similar to the process proposed for SDRAM. All these factors combine to enable transfer speeds up to 500 MBps.
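The 500-MBps figure follows from the 2-ns transfer interval if each transfer carries one byte. The byte-wide data path is an assumption here, chosen because it makes the two numbers in the text agree; the clock rate is derived on the reading that two transfers occur per clock cycle.

```python
TRANSFER_INTERVAL_NS = 2  # from the article: one transfer every 2 ns
BYTES_PER_TRANSFER = 1    # assumption: byte-wide data path
EDGES_PER_CYCLE = 2       # data moves on both edges of the clock

transfers_per_second = 1_000_000_000 // TRANSFER_INTERVAL_NS
clock_mhz = transfers_per_second / EDGES_PER_CYCLE / 1_000_000
peak_mbps = transfers_per_second * BYTES_PER_TRANSFER / 1_000_000

print(f"clock: {clock_mhz:.0f} MHz, peak: {peak_mbps:.0f} MBps")
```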
VRAM
The commodity market for main memory is forced to be conservative and slow-moving because all additional memory usually comes in a standard package. The designers of video boards, however, are free to use whatever types of memory circuits they like, and as a result, the market is filled with many different approaches. Some use commodity DRAMs, others use more specialized VRAMs, and still others experiment with more exotic combinations like WRAM and 3D RAM.
Whether to use DRAM or the more expensive VRAM to maintain an image on the screen is an old debate. DRAM serves one master--the controller, which is responsible for changing the image on the screen and for collecting the information and sending it off to the video monitor. In a video card using VRAM, the memory chips serve two masters: One maintains and changes the image, while the other gathers the information for the video monitor. The VRAM is constantly serving both masters, each at a different port.
VRAM may have two ports, but that doesn't mean it's twice as fast or can provide twice as much information. The actual individual memory cells are the same in VRAM and DRAM. There's a limit to how much information can be moved in and out of an array of DRAM cells, because the addressing lines for the rows and columns must be charged and discharged. This total throughput volume is known as bandwidth. VRAM's dual ports don't double the total bandwidth; they merely reserve a slice for the circuitry that drives the screen-drawing function. This extra port increases the amount of information that can come and go from a chip, but it doesn't double it.
This segregation has two effects. The VRAM circuits always perform better at higher resolutions with more colors. The extra bandwidth helps in these high-end cases, where a 1280- by 960- by 24-bit screen image requires moving over 3 MB onto the screen every time it is refreshed. But for lower resolutions, there's much less benefit to be gained from added bandwidth. In these low-end video configurations, VRAM's second port is unused much of the time.
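The size of the frame, and the bandwidth needed just to keep it on the screen, is simple arithmetic. The 72-Hz refresh rate and the 640- by 480- by 8-bit low-end mode below are assumptions added for comparison; only the 1280- by 960- by 24-bit mode comes from the text.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full screen image."""
    return width * height * bits_per_pixel // 8

def refresh_mbps(width, height, bits_per_pixel, refresh_hz):
    """Millions of bytes per second consumed by screen refresh alone."""
    return frame_bytes(width, height, bits_per_pixel) * refresh_hz / 1_000_000

REFRESH_HZ = 72  # assumed monitor refresh rate

high_end = frame_bytes(1280, 960, 24)  # mode cited in the article
low_end = frame_bytes(640, 480, 8)     # assumed low-end mode

print(f"high end: {high_end:,} bytes per frame, "
      f"{refresh_mbps(1280, 960, 24, REFRESH_HZ):.1f} MBps to refresh")
print(f"low end:  {low_end:,} bytes per frame, "
      f"{refresh_mbps(640, 480, 8, REFRESH_HZ):.1f} MBps to refresh")
```

The high-end mode needs about 3.7 million bytes per frame, twelve times the low-end figure, which is why a port reserved for refresh pays off mainly at high resolutions and color depths.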
On the other hand, DRAM boards have no set restrictions on how the bandwidth is used, and this is precisely why lower-end video-board manufacturers often choose it. Any bandwidth that isn't used to refresh the screen is available for the video controller to use. If a low resolution is chosen for the screen, then the video controller can use the rest of the bandwidth to create complex images on the screen. The downside is that the amount of leftover bandwidth drops as screen resolution increases. These DRAM boards can display high-resolution images, but manipulating them becomes sluggish.
RAM manufacturers are exploring new RAM chips for video boards in many ways. Caches, synchronization, and latching can all be used to increase the speed of the RAM on video boards. EDORAM, CDRAM, SDRAM, and other DRAM enhancements can be substituted on the board quickly and effectively. In addition, the same techniques can enhance dual-ported VRAM, which means we will see CVRAM (cached VRAM), SVRAM (synchronous VRAM), and EDOVRAM. Burst modes will also be increasingly common because video applications move large blocks of memory more often than many regular applications do.
Adding Intelligence to Video
Enhancing VRAM with caching and anticipatory bursting is just the beginning. There are many standard jobs for a video card that can be sped up by adding some extra intelligence to the RAM chip. WRAM technology is a good example of a dual-ported memory that also has added features for graphics. Matrox (Dorval, Quebec, Canada) is one company using WRAM in its video boards. (By the way, WRAM was named more for its ability to offer full-motion video than for any ability to speed up a Microsoft Windows operating environment.) The extra intelligence lets the chip do two-color pattern fills and aligned BitBlts at substantially improved speeds. Matrox engineer Dan Wood, responsible for analyzing the performance of memory chips, points out that these extra features let WRAM perform better than VRAM and at a lower cost.
The fast BitBlt is an effect that can be useful for double-buffering fast animation. WRAM can provide this effect only if the start and finish of the Blt are properly aligned. This is because the WRAM achieves the speedup by using its own internal bit bus. The information leaves a memory line and then is written to another line without leaving the chip. This helps accelerate animation, but it won't help with many of the random BitBlts that are needed to open a menu or drag a window across the screen.
Another intriguing solution is 3D RAM, created by Mitsubishi and Sun Microsystems (Mountain View, CA) to improve the performance of 3-D operations. The solution embeds much of the logic for z-buffering into the chip. Normally, a 3-D graphics card will draw a pixel in 3-D by looking up the pixel in the z-buffer, which stores the z coordinate of the last pixel drawn at this location. If the new z coordinate is smaller, then the pixel under consideration is closer to the eye and thus visible. The graphics card will then write this pixel back to memory for eventual display. If the new z coordinate is greater than the stored one, the previously drawn pixel would hide the new pixel; thus, the new pixel is discarded.
The 3D RAM moves this whole operation onto the memory chip, where it's handled by an on-board ALU. Instead of the video card having to read, think, and write to draw a pixel in 3-D, it just has to write it to the 3D RAM, which decides whether it will be visible.
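The conventional read, compare, and write sequence that 3D RAM absorbs can be sketched in a few lines. This shows the generic z-buffer test as the text describes it, not the chip's actual interface; the function name, buffer layout, and sample values are hypothetical.

```python
def draw_pixel(color_buffer, z_buffer, x, y, z, color):
    """Conventional z-buffered write; returns True if the pixel is visible."""
    stored_z = z_buffer[y][x]        # read: depth of the last pixel drawn here
    if z < stored_z:                 # think: smaller z means closer to the eye
        z_buffer[y][x] = z           # write: record the new depth
        color_buffer[y][x] = color   # write: record the new color
        return True
    return False                     # hidden by the earlier pixel; discard it

WIDTH, HEIGHT = 4, 3
FAR = float("inf")  # nothing drawn yet, so any pixel is closer
z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]
color_buffer = [[None] * WIDTH for _ in range(HEIGHT)]

first = draw_pixel(color_buffer, z_buffer, 1, 1, 5.0, "red")
hidden = draw_pixel(color_buffer, z_buffer, 1, 1, 9.0, "blue")
nearer = draw_pixel(color_buffer, z_buffer, 1, 1, 2.0, "green")

print(first, hidden, nearer, color_buffer[1][1])
```

On a conventional card, each call costs a memory read followed by up to two writes across the memory bus. With the comparison performed on the memory chip, the card issues only the write.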
The ALU on the 3D RAM can also perform several operations, including raster operations, alpha blending, and comparisons. Mitsubishi estimates that a video card with 3D RAM can render about 1.8 million 100-pixel polygons per second--an amount that they claim is nine times faster than a board equipped with VRAM.
The 3D RAM also makes significant use of cache technology. The basic 3D RAM is a 10-Mb chip with four 2.5-Mb arrays that feed one central ALU, which performs the raster operations on the incoming pixels. There is one L1 cache at the ALU and four L2 caches, one at each of the four banks.
Nonvolatile Memory
In several ways, the market for portable computers is limited only by the availability of power for these machines. This constraint is not lost on RAM designers, who are exploring the use of nonvolatile flash RAM and ferroelectric RAM for the main memory of these machines.
Intel (Santa Clara, CA) is a big backer of flash memory, a technology that is similar to EEPROMs. The chips remember their data until they are hit by a larger voltage. Intel announced a 2-MB embedded flash-memory chip in late 1994. It hopes the chip will find acceptance among printer manufacturers who often need local storage for about 8 MB of flexible information on fonts and other display code.
Flash RAM chips have also found homes in portable digital cameras and other products that need to store relatively small amounts of data. Some computer manufacturers are using the chips as a flexible BIOS store that you can upgrade if necessary. But greater acceptance is slowed by the relatively high cost of flash RAM.
Other companies are rapidly entering the race to develop ferroelectric RAM chips, which some people are calling the "ultimate memory." Hitachi and Ramtron are joined in one partnership, and Matsushita and Symetrix (Colorado Springs, CO) are working together in another. All are exploring building 256-Kb and 1- and 4-Mb devices for more-widespread use. Some industry observers expect that FRAM may prove to be a replacement for standard DRAM because it does not seem to degrade after a number of write operations. Flash memory wears out, which limits its usefulness to jobs that do not write data that often. If these companies succeed in developing chips that hold a significant density (16-Mb chips have just been announced), then FRAM may start replacing DRAM.
RAM Drives Forward
The RAM industry will continue to expand and flourish in the next decade--in part because it must. RAM is often the slowest part of today's computer systems, so designers will continue to concentrate on developing faster storage. The expanding market also leads to a greater variety of RAM products with different performance characteristics.
The most exotic memory won't just keep data around; it will compute with it as well. The most ambitious plans for memory will continue to emerge from the graphics arena, but it may not be long before high-end workstations begin to exploit smart memory for general purposes. You can now find on desktop machines, for example, many of the innovations that Cray Computer produced in the 1980s. The smart memory that Cray is using for its latest machines may prove to be another approach that wins over machines in the years ahead. Until that happens, the relatively gentle progression of faster and faster types of DRAM will provide systems designers with plenty of options when they integrate memory.
WHERE TO FIND
AMD Advanced Micro Devices
Sunnyvale, CA
(800) 538-8450
(408) 732-2400
Centennial Technologies
Billerica, MA
(800) 942-0018
(508) 670-0646
fax: (508) 670-9025
Fujitsu Microelectronics, Inc.
San Jose, CA
(800) 626-4686
(408) 432-6333
fax: (408) 894-1706
Hitachi America, Ltd., Semiconductor & I.C. Division
Brisbane, CA
(800) 285-1601, ext. 10
(415) 589-8300
Intel Corp.
Folsom, CA
(800) 548-4725
fax: (916) 356-6110
Matrox
Dorval, Quebec, Canada
(514) 685-7230
fax: (514) 685-2853
Mitsubishi Electronic Device Group
Sunnyvale, CA
(408) 730-5900
fax: (408) 732-9382
M-Systems
Fremont, CA
(408) 654-5820
fax: (408) 654-9084
National Semiconductor Corp.
Santa Clara, CA
(800) 231-6072
(408) 721-5000
fax: (408) 721-7662
NEC Technologies, Inc.
Boxborough, MA
(800) 632-4636
(508) 264-8000
fax: (508) 264-8673
Ocean Information Systems, Inc.
Covina, CA
(818) 339-8888
fax: (818) 859-7668
Rambus
Mountain View, CA
(415) 903-3800
fax: (415) 965-1528
Ramtron International Corp.
Colorado Springs, CO
(800) 545-3726
(719) 481-7000
fax: (719) 488-9095
Raymond Engineering
Middletown, CT
(203) 632-1000
fax: (203) 632-4329
Samsung Semiconductor, Inc.
San Jose, CA
(408) 954-7000
fax: (408) 954-7870
SGS-Thomson Microelectronics, Inc.
Lincoln, MA
(617) 259-0300
fax: (617) 259-4420
Sun Microsystems, Inc.
Mountain View, CA
(800) 821-4643
(800) 821-4642 (CA)
(415) 960-1300
fax: (415) 969-9131
Symetrix Corp.
Colorado Springs, CO
(719) 594-6145
fax: (719) 598-3437
symetrix@usa.net
Toshiba America Electronic Components
Irvine, CA
(800) 879-4963
AN ABBREVIATED GUIDE TO THE RAMS
CDRAM: cached DRAM
CVRAM: cached VRAM
DRAM: dynamic RAM
EDRAM: enhanced DRAM
EDORAM: extended data out RAM
EDOSRAM: extended data out SRAM
EDOVRAM: extended data out VRAM
FPM: fast page mode
FRAM: ferroelectric RAM
RDRAM: Rambus DRAM
SDRAM: synchronous DRAM
SRAM: static RAM
SVRAM: synchronous VRAM
3D RAM: Mitsubishi's chip for 3-D video processing
VRAM: video RAM
WRAM: Window RAM
Peter Wayner is a BYTE consulting editor based in Baltimore, Maryland, and the author of Agents Unleashed (AP Professional, 1995). He can be reached on the Internet at pcw@access.digex.com or on BIX as "pwayner." His Web home page is http://access.digex.com:/pcw/pcwpage.html.