MMX Accelerates the x86


October 1996 / Cover Story / 3-D for Everyone / MMX Accelerates the x86

To improve multimedia and 3-D processing, Intel's new MMX technology can pack multiple pixels into one register and manipulate them with a single instruction. In effect, MMX brings a new level of parallelism to x86 processors.

Instead of adding new physical registers to the x86 architecture (which would slow MMX's adoption), Intel reuses the existing floating-point (FP) stack as logical MMX registers. MMX instructions use only the 64-bit mantissa portion of the 80-bit FP registers, ignoring the 16-bit exponent portion. This yields eight 64-bit logical registers without significantly altering the x86 architecture.

MMX instructions can pack several data types into these 64-bit registers: packed bytes (eight per register), packed words (four per register), packed doublewords (two per register), and a quadword (one 64-bit value per register). These data types are useful because multimedia programs typically work on small units of data. For example, a color pixel in TrueColor mode, the highest commonly used color resolution, uses 24 bits: 1 byte each for the red, green, and blue components. This mode allows up to 16.7 million colors, more than the human eye can discern. In HiColor mode, a pixel needs only 16 bits, which is more than enough for many graphics applications.
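The packed layouts described above can be sketched in portable C. This is only an illustration, not MMX code: the `packed64` union, the helper names, and the 5-6-5 bit split for a 16-bit HiColor pixel are assumptions made for the example (the article says only that HiColor uses 16 bits per pixel).

```c
#include <stdint.h>

/* Hypothetical illustration type: one 64-bit value viewed as the four
   packed formats. Which array element maps to which bits depends on the
   host's byte order. */
typedef union {
    uint8_t  bytes[8];   /* packed bytes: eight per register      */
    uint16_t words[4];   /* packed words: four per register       */
    uint32_t dwords[2];  /* packed doublewords: two per register  */
    uint64_t qword;      /* quadword: one 64-bit value            */
} packed64;

/* Reduce 8-bit R, G, B to one 16-bit pixel.
   The 5-6-5 layout is an assumption for this sketch. */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Place four 16-bit pixels in one 64-bit value, pixel 0 in the low bits. */
static uint64_t pack_four_pixels(const uint16_t px[4])
{
    uint64_t q = 0;
    for (int i = 0; i < 4; i++)
        q |= (uint64_t)px[i] << (16 * i);
    return q;
}
```

With this layout, four HiColor pixels fit in a single 64-bit register, so one packed-word instruction could touch all four at once.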

New x86 processors that support MMX will address the new registers as MM0 through MM7. Instead of treating the registers as a stack -- as FP instructions do -- MMX instructions can access the registers directly. When switching back and forth between FP and MMX instructions, the existing FSAVE instruction saves the state of the registers, and the usual FRSTOR instruction restores the values. This keeps MMX technology compatible with existing OSes, which frequently must save and restore the registers when context-switching between multitasking applications.

The downside is that programmers can't mix FP and MMX instructions together because they need the same registers. But this is not as significant as it sounds, since multimedia programs typically perform their FP operations before displaying the data. (Rendering relies more heavily on integer instructions.)

MMX introduces a set of general-purpose integer instructions that use the single instruction/multiple data (SIMD) paradigm: one instruction processes all of the data elements packed into a register at once. This parallelism increases performance. Incidentally, this concept is not new at Intel. Years ago, the now-obsolete i860 RISC family featured a similar technology, called Pixel Addressing Extension (PAX).
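To make the SIMD idea concrete, here is a minimal sketch in portable C that adds eight packed bytes held in one 64-bit integer, with each byte wrapping around independently. It emulates the effect of a packed-byte add using ordinary integer operations; it is not MMX code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Scalar reference: add eight bytes one lane at a time. */
static uint64_t add_bytes_scalar(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * i);
    }
    return r;
}

/* Packed version: all eight lanes in a handful of 64-bit operations.
   The low 7 bits of each byte are added first, so no carry can cross
   into the neighboring byte; the top bit of each byte is then fixed
   up with XOR. */
static uint64_t add_bytes_packed(uint64_t a, uint64_t b)
{
    const uint64_t HIGH = 0x8080808080808080ULL; /* top bit of every byte */
    return ((a & ~HIGH) + (b & ~HIGH)) ^ ((a ^ b) & HIGH);
}
```

Both functions return the same result for any inputs; the packed form simply does the work of the eight-iteration loop in one pass, which is the gain a hardware packed add provides in a single instruction.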

Another feature of the new instruction set, parallel-compare operations, could improve performance by eliminating branches. (Modern processors try to predict branches, but a misprediction means a penalty of several processor cycles.) Combined with packed data features, parallel-compare operations are useful when, for example, you want to combine or overlay two images.
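The overlay case can be sketched as a compare that produces a mask, followed by a mask-based select in place of a per-pixel `if`. The code below is a portable C emulation on four 16-bit pixels per 64-bit value, not MMX code. The function names and the idea of a "key" color marking transparent foreground pixels are assumptions made for the example.

```c
#include <stdint.h>

/* Emulated parallel compare: each 16-bit lane of the result is 0xFFFF
   where the lanes of a and b are equal, and 0x0000 where they differ. */
static uint64_t cmpeq_words(uint64_t a, uint64_t b)
{
    uint64_t mask = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        uint64_t lane = (x == y) ? 0xFFFFu : 0u;
        mask |= lane << (16 * i);
    }
    return mask;
}

/* Overlay four foreground pixels on four background pixels.
   Foreground pixels equal to `key` (a hypothetical transparent color)
   let the background show through. The select uses AND/OR on the mask,
   so no per-pixel decision is made with a branch. */
static uint64_t overlay_words(uint64_t fg, uint64_t bg, uint16_t key)
{
    uint64_t keys = 0x0001000100010001ULL * key; /* key copied to all lanes */
    uint64_t mask = cmpeq_words(fg, keys);
    return (mask & bg) | (~mask & fg);
}
```

On hardware with a packed compare, the loop in `cmpeq_words` collapses into one instruction, and the whole overlay becomes a short straight-line sequence with nothing for the branch predictor to get wrong.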

The MMX instructions are similar to those in Sun's Visual Instruction Set (VIS) for the UltraSparc. VIS also packs registers and uses the FP registers. But it has a lot more to offer than MMX: 32 new registers (compared to Intel's eight), accelerated video decompression with discrete cosine transformations, more-powerful addressing modes, pixel masking, and a highly specialized set of operations that greatly accelerates motion estimation when compressing MPEG video streams.

MMX isn't Intel's only new approach to accelerating 3-D. Another new extension for 3-D accelerators is the Accelerated Graphics Port (AGP). To evenly distribute main-processor tasks and graphics-chip tasks, the AGP creates a new data path for data transfers between main memory and the graphics card's frame buffer. By skipping the PCI bus altogether, AGP can theoretically allow read and write transfers at speeds up to 400 MBps, according to Intel.


[Illustration: Parallelism Speeds Performance]

