ntegral to the x86 architecture as the extended 32-bit instructions that Intel added to the 386 more than a decade ago.
Other x86 vendors are adopting MMX, too. Thanks to a recent cross-licensing agreement, future x86 processors from Advanced Micro Devices and AMD's subsidiary, NexGen, should be compatible with Intel's P55C. Cyrix, another x86 vendor, has not yet licensed MMX. However, Cyrix maintains that its future CPUs will be fully compatible with MMX, either by licensing or reverse-engineering the Intel technology.
Numerous software companies have announced support for MMX in upcoming versions of their products. These include key development tools such as Microsoft Visual C++, Watcom C++, Macromedia Director, and Criterion RenderWare. Microsoft says it will support MMX in its new Direct3D and ActiveMovie APIs.
Inside MMX
Adding new instructions to a microprocessor is easy: Define the new opcodes and add the necessary logic. But adding new instructions without disrupting software compatibility is another matter. It's a particular challenge with the x86 because backward compatibility isn't just advisable; it's mandatory.
Intel mapped the eight new MMX registers into the existing stack of floating-point (FP) registers. There are eight general-purpose FP registers in an x86 FPU, and each one is 80 bits wide. An FP value uses 64 bits for the mantissa and the remaining 16 bits for the sign and a 15-bit exponent. MMX instructions use those 80-bit registers as a random-access file (not a push-pop stack) of eight 64-bit registers. In other words, MMX instructions use only the 64-bit mantissa portion of an FP register to store MMX operands.
This trick gives programmers the virtual equivalent of eight new registers without radically altering the standard x86 architecture. OS vendors don't have to modify their code to save the state of MMX registers during context switches--MMX registers look like ordinary FP registers to the OS. Clever, eh?
But there's a catch. Programmers can use MMX and FP instructions in the same program, but they'd better not mix them, because both kinds of instructions need the same registers. When a program finishes a sequence of MMX instructions, it must mark the registers empty with a new instruction (EMMS: Empty MMX State) to make way for subsequent FP instructions. FP instructions do likewise when they pop values off the FP stack and set the registers' tag bits. If a program mixes FP and MMX instructions, it will pay a performance penalty for these register-level "context switches."
Generally, though, it shouldn't be a problem. Developers of multimedia software should segregate MMX instructions in a subroutine or library that's called only after executing the CPUID instruction to verify that the chip supports MMX. It makes sense to group MMX instructions into tight routines anyway, because multimedia processing typically involves repetitive operations on long sequences of data.
Packed Operands
Even though MMX instructions use FP registers, they're all integer-type instructions. Their 64-bit operands may contain eight packed bytes, four packed 16-bit words, two packed 32-bit doublewords, or a single 64-bit quadword.
Potentially, an MMX instruction could manipulate an 80-bit packed operand if it used a whole FP register. But Intel limited the operands to 64 bits because they match the Pentium's 64-bit I/O bus and internal data paths. Also, 80 isn't a power of 2, so it's more troublesome to handle.
As it is, the 64-bit operands are plenty long enough for typical multimedia jobs. Suppose a program is manipulating graphics in 8-bit color, which is often the case in games. An MMX instruction can pack eight pixels into a single operand and process them all at once. An ordinary x86 CPU can shuffle only one pixel at a time. Audio and communications programs often use 16-bit data types, so a single MMX instruction can process four of those values in a single chunk.
Most MMX instructions follow this pattern of performing a single operation on a series of integer values. This technique is called single instruction, multiple data (SIMD), and it lends itself to the algorithms and data types frequently found in multimedia software. Examples include MPEG compression, wavelet compression, motion compensation, motion estimation, color space conversion, texture mapping, 2-D filtering, matrix multiplication, fast Fourier transforms, discrete cosine transforms, and phoneme matching.
Something else these processes have in common is a lot of potential parallelism. It's no coincidence that MMX instructions are integer operations; they're designed to exploit these characteristics. Like most other integer operations in a modern x86, the majority of MMX instructions can execute in a single cycle. MMX multiplication instructions require three cycles to execute, but the CPU can issue a new one every cycle.
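The payoff of a three-cycle multiplier that accepts a new operation every cycle is easy to work out. Assuming n independent multiplies (no result feeds a later one) issued back to back, with latency L = 3:

```latex
T_{\text{pipelined}}(n) = n + (L - 1) = n + 2,
\qquad
T_{\text{unpipelined}}(n) = L\,n = 3n
```

For eight multiplies, that's 10 cycles instead of 24. Dependent multiplies would not overlap this way.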
Therefore, a superscalar CPU like the Pentium can execute multiple streams of MMX instructions in its parallel integer pipelines. An out-of-order CPU like the Pentium Pro can rearrange MMX instructions for maximum efficiency. The CPU doesn't need a special multimedia execution unit for MMX, so any advances that improve integer performance will benefit MMX performance as well.
One thing you won't find in the MMX instruction set is branch instructions. Branches would disrupt the instruction flow, and mispredicted branches would stall the pipelines--a particular hazard in the superpipelined Pentium Pro. Instead, new packed-compare instructions generate bit masks, and logical instructions (AND, AND NOT, OR) use those masks to select among the values in multiple operands. By combining masks and bitwise operations, a program can achieve the same results as branches without the delays.
On balance, it appears that Intel has achieved its goal of updating the x86 to meet the demands of modern software without jeopardizing compatibility. Intel could have squeezed out more performance by making more radical changes--for example, by adding new MMX-specific registers instead of aliasing the FP stack--but such changes would slow down the adoption of MMX. The last time Intel extensively revised the x86 architecture was 11 years ago, and most PC users are only now making the transition to 32-bit software. Intel wants MMX to catch on a little faster.
Where to Find
Intel
Santa Clara, CA
Phone: (408) 765-8080
Internet: http://www.intel.com