a lot of useful x86-based software out there, cost, development efforts, and other issues mean that these programs aren't going to be ported to the Alpha soon, if at all.
To improve the Alpha's usability as a desktop alternative to Intel processors, Digital decided to provide x86 code support. The company determined that an on-chip solution was too costly in terms of die space and implementation difficulty, as witnessed by AMD with its K5 processor.
With its extensive experience in porting and translating Mips, SPARC, and VAX code to the Alpha, Digital instead opted to write an x86 software emulator. Another reason to employ a software emulator rather than porting code is that the technology can be quickly modified to support changes in Windows NT.
However, Digital added an interesting twist to its emulator technology, called FX!32. Since emulation is always slower than native code, FX!32 quietly performs a binary translation of portions of an x86 program to Alpha code and saves these translations to the hard disk. The end result is that over time, your favorite x86 programs become composed mostly of Alpha code and run much faster. Because of the Alpha's high throughput, these translated programs should run faster than any existing Intel-based system, making an Alpha-based computer an ideal PC clone.
It's in the Launch
To perform its sleight of hand, FX!32 consists of several modules, as shown in the figure "FX!32 Components."
The FX Server invokes the Background Optimizer component as necessary. When the system first starts, a Transparency Enabler patches NT's CreateProcess() routine. Since CreateProcess() handles the generation of all child processes in the system, the Enabler thus provides a mechanism by which FX!32 detects the launch of an application process. When one occurs, the patch code examines the file's header to check the processor type. Bits in this header indicate whether the application code runs on an Alpha, Intel, Mips, or PowerPC processor.
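The header check described above can be sketched in a few lines. This is only an illustration, not Digital's patch code: it reads the 16-bit Machine field that the PE/COFF file header defines, and the helper names (machine_type, needs_emulation, fake_image) are hypothetical.

```python
import struct

# Machine-type values defined by the PE/COFF file header.
MACHINE_NAMES = {
    0x014C: "x86",
    0x0166: "MIPS",
    0x0184: "Alpha",
    0x01F0: "PowerPC",
}

def machine_type(image: bytes) -> str:
    """Return the processor type recorded in a PE image's file header."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")

def needs_emulation(image: bytes, host: str = "Alpha") -> bool:
    """Hypothetical launch-time decision: an x86 image on an Alpha host goes to the runtime."""
    return host == "Alpha" and machine_type(image) == "x86"

def fake_image(machine: int) -> bytes:
    """Build a header-only image for demonstration (not a loadable program)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)   # PE signature starts right after the DOS header
    return bytes(dos) + b"PE\0\0" + struct.pack("<H", machine)

print(machine_type(fake_image(0x014C)), needs_emulation(fake_image(0x014C)))
```

A real launch hook would read the header from the file named in the CreateProcess() call; the in-memory byte string here just keeps the example self-contained.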
Normally, Windows NT gives you a warning message if there's a mismatch between an application's processor type and that of the host. If the type bit indicates the file is an x86 application, a Runtime component (described below) handles the creation of the application process. First, it consults a database file to see if a translated version of the x86 program exists. If so, it starts this translated code.
The first time you launch the x86 program, the patch code hands off the job to the FX!32 Emulator/Runtime component. This component is so named because it not only interprets x86 instructions but also intercepts x86-based NT API calls and routes them to corresponding Alpha-based NT calls.
Code Wrapping
The Runtime portion of the component implements its own NT loader. As it loads the x86 code into memory, the Runtime portion inserts "jackets" that provide an interface to the system's Alpha-based NT calls. It does this by first examining the application file's import section, which lists all the DLLs it requires to operate and all references to API functions in these libraries.
The Runtime modifies these import-table entries to reference jacket code. This code starts with an illegal x86 op code, which invokes the Runtime's exception handler. When the x86 program accesses an NT service, it first pushes the function's parameters onto the stack and calls the function. This triggers the exception handler, which pops the parameters off the emulated x86 stack and then places them in the appropriate registers on the Alpha. Finally, the jacket code calls the native version of the NT function. Function results undergo a similar transformation so that they wind up in the appropriate x86 registers, where the x86 program expects them.
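The jacket mechanism can be modeled roughly as follows. Everything here is a hypothetical stand-in: a Python exception plays the part of the illegal op code trap, positional arguments play the part of Alpha argument registers, and native_sub stands in for a native NT function. The sketch assumes parameters are pushed right to left and leaves out the return address.

```python
class JacketTrap(Exception):
    """Stands in for the exception raised by the illegal op code at a jacket's entry."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name

class X86State:
    """Minimal emulated x86 state: a stack and the return-value register."""
    def __init__(self):
        self.stack = []   # end of the list is the top of the stack
        self.eax = 0

def native_sub(a, b):
    """Hypothetical native function; subtraction makes argument order visible."""
    return a - b

# Import name -> (native function, number of stack parameters). Assumed layout.
JACKETS = {"Sub": (native_sub, 2)}

def call_import(name):
    """The patched import entry: control reaches the jacket and traps."""
    raise JacketTrap(name)

def handle_trap(state, trap):
    """Pop the parameters, call the native routine, and put the result in EAX."""
    native, argc = JACKETS[trap.name]
    # With right-to-left pushes, the first parameter is on top of the stack.
    args = [state.stack.pop() for _ in range(argc)]
    state.eax = native(*args) & 0xFFFFFFFF

def emulate_call(state, name, *args):
    """What the emulated x86 program does: push parameters, then call the import."""
    for arg in reversed(args):
        state.stack.append(arg)
    try:
        call_import(name)
    except JacketTrap as trap:
        handle_trap(state, trap)
    return state.eax

state = X86State()
print(emulate_call(state, "Sub", 10, 3))
```

The point of the table is that the handler needs to know how many parameters each function takes before it can move them; how FX!32 actually records that information is not stated here.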
The loader also examines the database to see if translated portions of the program or DLLs exist. If so, it loads these into memory as well and sets up a table that consists of address pairs. The first entry is the x86 program's address in memory, and the second is the corresponding memory address for the Alpha code. If no translated code exists, the second entry is empty. As the Emulator component runs, it continuously monitors this table. If it finds a pair of addresses, it uses the second address to jump into native code.
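One way to picture the address-pair table is as a mapping from an x86 address to an optional native entry point. The sketch below assumes that shape; the class and function names are illustrative, and a Python callable stands in for translated Alpha code.

```python
class TranslationTable:
    """Maps an x86 code address to its translated entry point, if one exists."""
    def __init__(self):
        self._pairs = {}   # x86 address -> native callable, or None when untranslated

    def add(self, x86_addr, native_entry=None):
        self._pairs[x86_addr] = native_entry

    def lookup(self, x86_addr):
        return self._pairs.get(x86_addr)

def run_block(table, x86_addr, interpret):
    """Jump to native code when the pair is complete; otherwise interpret."""
    native = table.lookup(x86_addr)
    if native is not None:
        return "native", native()
    return "interpreted", interpret(x86_addr)

table = TranslationTable()
table.add(0x401000, lambda: "translated result")   # pair with a native entry
table.add(0x402000)                                 # second entry empty

def interpret(addr):
    return f"interpreted code at {addr:#x}"

print(run_block(table, 0x401000, interpret))
print(run_block(table, 0x402000, interpret))
```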
Emulation Strategies
The Emulator component is basically an x86 instruction interpreter with support for code jacketing and translated-code jumps. It has a pipelined dispatch loop that fetches an x86 instruction, decodes it, and, via a lookup table, routes the thread of execution to a native-code block that carries out the requested operation. The pipelined design enables the loop to start the table lookup for the next x86 instruction (recall that x86 instructions are variable-length) as it dispatches the current instruction.
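A table-driven dispatch loop of this kind might look like the sketch below. It handles only three real x86 encodings (NOP, INC EAX, and ADD EAX, imm32), ignores the condition flags, and treats any other byte as a stop. The "pipelining" is only suggested by looking up the next instruction's table entry before running the current handler; the table layout and names are assumptions, not FX!32's design.

```python
import struct

def op_nop(state, code, pc):
    pass

def op_inc_eax(state, code, pc):
    state["eax"] = (state["eax"] + 1) & 0xFFFFFFFF

def op_add_eax_imm32(state, code, pc):
    # The 32-bit little-endian immediate follows the one-byte op code.
    (imm,) = struct.unpack_from("<I", code, pc + 1)
    state["eax"] = (state["eax"] + imm) & 0xFFFFFFFF

# Op code -> (handler, instruction length in bytes). Lengths vary by instruction.
TABLE = {
    0x90: (op_nop, 1),
    0x40: (op_inc_eax, 1),
    0x05: (op_add_eax_imm32, 5),
}

def run(code: bytes) -> int:
    """Interpret code until an op code outside TABLE is reached; return EAX."""
    state = {"eax": 0}
    pc = 0
    entry = TABLE.get(code[pc])
    while entry is not None:
        handler, length = entry
        next_pc = pc + length
        # Start the lookup for the next instruction before dispatching this one.
        next_entry = TABLE.get(code[next_pc]) if next_pc < len(code) else None
        handler(state, code, pc)
        pc, entry = next_pc, next_entry
    return state["eax"]

# NOP; INC EAX; ADD EAX, 16; INC EAX; HLT (0xF4 is not in TABLE, so it ends the run)
program = bytes([0x90, 0x40, 0x05, 0x10, 0x00, 0x00, 0x00, 0x40, 0xF4])
print(run(program))   # prints 18
```

Because the length of each instruction is known from the table, the loop can find the next op code without executing the current one, which is what makes the early lookup possible for straight-line code.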
Native-code blocks take two forms, as shown in the figure "Types of Instruction Emulation."
The first type handles an x86 instruction, such as an add to a memory location, entirely within one block: it performs the necessary memory accesses and operations and adjusts the state of the x86 condition registers. This type of block executes quickly because it's all in-line code. It's also large, because the code must handle the operation in every detail.
The second type of code block consists of function calls. The first call parses the instruction. This function calls another function that performs the memory accesses. Next it calls an add function, which performs the addition operation, and another function call stores the result. A final call updates the x86 condition registers. This type of code block is smaller than the first type, but it executes more slowly due to function-call overhead.
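The trade-off between the two block types can be illustrated with an add-to-memory operation written both ways. This is a sketch under simplifying assumptions: memory is a list of 32-bit words, the operands are already decoded, and only the carry, zero, and sign flags are modeled. All names are hypothetical.

```python
MASK = 0xFFFFFFFF

def add_mem_inline(mem, flags, addr, value):
    """First form: the whole operation in one straight-line block."""
    raw = mem[addr] + value
    result = raw & MASK
    flags["CF"] = raw > MASK
    flags["ZF"] = result == 0
    flags["SF"] = bool(result & 0x80000000)
    mem[addr] = result

# Second form: small helpers that many instruction handlers could share.
def load(mem, addr):
    return mem[addr]

def add32(a, b):
    return a + b

def store(mem, addr, raw):
    mem[addr] = raw & MASK

def update_flags(flags, raw):
    result = raw & MASK
    flags["CF"] = raw > MASK
    flags["ZF"] = result == 0
    flags["SF"] = bool(result & 0x80000000)

def add_mem_calls(mem, flags, addr, value):
    """Same operation as a chain of calls: less code per instruction, more call overhead."""
    raw = add32(load(mem, addr), value)
    store(mem, addr, raw)
    update_flags(flags, raw)

mem_a, flags_a = [0xFFFFFFFF], {}
mem_b, flags_b = [0xFFFFFFFF], {}
add_mem_inline(mem_a, flags_a, 0, 1)
add_mem_calls(mem_b, flags_b, 0, 1)
print(mem_a == mem_b, flags_a == flags_b)
```

Both versions leave memory and flags in the same state; they differ only in how much code each instruction needs and how many calls it makes.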
Digital's engineers are fine-tuning the emulator design so that specific x86 instructions invoke one or the other code-block type. They are also seeking a mix that minimizes cache misses in the Alpha 21064A's 16-KB code cache.
Code Conversion
The Emulator component performs another important task. As it runs, it stores execution profiles that document the flow of the program. These profiles get stored in a database file on the system's hard drive. When the system's activity level falls below a certain point (typically after you stop using an application), the FX Server starts the Optimizer component. This acts like a compiler with a front end that parses x86 instructions instead of source code. It builds an intermediate representation of the program, working on those sections for which it has execution profiles. It also performs some code optimizations, instruction scheduling, register allocation, and dead code removal.
Finally, the Optimizer creates an accurate representation of the program section as Alpha code and saves this image on disk. The next time you start the program, the FX Runtime picks up the translated code. Digital's goal is for such translated programs to run at 70 percent of the speed of native programs.
The Emulator produces profiles only for those portions of the program that actually run, and the Optimizer translates only those parts of the program for which it has a profile. On subsequent application launches, portions of the program that aren't translated wind up in the Emulator, which begins generating profiles on them. Over time, most of the x86 program gets translated, typically in about two or three uses. Because FX!32 adds code to the application files, you can expect an application's disk footprint to at least double in size.
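The profile-then-translate cycle can be simulated in miniature. In this sketch a "block" is just a label, a profile is a hit count, and the launches are invented sample data; none of it reflects FX!32's actual profile format.

```python
def launch(executed_blocks, translated, profiles):
    """One run: translated blocks execute natively; the rest are interpreted and profiled."""
    interpreted = 0
    for block in executed_blocks:
        if block in translated:
            continue
        interpreted += 1
        profiles[block] = profiles.get(block, 0) + 1
    return interpreted

def optimize(translated, profiles):
    """Background pass: translate every block that has a profile, then clear the profiles."""
    translated.update(profiles)   # updating a set from a dict adds the dict's keys
    profiles.clear()

translated, profiles = set(), {}
# Hypothetical sample: which parts of a program each of three sessions exercises.
runs = [
    ["init", "open", "edit"],
    ["init", "open", "print"],
    ["init", "edit", "print"],
]
history = []
for blocks in runs:
    history.append(launch(blocks, translated, profiles))
    optimize(translated, profiles)   # runs once the system goes idle

print(history)   # prints [3, 1, 0]
```

The count of interpreted blocks falls with each launch because only code that has never run before still reaches the interpreter.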
The FX!32 Server maintains a database of the translated code sections for each program. Ordinarily, the Server discards old translated code when a new translation is started and the hard disk quota is about to be exceeded. The FX!32 user interface lets a user manage this process. For example, the user can raise or lower the disk-quota size and mark x86 programs whose translated code should not be purged.
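A quota policy along these lines could be sketched as an oldest-first purge that skips protected programs. The class below is a guess at the behavior described, not the Server's implementation; the names, the size units, and the oldest-first ordering are all assumptions.

```python
from collections import OrderedDict

class TranslationStore:
    """Tracks translated images per program against a disk quota."""
    def __init__(self, quota):
        self.quota = quota
        self.images = OrderedDict()   # program -> image size, oldest first
        self.pinned = set()           # programs whose translations must not be purged

    def used(self):
        return sum(self.images.values())

    def pin(self, program):
        self.pinned.add(program)

    def add(self, program, size):
        """Store a new translation, purging the oldest unpinned ones if needed."""
        self.images.pop(program, None)          # a new translation replaces the old one
        for old in list(self.images):
            if self.used() + size <= self.quota:
                break
            if old not in self.pinned:
                del self.images[old]
        # Simplification: the image is stored even if pinned images keep usage over quota.
        self.images[program] = size

store = TranslationStore(quota=100)
store.add("wordproc.exe", 40)
store.add("sheet.exe", 40)
store.pin("wordproc.exe")
store.add("draw.exe", 40)      # exceeds the quota, so the oldest unpinned image goes
print(list(store.images))      # prints ['wordproc.exe', 'draw.exe']
```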
The initial implementation of FX!32 runs entirely in user mode on Windows NT 3.51. Some FX!32 code, such as the loader, duplicates functions that are part of NT but are not available to user programs. This isn't desirable, because FX!32 must track changes in the parts of NT that it duplicates. Microsoft is making changes to the Win32 API that will support emulators like FX!32. As part of Digital's alliance with Microsoft, the two companies are working to ensure that FX!32 works with this new interface.
It's important to note that FX!32 is designed to work with 32-bit Windows programs. Sixteen-bit DOS and Windows programs are handled by the x86 emulator technology in NT provided by Insignia Solutions.
The implications of FX!32 for the computer industry are interesting. Unless a new platform offers a large performance difference, it's not worth the effort and software costs to switch from an existing platform. Since FX!32 is bundled with every Alpha system, you now have a compelling reason to switch: a huge performance difference (even compared to the Pentium Pro) and the ability to host that huge investment in x86 software your business has made. Better still, through FX!32, 32-bit applications can run at near-native speeds.

The various FX!32 components. The Emulator/Runtime component handles the launch and operation of an x86 application. The FX Server starts the Optimizer component as a background process when system activity is low, and it maintains a database of execution profiles and translated-code images for each x86 application.

The Emulator implements x86 instructions either as in-line native-code blocks or as a series of function calls. The in-line code executes quickly, but at the expense of size. The function calls use a smaller memory footprint but execute slower because of the function overhead.
Tom Thompson is a BYTE senior technical editor at large with a B.S.E.E. degree from the University of Memphis. You can contact him on AppleLink as "T.THOMPSON" or on the Internet or BIX at tom_thompson@bix.com.