
An Alpha in PC Clothing


February 1996 / Core Technologies / An Alpha in PC Clothing

Digital Equipment's new x86 emulator technology makes an Alpha system a fast x86 clone

Tom Thompson

In the day-to-day skirmishes between the RISC and CISC camps over performance issues, few dispute that Digital Equipment's Alpha RISC processor holds the crown for raw speed. However, speed alone doesn't determine the practicality of a desktop system these days. Instead, software that provides solutions is a major part of the decision process. A port of Windows NT 3.51 -- and over 1200 mainstream Windows applications -- to the Alpha helps sweeten its appeal by allowing it to run familiar programs at RISC speeds.

But while the prospect of running CAD and imaging programs at breakneck speeds is tempting, losing the rest of the software that handles your day-to-day activities -- that World Wide Web browser, the word processor, a terminal emulator, and the E-mail program -- in the bargain is still too high a price to pay. Simply put, although there's a lot of useful x86-based software out there, cost, development efforts, and other issues mean that these programs aren't going to be ported to the Alpha soon, if at all.

To improve the Alpha's usability as a desktop alternative to Intel processors, Digital decided to provide x86 code support. The company determined that an on-chip solution was too costly in terms of die space and implementation difficulty, as witnessed by AMD with its K5 processor.

With its extensive experience in porting and translating Mips, SPARC, and VAX code to the Alpha, Digital instead opted to write an x86 software emulator. Another reason to employ a software emulator rather than porting code is that the technology can be quickly modified to support changes in Windows NT.

However, Digital added an interesting twist to its emulator technology, called FX!32. Since emulation is always slower than native code, FX!32 quietly performs a binary translation of portions of an x86 program to Alpha code and saves these translations to the hard disk. The end result is that over time, your favorite x86 programs become composed mostly of Alpha code and run much faster. Because of the Alpha's high throughput, these translated programs should run faster than any existing Intel-based system, making an Alpha-based computer an ideal PC clone.

It's in the Launch

To perform its sleight of hand, FX!32 consists of several modules, as shown in the figure "FX!32 Components." The FX Server invokes the Background Optimizer component as necessary. When the system first starts, a Transparency Enabler patches NT's CreateProcess() routine. Since CreateProcess() handles the generation of all child processes in the system, the Enabler thus provides a mechanism by which FX!32 detects the launch of an application process. When one occurs, the patch code examines the file's header to check the processor type. Bits in this header indicate whether the application code runs on an Alpha, Intel, Mips, or PowerPC processor.
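
As a rough illustration of the header check, the following Python sketch reads the machine field from an NT executable image. The field offsets and the machine constants follow the published PE/COFF file format. The function name machine_type, the helper fake_image, and the hand-built sample header are assumptions made for this example and are not FX!32 code.

```python
import struct

# Machine-type constants from the PE/COFF file format.
MACHINE_NAMES = {
    0x014C: "x86",
    0x0166: "Mips",
    0x0184: "Alpha",
    0x01F0: "PowerPC",
}

def machine_type(image: bytes) -> str:
    """Return the processor name recorded in a PE image's COFF header."""
    if image[:2] != b"MZ":
        raise ValueError("not an MS-DOS/PE executable")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 16-bit machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04X})")

def fake_image(machine: int) -> bytes:
    """Build a header-only image for demonstration (hypothetical helper)."""
    dos_header = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40)
    return dos_header + b"PE\0\0" + struct.pack("<H", machine)

print(machine_type(fake_image(0x014C)))  # an x86 application
print(machine_type(fake_image(0x0184)))  # a native Alpha application
```

A launch hook along the lines described above would only need this one field to decide whether an image requires x86 support.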

Normally, Windows NT gives you a warning message if there's a mismatch between an application's processor type and that of the host. If the type bit indicates the file is an x86 application, a Runtime component (described below) handles the creation of the application process. First, it consults a database file to see if a translated version of the x86 program exists. If so, it starts this translated code.

The first time you launch the x86 program, the patch code hands off the job to the FX!32 Emulator/Runtime component. This component is so named because it not only interprets x86 instructions but also intercepts x86-based NT API calls and routes them to corresponding Alpha-based NT calls.
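
The launch decision described in this section can be summarized in a few lines of Python. This is only a sketch of the control flow as the article describes it. The function route_launch, its return strings, and the use of a plain set as the translation database are all hypothetical.

```python
def route_launch(machine, program, translated_db, host="Alpha"):
    """Decide how a newly launched image should be started.

    machine       -- processor name taken from the image header
    program       -- name of the application being launched
    translated_db -- set of programs that already have translated code
    """
    if machine == host:
        return "run natively"
    if machine != "x86":
        # Neither native nor x86: NT's usual mismatch warning applies.
        return "warn: processor mismatch"
    if program in translated_db:
        return "start translated code"
    # First launch of an x86 program: nothing has been translated yet.
    return "start in emulator"

database = {"wordproc.exe"}  # invented program name
print(route_launch("x86", "wordproc.exe", database))
print(route_launch("x86", "browser.exe", database))
```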

Code Wrapping

The Runtime portion of the component implements its own NT loader. As it loads the x86 code into memory, the Runtime portion inserts "jackets" that provide an interface to the system's Alpha-based NT calls. It does this by first examining the application file's import section, which lists all the DLLs it requires to operate and all references to API functions in these libraries.

The Runtime modifies these import-table entries to reference jacket code. This code starts with an illegal x86 op code, which invokes the Runtime's exception handler. When the x86 program accesses an NT service, it first pushes the function's parameters onto the stack and calls the function. This triggers the exception handler, which pops the parameters off the emulated x86 stack and then places them in the appropriate registers on the Alpha. Finally, the jacket code calls the native version of the NT function. Function results undergo a similar transformation so that they wind up in the appropriate x86 registers, where the x86 program expects them.
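
The parameter shuffling that a jacket performs can be sketched in Python. The class X86State, the function make_jacket, and the stand-in routine native_sub are hypothetical names for this example. The sketch also assumes the usual Win32 convention in which arguments are pushed right to left and the result comes back in EAX.

```python
class X86State:
    """Just enough emulated x86 state for the example (hypothetical)."""
    def __init__(self):
        self.stack = []  # emulated x86 stack; top of stack is the end of the list
        self.eax = 0     # register in which x86 callers expect the return value

def make_jacket(native_function, arg_count):
    """Wrap a native routine so that emulated x86 code can call it."""
    def jacket(state):
        # Arguments were pushed right to left, so the first argument
        # is on top of the stack and is popped first.
        args = [state.stack.pop() for _ in range(arg_count)]
        # On the Alpha these values would go into argument registers;
        # here they are simply passed as Python arguments.
        result = native_function(*args)
        # Place the result where the x86 program expects to find it.
        state.eax = result & 0xFFFFFFFF
    return jacket

def native_sub(a, b):
    """Stand-in for a native NT API routine (hypothetical)."""
    return a - b

state = X86State()
state.stack.extend([5, 7])  # pushed right to left: b = 5 first, then a = 7
make_jacket(native_sub, 2)(state)
print(state.eax)            # 7 - 5
```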

The loader also examines the database to see if translated portions of the program or DLLs exist. If so, it loads these into memory as well and sets up a table that consists of address pairs. The first entry is the x86 program's address in memory, and the second is the corresponding memory address for the Alpha code. If no translated code exists, the second entry is empty. As the Emulator component runs, it continuously monitors this table. If it finds a pair of addresses, it uses the second address to jump into native code.
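
A minimal Python sketch of such an address-pair table follows. The addresses are invented, and the function next_step and its return values are assumptions for illustration only.

```python
# Each entry pairs an x86 code address with the address of its translated
# Alpha code, or None when no translation exists yet. Addresses are invented.
translation_table = {
    0x00401000: 0x10008000,
    0x00401200: None,
    0x00401480: 0x10008A40,
}

def next_step(x86_address):
    """Decide how execution continues at a given x86 address."""
    native_address = translation_table.get(x86_address)
    if native_address is not None:
        # A complete pair: leave the interpreter and run native code.
        return ("jump_native", native_address)
    # Empty second entry, or no entry at all: keep interpreting.
    return ("interpret", x86_address)

print(next_step(0x00401000))
print(next_step(0x00401200))
```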

Emulation Strategies

The Emulator component is basically an x86 instruction interpreter with support for code jacketing and translated-code jumps. It has a pipelined dispatch loop that fetches an x86 instruction, decodes it, and, via a lookup table, routes the thread of execution to a native-code block that carries out the requested operation. The pipelined design enables the loop to start the table lookup for the next x86 instruction (recall that x86 instructions are variable-length) as it dispatches the current instruction.
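
To make the dispatch loop concrete, here is a toy interpreter in Python. It handles only four real x86 opcodes (NOP, INC EAX, ADD EAX with a 32-bit immediate, and HLT), ignores the condition flags, and has no jumps. The table layout, the function names, and the way the loop looks up the next entry before running the current handler are assumptions meant to mirror the description, not Digital's implementation.

```python
import struct

def op_nop(cpu, operands):
    pass

def op_inc_eax(cpu, operands):
    cpu["eax"] = (cpu["eax"] + 1) & 0xFFFFFFFF

def op_add_eax_imm32(cpu, operands):
    (imm,) = struct.unpack("<I", operands)
    cpu["eax"] = (cpu["eax"] + imm) & 0xFFFFFFFF

def op_hlt(cpu, operands):
    cpu["halted"] = True

# Lookup table: opcode -> (total instruction length in bytes, handler).
DISPATCH = {
    0x90: (1, op_nop),
    0x40: (1, op_inc_eax),
    0x05: (5, op_add_eax_imm32),
    0xF4: (1, op_hlt),
}

def run(code: bytes) -> int:
    """Interpret the code and return the final value of EAX."""
    cpu = {"eax": 0, "halted": False}
    pc = 0
    length, handler = DISPATCH[code[pc]]
    while True:
        operands = code[pc + 1:pc + length]
        next_pc = pc + length
        # Begin the lookup for the next instruction before dispatching
        # the current one. This is valid here only because the toy
        # instruction set has no jumps.
        upcoming = DISPATCH.get(code[next_pc]) if next_pc < len(code) else None
        handler(cpu, operands)
        if cpu["halted"] or upcoming is None:
            return cpu["eax"]
        pc = next_pc
        length, handler = upcoming

program = bytes([0x90, 0x40, 0x05, 0x10, 0x00, 0x00, 0x00, 0x40, 0xF4])
print(run(program))
```

Storing the instruction length in the table is what lets the loop find the next opcode without decoding the current instruction a second time.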

Native-code blocks take two forms, as shown in the figure "Types of Instruction Emulation." The first type executes only the x86 instruction, such as an add to a memory location, that performs the necessary memory accesses and operations and adjusts the state of the x86-condition registers. This type of block executes quickly because it's all in-line code. It's also large, because this code must handle the operation in every detail.

The second type of code block consists of function calls. The first call parses the instruction. This function calls another function that performs the memory accesses. Next it calls an add function, which performs the addition operation, and another function call stores the result. A final call updates the x86-condition registers. This type of code block is smaller than the first type, but it executes slower due to function-call overhead.
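
The two forms can be compared side by side in a short Python sketch of an add to a memory location. All names here are hypothetical, only the carry and zero flags are modeled, and instruction parsing is left out. The point is that both forms produce the same result, while the second reuses small shared helpers at the cost of extra calls.

```python
MASK = 0xFFFFFFFF

def add_inline(memory, flags, address, value):
    """First form: the whole operation in one in-line block."""
    total = memory[address] + value
    memory[address] = total & MASK
    flags["carry"] = total > MASK
    flags["zero"] = (total & MASK) == 0

# Second form: the same work split across small, reusable functions.
def load(memory, address):
    return memory[address]

def add32(a, b):
    return a + b

def store(memory, address, total):
    memory[address] = total & MASK

def update_flags(flags, total):
    flags["carry"] = total > MASK
    flags["zero"] = (total & MASK) == 0

def add_by_calls(memory, flags, address, value):
    total = add32(load(memory, address), value)
    store(memory, address, total)
    update_flags(flags, total)

for emulate in (add_inline, add_by_calls):
    memory, flags = {0x1000: 0xFFFFFFFF}, {}
    emulate(memory, flags, 0x1000, 1)
    print(emulate.__name__, memory[0x1000], flags)
```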

Digital's engineers are fine-tuning the emulator design so that specific x86 instructions invoke one or the other code-block type. They are also seeking a mix that minimizes cache misses in the Alpha 21064A's 16-KB code cache.

Code Conversion

The Emulator component performs another important task. As it runs, it stores execution profiles that document the flow of the program. These profiles get stored in a database file on the system's hard drive. When the system's activity level falls below a certain point (typically after you stop using an application), the FX Server starts the Optimizer component. This acts like a compiler with a front end that parses x86 instructions instead of source code. It builds an intermediate representation of the program, working on those sections for which it has execution profiles. It also performs some code optimizations, instruction scheduling, register allocation, and dead code removal.

Finally, the Optimizer creates an accurate representation of the program section as Alpha code and saves this image on disk. The next time you start the program, the FX Runtime picks up the translated code. Digital's goal is for such translated programs to run at 70 percent of the speed of native programs.

The Emulator produces profiles only for those portions of the program that actually run, and the Optimizer translates only those parts of the program for which it has a profile. On subsequent application launches, portions of the program that aren't translated wind up in the Emulator, which begins generating profiles on them. Over time, most of the x86 program gets translated, typically in about two or three uses. Because FX!32 adds code to the application files, you can expect an application's disk footprint to at least double in size.
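
The way coverage grows from one launch to the next can be simulated in a few lines of Python. The block names, the functions launch and optimize, and the use of sets for the profile and translation databases are all invented for this sketch.

```python
def launch(executed_blocks, translated, profiles):
    """Simulate one run and return the blocks that had to be interpreted."""
    interpreted = [b for b in executed_blocks if b not in translated]
    # The Emulator produces profiles only for the code it interprets.
    profiles.update(interpreted)
    return interpreted

def optimize(translated, profiles):
    """Background step: translate every block that has a profile."""
    translated.update(profiles)
    profiles.clear()

translated, profiles = set(), set()
runs = [
    ["init", "edit"],           # first use
    ["init", "edit", "print"],  # second use reaches new code
    ["init", "edit", "print"],  # third use
]
for number, executed in enumerate(runs, start=1):
    interpreted = launch(executed, translated, profiles)
    optimize(translated, profiles)
    print(number, interpreted)
```

In this simulation the list of interpreted blocks shrinks with each use and is empty by the third run, which matches the article's estimate of two or three uses.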

The FX!32 Server maintains a database of the translated code sections for each program. Ordinarily, the Server discards old translated code when a new translation is started and the hard disk quota is about to be exceeded. The FX!32 user interface lets a user manage this process. For example, the user can raise or lower the disk-quota size and mark x86 programs whose translated code should not be purged.
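
A quota policy of this kind might look like the following Python sketch. The class TranslationStore and its methods are hypothetical, and purging the oldest unprotected image first is an assumption, since the article does not say which old code is discarded first.

```python
from collections import OrderedDict

class TranslationStore:
    """Toy model of a quota-limited store of translated images."""
    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self.images = OrderedDict()  # program name -> size, oldest first
        self.pinned = set()          # programs marked "do not purge"

    def used(self):
        return sum(self.images.values())

    def pin(self, program):
        self.pinned.add(program)

    def add(self, program, size):
        # A new translation replaces any older image of the same program.
        self.images.pop(program, None)
        # Purge the oldest unpinned images until the new one fits.
        for old in list(self.images):
            if self.used() + size <= self.quota_bytes:
                break
            if old not in self.pinned:
                del self.images[old]
        # If everything left is pinned, the quota can still be exceeded;
        # a real policy would have to handle that case.
        self.images[program] = size

store = TranslationStore(quota_bytes=100)
store.add("wordproc", 40)
store.add("browser", 40)
store.pin("wordproc")
store.add("mail", 40)  # forces a purge; the pinned program survives
print(list(store.images), store.used())
```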

The initial implementation of FX!32 runs entirely in user mode on Windows NT 3.51. Some FX!32 code, such as the loader, duplicates functions that are part of NT but are not available to user programs. This isn't desirable, because FX!32 must track changes in the parts of NT that it duplicates. Microsoft is making changes to the Win32 API, which will support emulators like FX!32. As part of Digital's alliance with Microsoft, the two companies are working to ensure that FX!32 works with this new interface.

It's important to note that FX!32 is designed to work with 32-bit Windows programs. Sixteen-bit DOS and Windows programs are handled by the x86 emulator technology in NT provided by Insignia Solutions.

The implications of FX!32 for the computer industry are interesting. Unless a new platform offers a large performance difference, it's not worth the effort and software costs to switch from an existing platform. Since FX!32 is bundled with every Alpha system, you now have a compelling reason to switch: a huge performance difference (even compared to the Pentium Pro) and the ability to host that huge investment in x86 software your business has made. Better still, through FX!32, 32-bit applications can run at near-native speeds.


FX!32 Components


The various FX!32 components. The Emulator/Runtime component handles the launch and operation of an x86 application. The FX Server starts the Optimizer component as a background process when system activity is low, and it maintains a database of execution profiles and translated-code images for each x86 application.


Types of Instruction Emulation


The Emulator implements x86 instructions either as in-line native-code blocks or as a series of function calls. The in-line code executes quickly, but at the expense of size. The function calls use a smaller memory footprint but execute slower because of the function overhead.


Tom Thompson is a BYTE senior technical editor at large with a B.S.E.E. degree from the University of Memphis. You can contact him on AppleLink as "T.THOMPSON" or on the Internet or BIX at tom_thompson@bix.com.
