
Power Mac Code Optimizations


November 1994 / Core Technologies / Power Mac Code Optimizations

Understanding this RISC system's run-time architecture will let you write faster native applications

Tom Thompson

Apple's move to RISC computing is embodied by its new line of PowerPC-based Macintosh computers, the Power Macintoshes. These computers have been fine-tuned to obtain the best performance out of the PowerPC RISC processor, while also being able to run the existing Mac software base, written as CISC code for the 680x0 processor (from now on, I'll use the term 68K to denote this processor). The Power Mac has a robust 68LC040 emulator that not only executes such 68K applications but also handles with aplomb most Control Panels and Extension software that heavily patch the operating system.

This ability to use the existing base of 68K software was crucial to the Power Mac's survival in two ways. First, not many folks would buy a faster computer if it meant scrapping the huge investments made in 68K applications software. More important, the Power Mac OS requires the emulator to execute portions of its own code. Approximately 85 percent of the ROM code responsible for implementing the Mac API--collectively known as the Mac Toolbox--is still 68K code, drawn from the ROMs of 68040-based Quadra systems. To improve system performance, portions of the operating system (e.g., QuickDraw, the Font Manager, the Memory Manager, the numeric libraries, and parts of the Resource Manager) were written in "native" (i.e., PowerPC) code. Making these key sections of the Mac OS native boosted system performance where it was needed. A special Mixed Mode Manager transparently mediates the processor context switch between the emulated 68K processor environment and the PowerPC processor environment. The result is a RISC computer that not only hosts 68K applications and Extensions but is stable, is reasonably fast, and was shipped in early 1994.

The emulator musters decent performance in most situations, on average about that of a 25-MHz 68040. By "going native," or recompiling the application so that it exists as PowerPC code, you can obtain better performance because the application spends less time in the emulator. However, the emulator can't be kept out of the picture completely, because of the 68K Toolbox code. As the thread of execution hops from emulator to PowerPC and back, the overhead of the Mixed Mode Manager's context switches degrades performance. Although this overhead can't be eliminated, it can be reduced if you understand what's occurring at any given moment in the Mac OS. What I'll cover here are the performance pitfalls you might encounter because of the Mac OS's unique nature.

Improving Math

With regard to floating-point performance, the PowerPC processor excels because it has a built-in FPU that supports both single- and double-precision IEEE 754 floating-point computations. If your Mac application makes heavy use of floating-point math, you'll want to take advantage of this PowerPC feature. Doing so involves a few changes to your application code, which I'll explain in a moment. The reason for the changes lies in the original Toolbox support for this type of calculation. Floating-point calculations and trigonometric functions were supported via an API known as SANE (Standard Apple Numeric Environment). SANE either communicates with a math coprocessor if the Mac has one or calls the appropriate math library if it doesn't. Programmers simply declared floating-point variables to be of type extended and made calls to SANE without worrying about the hardware.

Apple implemented SANE as native code on Power Macs to support those 68K applications that use it. However, the SANE interface uses the same 68K trap mechanism as the other Toolbox calls, so the Mixed Mode Manager exacts a performance hit. On a Power Mac 8100/80, SANE cranks out the performance of a 25-MHz 68040. This is adequate performance for emulated 68K programs, but native applications obtain far better performance by directly using the PowerPC processor's floating-point instructions. You do this by calling the floating-point functions found in the header file fp.h. These functions follow the FPCE (Floating-Point C Extension) specification, which defines support for IEEE 754/854 floating-point calculations. Because FPCE is a developing standard, using these function calls should ensure cross-platform compatibility at a future date. With these functions, your program uses the PowerPC math instructions in-line, rather than taking a trip to the Mixed Mode Manager, perhaps into the emulator, and back. It's important to note that PowerPC's single-precision floating-point values are 32 bits in size (for type float), and double-precision values are 64 bits (for type double). These are smaller than the 80- and 96-bit values used by SANE's extended variable type. If necessary, you might have to modify your program code to compensate for this loss in precision.
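The size of that precision loss can be estimated in a few lines. The following is a minimal sketch in standard C, not code from Apple's fp.h: the helper names mean_of and bits_lost_as_double are hypothetical, and the 64-bit significand assumed for the extended type is that of the 80-bit IEEE extended format.

```c
#include <float.h>
#include <stddef.h>

/* Assumption: SANE's extended type carries a 64-bit significand,
   as the 80-bit IEEE extended format does. */
#define SANE_EXTENDED_MANT_BITS 64

/* Hypothetical helper: averages samples with plain double arithmetic,
   which a native compiler can turn into in-line FPU instructions. */
static double mean_of(const double *samples, size_t count)
{
    double sum = 0.0;
    size_t i;

    if (count == 0)
        return 0.0;                 /* avoid dividing by zero */
    for (i = 0; i < count; i++)
        sum += samples[i];
    return sum / (double)count;
}

/* Significand bits given up when an extended value is stored as a double. */
static int bits_lost_as_double(void)
{
    return SANE_EXTENDED_MANT_BITS - DBL_MANT_DIG;
}
```

On a system with IEEE 754 doubles, DBL_MANT_DIG is 53, so storing an extended value in a double drops 11 significand bits. Code that leans on those extra bits, such as long running sums, is where a port is most likely to need adjusting.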

Data Alignment

The PowerPC processor handles memory accesses differently than the 68K processor. The 68K processor readily accesses data of any size (byte, word, and longword) at even memory addresses. However, accessing anything larger than a byte at an odd address on a 68K processor produces the much-beloved address-error bomb box. The 68K compilers typically add padding bytes at certain points in a program's data structures to ensure that the data remains aligned on even addresses (i.e., word-aligned).

The PowerPC processor favors memory accesses that correspond to the data's size: It accesses bytes at every address, words (16 bits) at every even address, and longwords (32 bits) at every address divisible by four. The PowerPC handles memory accesses at any address (thus, no bus errors), but doing so requires extra bus cycles. These extra bus cycles result in a performance hit each time unaligned data accesses are made. Similar to 68K compilers, PowerPC compilers pad a program's data structures to achieve an alignment that minimizes the bus cycles required to access the data.
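The two alignment rules can be written as a pair of predicates. This is only an illustrative sketch; the function names are hypothetical and do not come from any Apple header.

```c
#include <stddef.h>
#include <stdint.h>

/* PowerPC preference: an access of `size` bytes is cheapest when the
   address is divisible by `size` (size is expected to be 1, 2, 4, or 8). */
static int is_naturally_aligned(uintptr_t address, size_t size)
{
    return (address % size) == 0;
}

/* 68K rule as described above: anything wider than a byte must sit
   at an even address. */
static int is_68k_word_aligned(uintptr_t address, size_t size)
{
    return size == 1 || (address % 2) == 0;
}
```

For a longword at address 0x1002, is_68k_word_aligned() reports success while is_naturally_aligned() does not: the 68K rule is satisfied, but a PowerPC would spend extra bus cycles on the access.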

However, a data structure that's optimally aligned for the PowerPC might not match the word-aligned layout that 68K code expects. This creates a serious problem when you're working with data that gets handed to an emulated Toolbox call. If you're creating data structures that you expect a 68K-based Mac to use, whether the data is read from a file or passed through a network, it had better be word-aligned or you'll crash these systems. Unless you're writing a special high-performance application that runs only on Power Macs, the sheer number of 68K Macs in use requires that you take care of your 68K Mac users.

Now that you're aware of the dangers and promise of data alignment, what do you do about it? First, most compilers provide a #pragma statement so that you can indicate what type of data alignment the compiler should use. If you snoop around in the Mac Toolbox header files, you'll see that those data structures used by the Toolbox are bracketed with the statements #pragma options align=mac68k and #pragma options align=reset. The first directive enforces 68K word alignment for the data structure, and the second lets the compiler resume the use of the preferred PowerPC data alignment for your program. The Mac header files quietly perform this sleight of hand for any of the Toolbox calls you use.

The general rule that emerges is this: If you're declaring data destined for the Toolbox or for a program on a 68K Mac, then word-align the data by using the #pragma directives. If the data is used exclusively by your application, then let the compiler arrange the optimal data alignment for the PowerPC processor.
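To see what the rule does to a structure's layout, the sketch below declares the same fields twice. It uses #pragma pack(push, 2) and #pragma pack(pop) as portable stand-ins for the align=mac68k and align=reset directives so that it compiles outside Apple's tools; the structure names are hypothetical, and the offsets in the comments are what typical compilers produce for these fixed-width types.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for "#pragma options align=mac68k": 2-byte (word) alignment. */
#pragma pack(push, 2)
typedef struct SharedRecord {
    uint8_t  kind;      /* offset 0, then 1 pad byte                  */
    uint32_t length;    /* offset 2: even, but not divisible by four  */
    uint16_t flags;     /* offset 6                                   */
} SharedRecord;         /* 8 bytes in all                             */
#pragma pack(pop)       /* stand-in for "#pragma options align=reset" */

/* The same fields, left to the compiler's preferred alignment. */
typedef struct PrivateRecord {
    uint8_t  kind;      /* offset 0, then 3 pad bytes                 */
    uint32_t length;    /* offset 4: divisible by four                */
    uint16_t flags;     /* offset 8, then 2 pad bytes                 */
} PrivateRecord;        /* 12 bytes in all                            */
```

The two declarations hold identical data, yet the length field lands at a different offset in each. That is why a structure written to disk or passed to 68K code needs the word-aligned form, while purely internal data can keep the faster natural layout.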

Reduce Mode Switches

As mentioned earlier, the Mixed Mode Manager helps 68K code and PowerPC code to coexist and execute on a Power Mac. The penalty for this versatility is a performance hit, as the Mixed Mode Manager massages the stack to set up and tear down the context-switch frames between two disparate processor environments. Again, these switches are unavoidable because most of the Toolbox code is emulated, and applications make heavy use of these calls. However, it is possible to reduce the frequency of Toolbox calls at certain points in the application to minimize the effect of the Mixed Mode Manager.

For example, suppose the file I/O function in your program is reading and decompressing data. You want your program to be a good, cooperative, multitasking neighbor, so in this function, you have a Toolbox call that reads a buffer of data, a call to a function that processes the data, and then a call to the WaitNextEvent() Toolbox routine so that processor time is given to other applications (see the listing "Polling the Event Manager"). When the program is compiled as a native program, an unexpected result occurs. As native code, the decompression function runs much faster, so WaitNextEvent() is called more frequently. Unfortunately, the WaitNextEvent() routine (as well as any other Event Manager routine) is emulated Toolbox code. Therefore, as the I/O loop runs faster, the Mixed Mode Manager intervenes more often. The result is that the performance gain achieved by the native code can be whittled away by the overhead of two processor context switches each time WaitNextEvent() gets called and returns.

The solution is to not call WaitNextEvent() or other Toolbox routines indiscriminately within a processing loop. Instead, call the accessor function LMGetTicks(), which returns elapsed system time in ticks, where one tick represents an interval of one-sixtieth of a second. Use the ticks value and a little code to compute a time interval that determines when to call WaitNextEvent() (see the listing "Timed Event Manager Call"). In summary, don't poll the Event Manager routines when doing lengthy processing. Instead, make the Toolbox call only after a specific amount of time has passed. This way, the processing loop remains in native code and performs more work during the fixed time interval, yet WaitNextEvent() gets called just often enough so that other programs can run smoothly. This arrangement keeps the Mixed Mode Manager overhead at a fixed level, even when the code runs on Power Macs with faster (100 MHz or more) and more-powerful processors (such as the PowerPC 604).

This is only a partial list of what you can do to avoid unexpected performance hits when you retool your application as native code. Keep in mind that Apple will implement more of the Toolbox as native code, which will improve system performance and make your job easier.


Listing: Polling the Event Manager--A Sample Listing



while (FSRead(input, &amount, &buffer) == noErr)
        {
        Decompress_Data(&buffer, &pointerToMem, expandedBytes);
        pointerToMem += expandedBytes;  /* Advance past the expanded data */
        WaitNextEvent(everyEvent, &gmyEvent, SHORT_NAP, NO_CURSOR);
        } /* end while */




Listing: Timed Event Manager Call--A Sample Listing



/* Modify for desired time interval, in ticks (1/60 second each) */
#define NEXT_EVENT_CHECK_INTERVAL       10


startTime = LMGetTicks();       /* Get starting ticks value */
while (FSRead(input, &amount, &buffer) == noErr)
        {
        Decompress_Data(&buffer, &pointerToMem, expandedBytes);
        pointerToMem += expandedBytes;  /* Advance past the expanded data */
        nextTime = LMGetTicks();        /* Get ticks again */
        if ((nextTime - startTime) > NEXT_EVENT_CHECK_INTERVAL)
                {
                WaitNextEvent(everyEvent, &gmyEvent, SHORT_NAP, NO_CURSOR);
                startTime = nextTime;   /* Set up start of new interval */
                } /* end if */
        } /* end while */
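
The effect of the timed check can be tried out away from the Mac with a small simulation. This is a sketch under stated assumptions rather than Toolbox code: process_chunks is a hypothetical function, the tick counter is simulated instead of being read with LMGetTicks(), and a counter stands in for each WaitNextEvent() call.

```c
/* Ticks that must elapse before the loop yields again. */
#define NEXT_EVENT_CHECK_INTERVAL 10UL

/* Simulates the timed loop: each chunk of work advances a fake tick
   counter by ticksPerChunk. Returns how many times the loop would
   have called WaitNextEvent(). */
static unsigned long process_chunks(unsigned long chunks,
                                    unsigned long ticksPerChunk)
{
    unsigned long now = 0;            /* simulated tick counter      */
    unsigned long startTime = now;    /* start of current interval   */
    unsigned long yields = 0;
    unsigned long i;

    for (i = 0; i < chunks; i++) {
        now += ticksPerChunk;         /* stands in for decompression */
        if ((now - startTime) > NEXT_EVENT_CHECK_INTERVAL) {
            yields++;                 /* stands in for WaitNextEvent() */
            startTime = now;          /* set up start of new interval */
        }
    }
    return yields;
}
```

With 100 chunks that each take one tick, the loop yields 9 times instead of 100. With chunks that each take 20 ticks it yields on every pass, so slow code is no worse off than under straight polling, while fast code makes far fewer trips through the Mixed Mode Manager.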


Tom Thompson is a BYTE senior technical editor at large. He is an Associate Apple Developer and author of Power Macintosh Programming Starter Kit (Hayden Books, 1994). You can contact him on AppleLink as T.THOMPSON or on the Internet or Bix at tom_thompson@bix.com.
