A VLIW processor like the generic one illustrated above should execute eight operations per cycle on most cycles. With a 200-MHz clock, it would be 50 to 100 percent faster than current superscalar chips. Unfortunately, such performance requires the compiler to know intimate hardware details, such as the latency of each function unit.
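To see why the compiler needs those latencies, consider a minimal scheduling sketch. It is not the algorithm of any particular VLIW compiler. The `Op` record, the unit names, and the latency values in `LATENCY` are all hypothetical, and the model ignores per-unit conflicts, limiting only the total number of operations per word.

```python
from collections import namedtuple

# Hypothetical operation record: its name, the kind of function unit it
# needs, and the names of the operations whose results it consumes.
Op = namedtuple("Op", "name unit deps")

# Assumed latencies in cycles per function-unit type (illustrative only).
LATENCY = {"alu": 1, "mul": 3, "mem": 2}

def schedule(ops, width=8):
    """Greedy list scheduling.

    Places each op in the earliest cycle at which all of its inputs are
    ready and the instruction word for that cycle still has a free slot.
    `ops` is assumed to be listed in dependency order.
    """
    ready_at = {}  # op name -> cycle in which its result becomes available
    words = []     # words[c] -> names of the ops issued in cycle c
    for op in ops:
        cycle = max((ready_at[d] for d in op.deps), default=0)
        while True:
            while len(words) <= cycle:
                words.append([])       # an empty word is a cycle of no-ops
            if len(words[cycle]) < width:
                break
            cycle += 1                 # word is full, try the next cycle
        words[cycle].append(op.name)
        ready_at[op.name] = cycle + LATENCY[op.unit]
    return words

program = [
    Op("a", "mem", []),
    Op("b", "mem", []),
    Op("c", "mul", ["a", "b"]),
    Op("d", "alu", ["c"]),
    Op("e", "alu", []),
]
for cycle, word in enumerate(schedule(program)):
    print(cycle, word)
```

In this sketch the multiply cannot issue until both loads complete, so the words for the intervening cycles stay empty. A compiler working from the wrong latency would issue the multiply too early and read a stale register, which is why the hardware details matter.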
A:
Adding extra function units can increase performance (by reducing resource conflicts), with little effect on overall complexity. However, physical limits restrict such expansion: the limited number of read and write ports on the register file (which all function units must access simultaneously), and interconnections whose number rises geometrically with the number of function units. Also, the compiler must find enough parallelism in the program to warrant any extra units.
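A rough back-of-the-envelope model illustrates these limits. It assumes each function unit issues one three-operand operation per cycle (two register reads, one write) and that every unit's result can be forwarded to every operand input of every unit. Both are simplifying assumptions, not figures from any real design, and in this model the forwarding paths grow with the square of the unit count.

```python
def register_ports(units, reads_per_op=2, writes_per_op=1):
    """Register-file ports needed if every unit issues one op per cycle."""
    return units * reads_per_op, units * writes_per_op

def bypass_paths(units, reads_per_op=2):
    """Forwarding paths in an assumed full bypass network: every unit's
    output can feed every operand input of every unit."""
    return units * units * reads_per_op

for n in (4, 8, 16):
    reads, writes = register_ports(n)
    print(f"{n} units: {reads} read ports, {writes} write ports, "
          f"{bypass_paths(n)} bypass paths")
```

Doubling the number of units doubles the port count but quadruples the forwarding paths in this model, which suggests why wiring, rather than the function units themselves, tends to set the practical limit.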
B:
This hypothetical 256-bit-wide instruction word has eight operation fields, each one a traditional three-operand RISC-like instruction. In practice, extra bits may hold immediate values. Each operation field can directly drive a specific function unit with minimal decoding.
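As a sketch of what "minimal decoding" could look like, the code below splits a 256-bit word into eight 32-bit fields (256 / 8 = 32) and unpacks each one. The bit layout inside a field (8-bit opcode followed by 8-bit destination and two 8-bit source register numbers) is an assumption made for illustration; the text does not specify it.

```python
FIELD_BITS = 32   # 256 bits / 8 operation fields
FIELDS = 8

def split_word(word):
    """Split a 256-bit integer into eight 32-bit operation fields.

    Field 0 is taken from the most significant bits (an assumed ordering).
    """
    if not 0 <= word < 1 << (FIELD_BITS * FIELDS):
        raise ValueError("instruction word must fit in 256 bits")
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (FIELD_BITS * (FIELDS - 1 - i))) & mask
            for i in range(FIELDS)]

def decode_field(field):
    """Unpack one field using the assumed 8/8/8/8-bit layout."""
    return {
        "opcode": (field >> 24) & 0xFF,
        "dest":   (field >> 16) & 0xFF,
        "src1":   (field >> 8) & 0xFF,
        "src2":   field & 0xFF,
    }

def pack_word(fields):
    """Inverse of split_word: pack eight 32-bit fields into one word."""
    word = 0
    for f in fields:
        word = (word << FIELD_BITS) | (f & ((1 << FIELD_BITS) - 1))
    return word
```

Because each field sits at a fixed position and maps to one function unit, decoding in this sketch is only shifting and masking. No field has to be examined to work out where another one begins.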