
A Sampling of New DSP Designs


March 1998 / International Features / European ASIC Designs
Nebojsa Novakovic

Recent conferences have exhibited a range of new digital signal processor architectures, with innovations such as highly parallel processing in the BOPS ManArray, dramatic FP performance improvements in new TI processors, and CPU+DSP combinations, like the StrongARM SA-1500, for next-generation multimedia applications.

ManArray: DSP Supercomputing

The team that developed the M-FAST DSP at IBM formed Hot Chips & Salsas -- now known as BOPS, Inc. -- and recently announced its first product architecture, an array computing topology known as ManArray.

ManArray is based on a manifold array of integer/FP DSP processing elements (PEs), each capable of up to 3.2 billion operations per second (BOPS) at 100 MHz. The 3.2-BOPS figure is a peak rate, calculated from the maximum execution rate of 8-bit single instruction/multiple data (SIMD) integer operations. Four PEs, organized as a 2x2 matrix in the basic ManArray building block, are controlled by a single sequence processor (SP); besides including all PE features, the SP also provides address and control functions. Each PE executes compact 32-bit encapsulated VLIW (EVLIW) instructions, so a single 32-bit instruction sent from the SP can start up to five 32-bit operations in the PEs for parallel execution in every cycle. EVLIW instructions provide operations on packed data types, as well as loads and stores from local PE memory. Every PE and SP can execute up to five instructions (multiply, add, load, store, data select) in a single cycle.
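To make the peak figure concrete, the short Python sketch below divides the quoted 3.2 BOPS by the 100-MHz clock to get the implied number of 8-bit operations per cycle, and then emulates one packed 8-bit SIMD add. The packed add is only an illustration of the general technique: the four-lane, 32-bit word layout and the names `ops_per_cycle` and `packed_add_u8` are assumptions made here, not details of the ManArray instruction set.

```python
# Figures quoted in the article for one processing element (PE).
PEAK_OPS_PER_SECOND = 3.2e9   # 3.2 BOPS, counted on 8-bit SIMD integer ops
CLOCK_HZ = 100e6              # 100 MHz


def ops_per_cycle(peak_ops_per_second, clock_hz):
    """Operations per clock cycle implied by a peak rate and a clock speed."""
    return peak_ops_per_second / clock_hz


# Assumption for illustration only: four 8-bit lanes packed into a 32-bit word.
LANES = 4
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1


def packed_add_u8(a, b):
    """Add two 32-bit words lane by lane; each 8-bit lane wraps independently."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        # Mask the sum so a carry never spills into the neighbouring lane.
        result |= (lane_sum & LANE_MASK) << shift
    return result


if __name__ == "__main__":
    print(ops_per_cycle(PEAK_OPS_PER_SECOND, CLOCK_HZ))
    print(hex(packed_add_u8(0x01FF7F10, 0x01010101)))
```

One packed instruction here counts as four 8-bit operations, which is why vendors quote peak rates on the narrowest data type: the same hardware delivers fewer operations per cycle on 16-bit or 32-bit data.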

The flexibility of ManArray should appeal to a number of DSP as well as general-purpose applications. BOPS has decided to license the ManArray architecture and instruction set to companies involved in 3-D graphics, video, communications, and process control. The basic 2x2 core chip, known as Kittyhawk, is expected in mid-1998. According to BOPS, it should be the first programmable MPEG-2 codec implementation priced comparably to fixed-function devices from vendors such as C-Cube.

TI Goes FP Parallel

Texas Instruments has used its expertise from the 1600-MIPS TMS 320C62x fixed-point DSP family to develop its superset TMS 320C67x generation of floating-point 32-bit DSPs. The new family starts at the 1-GFLOPS performance level, an enviable speed for inexpensive DSP chips, and is expected to grow to 3 GFLOPS in the next two years, according to TI. The first samples are expected in the second half of '98.

StrongARM with a DSP Twist

While it may already have been transferred to Intel by the time you read this, Digital's StrongARM processor family remains the top contender for network computers and high-powered palmtops with functions such as voice and handwriting recognition. The new 300-MHz SA-1500, combining a fast StrongARM CPU and a dedicated Attached Media Processor (AMP) DSP, could perhaps provide the best of both worlds.

The central execution unit on the chip is basically the standard StrongARM CPU, running at a higher speed than before and with large 2 x 16-KB instruction and data caches. The AMP, in contrast, can operate in two modes. It can work as a tightly coupled coprocessor, where either one ARM or one AMP instruction can be launched every cycle, or it can run in a parallel execution mode, where it operates in parallel with the ARM core. In that mode, AMP instructions are fetched from a 4-KB 64-bit writable control store.
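The difference between the two AMP modes can be sketched with an idealized issue model in Python. It assumes no stalls, no dependencies between the ARM and AMP instruction streams, and one instruction per cycle per unit; the function names are hypothetical. The control-store calculation additionally assumes that "64-bit" is the width of one stored word and that 4 KB means 4,096 bytes, neither of which the article states explicitly.

```python
def coupled_cycles(arm_instrs, amp_instrs):
    """Coprocessor mode: each cycle launches one ARM or one AMP instruction."""
    return arm_instrs + amp_instrs


def parallel_cycles(arm_instrs, amp_instrs):
    """Parallel mode: both streams advance every cycle, so the longer one
    determines the total (idealized: no stalls or synchronization cost)."""
    return max(arm_instrs, amp_instrs)


def control_store_words(store_bytes=4 * 1024, word_bits=64):
    """Number of words that fit in the writable control store (assumed layout)."""
    return store_bytes * 8 // word_bits


if __name__ == "__main__":
    arm, amp = 1000, 600  # made-up instruction counts for one loop
    print(coupled_cycles(arm, amp), parallel_cycles(arm, amp))
    print(control_store_words())
```

Under these assumptions the parallel mode pays off most when the two streams are of similar length, and the control store limits an AMP routine to a few hundred words, which suits small inner loops rather than whole programs.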

AMP has 64 36-bit registers with support for both integer and FP data types. It provides the usual set of DSP instructions with single-cycle throughput. Using FP multiply-accumulate instructions, the maximum FP throughput for the SA-1500 is 600 MFLOPS. Looking at SIMD integer operations, the maximum throughput for the SA-1500 with both ARM and AMP operating in parallel is a respectable 3.6 BOPS.
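A quick check of these peak figures against the 300-MHz clock, written as a minimal Python sketch. The per-cycle interpretation is an inference: two floating-point operations per cycle is what one multiply-accumulate per cycle gives if the multiply and the add are counted separately, a common convention that the article does not spell out.

```python
CLOCK_HZ = 300_000_000          # SA-1500 clock, 300 MHz
PEAK_FLOPS = 600_000_000        # 600 MFLOPS, using FP multiply-accumulate
PEAK_SIMD_OPS = 3_600_000_000   # 3.6 BOPS, ARM and AMP operating in parallel


def per_cycle(rate_per_second, clock_hz):
    """Operations per clock cycle implied by a peak rate."""
    return rate_per_second / clock_hz


fp_ops_per_cycle = per_cycle(PEAK_FLOPS, CLOCK_HZ)
simd_ops_per_cycle = per_cycle(PEAK_SIMD_OPS, CLOCK_HZ)

if __name__ == "__main__":
    print(fp_ops_per_cycle)    # multiply + add of one MAC, if counted as two
    print(simd_ops_per_cycle)  # integer operations per cycle at peak
```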

To ensure sustained high performance, the SA-1500 has a 64-bit 100-MHz SDRAM main memory bus with 800-MBps bandwidth, as well as a separate 50-MHz general I/O bus with a 15-channel DMA controller. Combined with an I2O-compliant PCI-PCI bridge, such as the DEC 21554, this chip could form the base of a high-performance combined I/O and multimedia coprocessor, able to accelerate I/O tasks such as XOR-data generation for RAID arrays and multimedia tasks such as real-time DVD playback.
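Two of the points above lend themselves to a short Python sketch: the bus bandwidth arithmetic (8 bytes per transfer at 100 MHz) and the XOR-data generation that a RAID coprocessor would accelerate. The parity code shows the generic technique of XORing equal-sized blocks; it is not SA-1500 code, and the function names are hypothetical.

```python
def peak_bandwidth_mb_per_s(bus_bits, clock_mhz):
    """Peak bandwidth in MB/s, assuming one full-width transfer per clock."""
    return bus_bits // 8 * clock_mhz


def xor_parity(blocks):
    """Return the byte-wise XOR of equal-sized data blocks (RAID-style parity)."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost block: XOR of the survivors and the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    print(peak_bandwidth_mb_per_s(64, 100))
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_parity(data)
    print(rebuild_missing([data[0], data[2]], parity) == data[1])
```

Because XOR is its own inverse, the same routine both generates parity and rebuilds a lost block, which is why the workload is a natural fit for a wide SIMD integer unit fed by a fast memory bus.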

