
You Said What?


October 1996 / International Features / You Said What?

Machine translation tools are far from perfect, but they can save you serious money.

Peter Haapaniemi and Peter Hofland

Think globally, act locally. That's a mantra international organizations chant constantly. But at a price. In large corporations, the total costs of translating things such as marketing materials and documentation can add up to several million dollars a year. IBM, for example, spent more than $100 million last year to translate manuals from English into 25 other languages.

The problem is especially acute for European corporations, which generally provide documents in at least three or four of the European Union's nine official languages. These companies are looking for sophisticated, multilingual translation tools that can help reduce these costs.

Machine translation (MT), the use of computers to automate translation, is one of the computer industry's oldest areas of interest -- and one of its most frustrating. Computer scientists and linguists have been working on MT techniques for decades. The results are disappointing: Perfect translation without human intervention is still a dream that will not be realized for the next 10 years or so.

MT has come a long way from simple word-to-word translation. Products can now deduce subtle contextual differences in languages. But even the most sophisticated systems on the market are far from automatic. They are, however, useful support tools for professional human translators. Some systems are based on huge dictionaries that translators use to more efficiently look up words and phrases. Other systems take a first look at a document and produce a rough draft that is then edited by a human translator. The best of these tools can deliver about 80 percent accuracy, experts reckon.

Real MT breakthroughs are rare and developments are slow because of the complexity and continuous progression of language. To be truly effective, a translation system must take into account the formation and use of words, syntax, and semantics. Furthermore, it must be able to recognize colloquial phrases, acronyms, and contractions -- not to mention incorrect grammar and misspelled words. That's why many vendors of language processing tools are now changing strategy. Rather than trying to keep up with constant additions to their products, they are making them customizable, enabling users to update dictionaries and also extend context sensitivity.

Globalink's new generation of technology, dubbed Barcelona, allows translators -- rather than programmers -- to include their own rules of how words should translate in a certain context. That means the front-line translator can efficiently deal with new word fields and idioms that might occur in a specific project. Globalink says this new technology will soon incorporate a "wizard" that allows nonexperts to implement rules in a comprehensive high-level language. Other Windows programs will be able to access the Barcelona translation service via OLE Automation or an API.
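
The idea of translator-supplied context rules can be sketched in a few lines of Python. This is only an illustration of the general technique, not Barcelona's rule language, which the article does not describe. The names ContextRule, DEFAULTS, and translate_word are hypothetical, and the English-to-German word pairs are sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical rule: use `translation` for `source_word` when any cue word is nearby."""
    source_word: str
    context_words: frozenset
    translation: str


# Default English -> German choices, used when no rule fires (sample data).
DEFAULTS = {"bank": "Bank"}

# A rule a translator might add for a project about rivers.
RULES = [ContextRule("bank", frozenset({"river", "shore"}), "Ufer")]


def translate_word(word, sentence_words, rules=RULES, defaults=DEFAULTS):
    """Translate one word, letting user-supplied rules override the default."""
    key = word.lower()
    context = {w.lower() for w in sentence_words}
    for rule in rules:
        # A rule fires when the word matches and a cue word appears in the sentence.
        if rule.source_word == key and rule.context_words & context:
            return rule.translation
    return defaults.get(key, word)
```

With these sample rules, translate_word("bank", "the river bank was muddy".split()) picks "Ufer", while the same word in "the bank raised rates" falls back to "Bank". The point of the design is that the rule list is plain data, so a translator can extend it without touching the program.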

Another example of a tool that lets users fine-tune the translation process with an expandable context-sensitive dictionary is Logos' Semantha. According to Mark Andrews, a Logos product marketing manager, this new generation of customizable translation tools enables users who systematically track new phrases and idioms to get to a point where they can push a button and generate close-enough translations for internal company documents and other communications.

So how do you keep up with ever-changing contexts and the variety of technical terms that occur in new projects? One answer comes from the Rank Xerox Research Center (RXRC) in Grenoble, France. RXRC's Terminology Extraction Project aims to make dictionaries easier to build. It takes a document and its existing translation and aligns the two texts sentence by sentence. Then it automatically detects the multiword expressions on each side and produces a list of paired terms that can be incorporated in a dictionary. The technology currently works with Dutch, English, German, French, Italian, Spanish, and Portuguese.
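
A rough Python sketch of the pairing step follows. It assumes the sentences are already aligned, treats any two adjacent content words as a candidate multiword expression, and scores candidate pairs with a Dice coefficient. All of these choices, along with the names STOPWORDS, candidate_terms, and extract_term_pairs, are assumptions made for illustration; the article does not say how RXRC's system detects or scores expressions.

```python
from collections import Counter
from itertools import product

# Tiny illustrative stopword list; a real system would use proper linguistic analysis.
STOPWORDS = {"the", "a", "is", "le", "un", "est"}


def candidate_terms(sentence):
    """Return two-word expressions that contain no stopword."""
    words = sentence.lower().split()
    return {
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first not in STOPWORDS and second not in STOPWORDS
    }


def extract_term_pairs(aligned_sentences, min_count=2):
    """Pair source and target expressions that keep appearing in the same aligned sentences."""
    source_count, target_count, pair_count = Counter(), Counter(), Counter()
    for source, target in aligned_sentences:
        source_terms = candidate_terms(source)
        target_terms = candidate_terms(target)
        source_count.update(source_terms)
        target_count.update(target_terms)
        pair_count.update(product(source_terms, target_terms))

    scored = []
    for (source_term, target_term), together in pair_count.items():
        if together >= min_count:
            # Dice coefficient: 1.0 means the two terms always occur together.
            dice = 2 * together / (source_count[source_term] + target_count[target_term])
            scored.append((dice, source_term, target_term))
    return sorted(scored, reverse=True)


ALIGNED = [
    ("format the hard disk", "formatez le disque dur"),
    ("the hard disk is full", "le disque dur est plein"),
    ("install a hard disk", "installez un disque dur"),
]
term_pairs = extract_term_pairs(ALIGNED)
```

On this toy corpus the only surviving pair is "hard disk" with "disque dur". Real documents would need longer expressions, morphology, and a far larger corpus before the scores mean much.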

A dictionary of 20,000 terms can take 1,000 hours to build and cost up to $600,000, RXRC researchers say. Good extraction tools can eliminate much of that cost, they say.

The Terminology Extraction Project is part of the Xerox Lexical Development Architecture (XeLDA). This translation framework includes tools that can detect phrases. For example, if you click on the word "sweep" in the phrase "to sweep it under the rug," you don't get the translation of "to sweep"; you get the translation of the complete phrase. This happens even if the idiom is split up, as in "to sweep that crime under the nearest rug," because the system is designed to detect basic idioms.
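
One simple way to recognize an idiom that has been split up is to store it as an ordered list of anchor words and allow a few intervening words between anchors. The Python sketch below shows that approach. It is an assumption about how such detection could work, not a description of XeLDA, and the names IDIOMS, match_with_gaps, and find_idiom are hypothetical. It also assumes the words already appear in their base forms, so it ignores inflection.

```python
# Idioms stored as ordered anchor words, mapped to a translation of the whole phrase
# (here the German equivalent, as sample data).
IDIOMS = {
    ("sweep", "under", "rug"): "etwas unter den Teppich kehren",
}


def match_with_gaps(words, anchors, max_gap):
    """Return positions of the anchors in order, allowing up to max_gap words between them."""
    for start, word in enumerate(words):
        if word != anchors[0]:
            continue
        positions = [start]
        for anchor in anchors[1:]:
            previous = positions[-1]
            window = range(previous + 1, min(previous + 2 + max_gap, len(words)))
            found = next((i for i in window if words[i] == anchor), None)
            if found is None:
                break
            positions.append(found)
        else:
            # Every anchor was found within the allowed gap.
            return positions
    return None


def find_idiom(words, clicked_index, idioms=IDIOMS, max_gap=3):
    """Return the phrase translation if the clicked word belongs to a known idiom."""
    normalized = [w.lower().strip('.,;!?"') for w in words]
    for anchors, translation in idioms.items():
        positions = match_with_gaps(normalized, anchors, max_gap)
        if positions and clicked_index in positions:
            return translation
    return None
```

Clicking "sweep" (index 1) in "to sweep that crime under the nearest rug" returns the translation of the whole idiom, and so does the compact form "to sweep it under the rug". In "sweep the floor" nothing matches, so the caller would fall back to a single-word lookup.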

Although the XeLDA services are prototypes, RXRC is planning to make these kinds of services commercially available in corporate LAN environments or over the Internet. "In the long term," says Monica Beltrametti, director of the Grenoble RXRC, "we aim to provide our Translation Aid Network Services as general-purpose translation tools to any networked computer user faced with multiple languages at work."

Personal MT

The market for MT tools has traditionally been professional translators in large corporations, international organizations, or governments. However, a new, more casual market for translation tools is emerging. Much of the information being passed around the globe doesn't require the precise translation that a novel or a technical manual might. "There is an increasing need for quick multilingual information scanning," says Ann-Marie Derouault, IBM's worldwide speech and translation marketing executive. "No one would pay a professional translator to translate an e-mail message because a quick translation that gives you a rough idea of its content is all that's required."

New products in this area are nevertheless context-aware, and some also use sophisticated syntactical analysis. They integrate with standard word processors and are priced at less than DM 500. Here are some examples:

  • IBM Europe now offers Windows and OS/2 versions of its host-based translation technology, Personal Translator. This program comes in a basic package with a vocabulary of 160,000 words and 440,000 phrases and in an advanced version with approximately 200,000 words and 550,000 phrases. In Italy this technology is used in Synthema's PeTra English/Italian translation product, which runs under OS/2. And in Germany, IBM has worked with v.Rheinbaben & Busch Electronic Publishing to create a Windows-based German/English version of the Personal Translator technology.
  • Accent Software offers Accent Duo with Translation, which integrates translation with word processing capabilities. This Windows system is available in English to Spanish, German, French, or Italian versions (it works bidirectionally). The program features a spelling checker and a thesaurus in both languages and lets users translate documents automatically or work interactively.
  • Logos' Remote Client is a Windows application that lets users dial into a Unix-based translation server. You can choose from a multitude of dictionaries for several subjects, then send the job to the server, which returns a translated version. Users can maintain their own translation server or call the Logos corporate server, which costs $0.04 per translated word. Logos' goal is to make machine translation available to smaller businesses and freelance translators who can't afford a high-end system.

Another force driving translation technology is on-line chat and communication in newsgroups. CompuServe, for example, offers English/French and English/German translation in some of its help forums. These translations are often very rough, but their value is immediacy, because in the context of a support forum messages can lose their relevance if they are delayed. As a CompuServe manager puts it, "The purpose is to quickly provide translations that otherwise would take hours to understand."

Multilingual translation is also reaching the World Wide Web. Globalink, for example, provides an add-on to Netscape Navigator 2.0 that translates Web sites in Spanish, French, or German into English, and vice versa, at the click of a button. Called Web Translator, the software allows users to translate on-line, or to save pages to be translated off-line, while maintaining the original page's hot links, graphics, and formatting.

The development of multilingual translation tools is key for most companies. Many of these systems support at least three of the main European languages -- English, French, German, and Spanish. However, there is no such thing as a one-size-fits-many translation technology. The experience of developing a translation system that works from language A to language B is in most cases of little help when developing a system for languages C and D. Merely replacing dictionaries is not enough because it does not reflect the grammatical structure or different semantic classes of words.

This famous example illustrates the difficulties of MT. Use any standard translation system to translate the old saying "The spirit is willing but the flesh is weak" to French and then back to English and you will get something like "The alcohol is strong but the meat is weak."
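
A toy Python round trip shows where this kind of drift comes from. The two lexicons below are deliberately contrived: each ambiguous noun lists several senses, and the naive translator always takes the first one without looking at context. The names EN_FR, FR_EN, naive_translate, and round_trip are hypothetical, and words missing from the toy lexicons simply pass through unchanged.

```python
# Contrived sense lists: the first entry is the one a context-blind system picks.
EN_FR = {
    "spirit": ["alcool", "esprit"],
    "flesh": ["viande", "chair"],
}
FR_EN = {
    "alcool": ["alcohol"],
    "esprit": ["spirit", "mind"],
    "viande": ["meat"],
    "chair": ["flesh"],
}


def naive_translate(sentence, lexicon):
    """Word-by-word lookup that always takes the first listed sense."""
    return " ".join(lexicon.get(word, [word])[0] for word in sentence.lower().split())


def round_trip(sentence):
    """Translate English -> French -> English with no context at all."""
    return naive_translate(naive_translate(sentence, EN_FR), FR_EN)


result = round_trip("The spirit is willing but the flesh is weak")
# result == "the alcohol is willing but the meat is weak"
```

Once the wrong sense is chosen on the way out, nothing in the reverse dictionary can recover the original meaning. This is why context rules and syntactic analysis matter more than dictionary size.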


Where to Find

Accent
Jerusalem, Israel
Phone:    +972 2 793 723 243
Fax:      +972 2 793 731
E-Mail:   normank@accent.co.il
Internet: http://www.accentsoft.com

Globalink Europe
Bracknell, Berkshire, U.K.
Phone:    +44 1344 382111
Fax:      +44 1344 382112
Internet: http://www.globalink.com

IBM Europe
Paris, France
Phone:    +33 16 38 55 77 77
E-Mail:   ibmrep@fr.ibm.com

Logos
Eschborn/Ts., Germany
Phone:    +49-61 96-59 03 0
Fax:      +49-61 96-59 03 15
Internet: http://www.logos-ca.com

Rank Xerox Research Centre Grenoble
Meylan, France
Phone:    +33 76 61 50 76
Fax:      +31 76 61 50 99

v.Rheinbaben & Busch Electronic Publishing
Munich, Germany
Phone:    +49 89 723 77 77
Fax:      +49 89 723 87 58

A Subtle Differential (screen image)

Detecting subtle contextual differences requires fine-tuning of dictionaries and syntactical analysis.


Peter Haapaniemi and Peter Hofland are technology journalists at The Visual Consultancy Corp. in Amsterdam. You can contact them at 100544.307@compuserve.com.
