
Character Recognition


September 1996 / International Features / Character Recognition

Apple and Motorola will launch new products with Chinese handwriting recognition.

Mark LaPedus

English handwriting recognition is one of the key technologies in today's PDAs. However, handwriting-recognition engines for Chinese and other Oriental languages may prove to be even more widely used. There are more than 1 billion people in China and Chinese-speaking regions in Asia. Aside from the sheer cost of buying a PC, the major barrier in selling to these people is adapting the PC keyboard to support more than 13,000 Chinese characters.

Companies in China, Hong Kong, and Taiwan are developing alternative Chinese input devices and technologies, including PC-based keyboard methodologies and handwriting- and voice-recognition products. So far, the newer technologies are only niche-market curiosities. But Chinese handwriting recognition could get a major boost when Apple and Motorola enter the market later this year.

Apple, which last year demonstrated a Chinese voice-recognition product, is planning to move into the Chinese-handwriting segment. At last Comdex, Motorola's Lexicus division showed a pen-based digitizer tablet that attaches to a PC via an RS-232 connection for Chinese-language input. This product, which is called WisdomPen, is now available (see the figure "How WisdomPen Works").

Other Asian companies are shipping similar products. Taiwan-based Pen Power Technology has a new pen-based digitizer tablet for Windows 95 that offers faster recognition of Chinese, Japanese, and English than previous models. Taipei's GoTop Information has a small touchscreen digitizer pad for Chinese input. Han Wang Science and Technology in Beijing has a new digitizer tablet that supports Chinese, Japanese, and Korean.

Meanwhile, Japan's Casio, Taiwan's Palmax, and others are selling Chinese handwriting-recognition PDAs. Palmax's new 68000-based PDA, the InfoRay PD-96, combines an electronic organizer, calculator, and English-to-Chinese and Chinese-to-English dictionaries. It can recognize more than 13,000 Chinese characters. Future PDAs from Palmax will include built-in pager capabilities and, down the road, a cellular phone based on Global System for Mobile Communications (GSM).

Different Character Sets

There are five different -- and complex -- methods for inputting Chinese into a PC. One popular method is called Cang Jie. It breaks down Chinese characters into 26 building blocks, or radicals, on a PC keyboard. A popular method in Taiwan, dubbed Zhuyin, uses a set of 37 phonetic symbols on the PC keyboard for input purposes. Another system used in China, Pinyin, uses standard romanized letters. In China, there is a five-stroke method called Wu Bi, while Hong Kong-based Ziran has come up with a 10-stroke system.
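As a rough illustration of why phonetic input is slower than it sounds, here is a minimal sketch of a Pinyin-style lookup. Many characters share one romanized syllable, so the user types the syllable and then picks from a list. The table contents and the function names `candidates` and `choose` are hypothetical, chosen only for this example.

```python
# Hypothetical, tiny candidate table; a real input method covers
# thousands of syllables and ranks candidates by frequency.
PINYIN_TABLE = {
    "ma": ["妈", "马", "吗"],
    "zhong": ["中", "种", "重"],
}

def candidates(syllable):
    """Return the characters that share the typed romanized syllable."""
    return PINYIN_TABLE.get(syllable.lower(), [])

def choose(syllable, index):
    """Pick the candidate at a 1-based position, as a numbered pick list would."""
    options = candidates(syllable)
    return options[index - 1] if 1 <= index <= len(options) else None

print(candidates("zhong"))
print(choose("ma", 2))
```

Every character costs several keystrokes plus a selection step, which is part of the appeal of writing the character directly.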

Compounding the complex keyboard input problem are two basic but somewhat different Chinese character sets: traditional and simplified. Traditional characters are used in Hong Kong and Taiwan. There are 13,052 traditional characters. Hong Kong also has 4000 distinct characters.

Simplified characters (the GB character set) are used in China. This set has 6700 symbols. Japanese kanji characters overlap traditional and, to a lesser extent, simplified characters.

Recognition Engines

Despite the complexity of the Chinese language, many Chinese-based handwriting systems exceed the recognition rates of their English-based counterparts, says Derek Ling, Asia-Pacific business development manager for Lexicus. Lexicus develops and sells English and Chinese handwriting-recognition products. Lexicus's digitizer tablet for Chinese input has a recognition rate of 96 percent, the company claims. The company's English-language handwriting software has a recognition rate in the mid-80-percent range, Ling says.

Palm Computing's Graffiti handwriting-recognition software, used in Hewlett-Packard's OmniGo 100 and other PDAs, offers a high degree of accuracy. But Graffiti's technology is based on a predefined unistroke alphabet system, which must be learned by end users.

"In many cases, it's actually harder to recognize English than Chinese," Ling says. "In English, the recognition engine is trying to decipher a different alphabet, because people have different handwriting characteristics. In Chinese, the characters are distinct. Each stroke in a Chinese character provides more information to the recognition engine."

English can be scribbled in a willy-nilly fashion, but Chinese characters are written with a series of distinct strokes -- from one to 17. Chinese people learn a set stroke sequence for each character, but there are always variations.

The concept of cursive writing exists in Chinese characters, but it is much different and sometimes easier to recognize in a PDA or system than English and other romanized languages, in which letters within a word run together. Consequently, English- and Chinese-based handwriting-recognition systems take different approaches to reach their goals. In Apple's Newton, for example, the system takes more of an intuitive approach.

The original (and much-maligned) handwriting-recognition engine in the Newton, which was written by ParaGraph International (Sunnyvale, CA), was supposed to match scrawls of ink against a 10,000-word dictionary. With Apple's most recent Newton OS 2.0 software for the PDA, which was introduced earlier this year, the PDA's handwriting engine is broken up into two engines.

An enhanced engine translates connected, cursive text using a 30,000-word dictionary and improved recognition algorithms. A new engine in Newton OS 2.0, based on artificial neural-network technology, converts unconnected, printed text. This technology uses stroke information to classify characters and can learn your handwriting over time.

In contrast, products with Chinese handwriting-recognition capabilities use different software-based algorithms that are less intuitive than the English-language systems. The three approaches to Chinese handwriting recognition are statistical, structural, and hybrid.

Statistical approaches use a set of measurements or select features taken from a Chinese character for identification purposes. In other words, the software in a system selects or extracts 2-D features in a Chinese symbol and tries to match them through a pixel-by-pixel comparison to character templates that the program holds in main memory. Templates match geometrical and topological features in a Chinese character. This approach offers superior recognition rates, but the number of features required in main memory or the database can become large, opening the door for noise and distortions.
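The template-matching idea can be sketched in a few lines. This is only an illustration under stated assumptions, not any vendor's engine: the 6-by-6 bitmaps, the three-character template set, the zone-density features, and the function names `zone_features` and `classify` are all hypothetical.

```python
def zone_features(bitmap, zones=3):
    """Split a square bitmap into zones x zones cells; return ink density per cell."""
    step = len(bitmap) // zones
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                bitmap[r][c] == "#"
                for r in range(zr * step, (zr + 1) * step)
                for c in range(zc * step, (zc + 1) * step)
            )
            feats.append(ink / (step * step))
    return feats

# Hypothetical templates; a real product would hold thousands at higher resolution.
TEMPLATES = {
    "一": ["......", "......", "######", "######", "......", "......"],
    "十": ["..##..", "..##..", "######", "######", "..##..", "..##.."],
    "口": ["######", "#....#", "#....#", "#....#", "#....#", "######"],
}

def classify(bitmap):
    """Return the template character whose feature vector is nearest to the input."""
    feats = zone_features(bitmap)

    def sq_dist(ch):
        ref = zone_features(TEMPLATES[ch])
        return sum((a - b) ** 2 for a, b in zip(feats, ref))

    return min(TEMPLATES, key=sq_dist)

# A sloppy cross with a few missing and misplaced pixels.
noisy_cross = ["..##..", "..#...", "######", "#####.", "..##..", "...#.."]
print(classify(noisy_cross))
```

Comparing coarse zone densities rather than individual pixels makes the match tolerant of small wobbles, but every added feature and template enlarges the table that must sit in memory, which is the trade-off described above.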

The use of structural algorithms is a top-down approach that expresses characters in three categories: segments, strokes, and radicals. You write Chinese characters in terms of segments and strokes. In on-line character recognition, a pattern of these segments and strokes can be strung together and matched in a database to identify the Chinese symbol.

The number of features in memory can be reduced by breaking down characters in terms of radicals. About 250 radicals are required to make up all Chinese characters. Structural approaches are sometimes more accurate in terms of recognition rates than the statistical algorithms, but there are possible problems with this system in terms of variations of stroke order and stroke numbers.
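A small sketch shows both the structural lookup and its weakness with stroke order. The stroke codes (H for horizontal, V for vertical, T for a turning stroke), the four-entry database, and the order-insensitive fallback are assumptions made for illustration; the products discussed here may handle variations quite differently.

```python
from collections import Counter

# Hypothetical stroke database keyed by the taught stroke order.
STROKE_DB = {
    ("H", "V"): "十",
    ("H", "V", "H"): "土",
    ("H", "H", "V", "H"): "王",
    ("V", "T", "H"): "口",
}

def match_strict(strokes):
    """Return the character whose stored stroke order matches exactly, else None."""
    return STROKE_DB.get(tuple(strokes))

def match_any_order(strokes):
    """Fallback: ignore order and compare only how many strokes of each kind were written."""
    written = Counter(strokes)
    hits = [ch for seq, ch in STROKE_DB.items() if Counter(seq) == written]
    return hits[0] if len(hits) == 1 else None

print(match_strict(["H", "H", "V", "H"]))     # taught order for 王
print(match_strict(["H", "V", "H", "H"]))     # same strokes, different order
print(match_any_order(["H", "V", "H", "H"]))  # order ignored
```

The strict lookup fails as soon as the writer departs from the taught sequence. Ignoring order recovers the character here, but in a full character set many entries would share the same stroke counts, so a fallback like this one quickly becomes ambiguous.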

Today's Chinese handwriting-recognition products use a combined statistical/structural approach, or hybrid approach. It combines the strengths of both the statistical and structural algorithms.
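The vendors do not disclose how they combine the two methods. One plausible arrangement, sketched here purely as an assumption, uses a structural cue (stroke count) to prune the candidate list and a statistical cue (distance between feature vectors) to rank what remains. The templates, the three-value feature vectors, and the `recognize` function are hypothetical.

```python
# Hypothetical templates: (stroke count, ink density in the top, middle,
# and bottom thirds of the writing box).
TEMPLATES = {
    "一": (1, [0.0, 1.0, 0.0]),
    "二": (2, [0.6, 0.0, 1.0]),
    "三": (3, [0.8, 0.6, 1.0]),
    "十": (2, [0.3, 1.0, 0.3]),
}

def recognize(stroke_count, features, tolerance=0):
    """Prune by stroke count (structural), then pick the nearest features (statistical)."""
    survivors = {
        ch: feats
        for ch, (count, feats) in TEMPLATES.items()
        if abs(count - stroke_count) <= tolerance
    }
    if not survivors:
        return None

    def sq_dist(ch):
        return sum((a - b) ** 2 for a, b in zip(features, survivors[ch]))

    return min(survivors, key=sq_dist)

print(recognize(2, [0.25, 0.9, 0.35]))
print(recognize(2, [0.5, 0.1, 0.9]))
```

Pruning first keeps the expensive comparison to a handful of templates, and the `tolerance` parameter gives some slack when the writer merges or splits a stroke.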

Next-Generation PDAs

Palmax's first PDA, introduced in 1994, used an Intel-compatible, 8-bit 8086 microprocessor to produce fair to decent recognition rates. Palmax's new InfoRay PD-96 uses a Motorola 68000 chip, combined with a specialized ASIC, to produce Chinese handwriting-recognition rates of approximately 96 percent, according to Santus Lin, vice president.

There are other improvements. Stroke order in Chinese handwriting in the PD-96 can be flexible, as opposed to very strict, he says. The PD-96 also recognizes both printed and semicursive Chinese handwriting. This PDA, which measures 128 by 80 by 18 mm and weighs just 150 grams, comes with 8 MB of mask ROM, 512 KB of memory, and a 128- by 160-dot LCD. Running a proprietary OS, Palmax's product sells for less than $200.

However, the PD-96 does not have built-in fax/modem capabilities or PC Card slots -- yet. It does have an IrDA-compatible module capable of sending data at speeds of 9600 bps to 115 Kbps -- over a distance of only 40 inches. It also supports Windows applications, including English and Chinese versions.

The PD-96 uses a recognition kernel licensed from Pen Power. It is split into Chinese- and English-language kernels, both of which are written in C.

Japan's Casio takes a different approach to the same problem. Casio, which sells a PDA with Chinese handwriting-recognition capabilities in China, Hong Kong, and Taiwan, has a pair of products, the DV-5000 and DV-8000.

Based on Motorola's 68HC05 CPU, the more powerful DV-8000 combines an electronic organizer, calculator, and English-to-Chinese and Chinese-to-English dictionaries. It also has a simple speech-recognition chip, which can repeat a word in Chinese and English that is being looked up in the dictionary. The DV-8000 sells for a suggested retail price of $430.

Casio's Chinese recognition engine is licensed from GoTop. GoTop's system is written in assembly language code, which is more compact and faster than C code. GoTop's engine also takes up only 16 KB of memory, compared to 70 KB in most C-coded kernels.

Multilanguage Digitizer Tablets

Chinese PDAs are designed for the growing mobile markets, while pen-based digitizer tablets are targeted for desktop applications. Chinese input tablets have been around since the early 1990s, but they have not moved into the mainstream nor dared to replace the keyboard.

This is a simple technology. By pressing a stylus or electronic pen on a tablet, you write a character. The tablet digitizes each point of the stroke, assigns it x and y coordinates, and searches its database to match the character. Finally, the character appears on the PC's screen.
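The path from pen samples to a database lookup can be sketched as follows. This is a simplified illustration, not the WisdomPen design: the `(x, y, pen_down)` sample format, the two-way H/V stroke labeling, and the one-entry database are assumptions.

```python
def split_strokes(samples):
    """Group (x, y, pen_down) samples into strokes; a pen lift ends the current stroke."""
    strokes, current = [], []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes

def direction(stroke):
    """Label a stroke H or V from the displacement between its first and last points."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return "H" if abs(x1 - x0) >= abs(y1 - y0) else "V"

# Hypothetical database keyed by stroke labels.
DATABASE = {("H", "V"): "十"}

samples = [
    (10, 50, True), (50, 52, True), (90, 50, True),  # left-to-right bar
    (90, 50, False),                                 # pen lifted
    (50, 10, True), (51, 50, True), (50, 90, True),  # top-to-bottom bar
]
strokes = split_strokes(samples)
codes = tuple(direction(s) for s in strokes)
print(codes, DATABASE.get(codes))
```

A shipping tablet would also smooth the samples, normalize for size, and distinguish many more stroke shapes before consulting its database.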

First-generation tablets were limited. Writing was restricted to a small portion of the PC's screen. They were also template-oriented: Different areas of the board were assigned functions.

Today's tablets, including WisdomPen, are more general-purpose devices that run under Windows 95 and popular applications. WisdomPen recognizes traditional and simplified characters. The product is available for under $170.

Pen Power offers a similar solution, but it also supports English and Japanese character sets, including kanji, hiragana, katakana, romaji, and symbols. Pen Power's new Chinese input product, the Pen Power Pen-Based Environment 4.0, also offers new and improved capabilities. These include character segmentation and ink processing. Character segmentation lets you write on the full screen on a PC, while ink processing lets you write in a personalized fashion.

GoTop offers a similar product, but instead of using a pen, the GoGoPen Touch Pad lets you use your finger as an alternative pointing device. Future versions of this product will support Japanese and Korean.

Work in Progress

Motorola and the Chinese Academy of Sciences of Beijing have opened a research laboratory to develop computer and communications technologies. Both have invested $1 million in the Joint Development Laboratory for Advanced Computer and Communications Technologies (JDL), which will develop Chinese-based speech, handwriting, and OCR technologies, as well as MPEG-2-based systems for video compression.

JDL scientists are working on improved cursive-based handwriting recognition. "We can get high accuracy rates in printed Chinese handwriting recognition, but cursive is another matter," says one Motorola official.

To get to this next level, Chinese recognition engines will likely employ neural-network technology that learns to recognize your handwriting. Lexicus is already moving down this path with its Chinese recognition systems.


Where to Find


Apple Computer South Asia Pte. Ltd.

Singapore
Phone:    +65 486 6176
Fax:      +65 489 1975

Dataquest Japan

Tokyo, Japan
Phone:    +81 3 3481 3670
Fax:      +81 3 3481 3645
Internet: http://www.gartner.co.jp


Dataquest Taiwan

Taipei, Taiwan R.O.C.
Phone:    +886 2 756 0389
Fax:      +886 2 756 2663
E-Mail:   blee@dataquest.com


GoTop Information, Inc.

Taipei, Taiwan R.O.C.
Phone:    +886 2 788 2408
Fax:      +886 2 788 1031
Internet: http://www.gotop.com/


Han Wang Science and Technology Corp.

Beijing, China
Phone:    +86 10 261 1264
Fax:      +86 10 253 6822

Industrial Technology Research Institute

Computer and Communication Research Laboratories
Hsinchu, Taiwan R.O.C.
Phone:    +886 035 917 743
Fax:      +886 035 820 044
E-Mail:   a000@ccloal.ccl.itri.org.tw


Motorola -- Lexicus Division

Palo Alto, CA
Phone:    (415) 462-6800
Fax:      (415) 323-0482
E-Mail:   derekl@lexicus.mot.com
Internet: http://www.mot.com/lexicus/


Palmax Technology Co. Ltd.

Taipei, Taiwan R.O.C.
Phone:    +886 2 226 6007
Fax:      +886 2 226 1215

Pen Power Technology Ltd.

Hsinchu, Taiwan R.O.C.
Phone:    +886 035 722 691
Fax:      +886 035 716 243
E-Mail:   penpower@ms1.hinet.net


Taiwan Casio Ltd.

Taipei, Taiwan R.O.C.
Phone:    +886 2 393 2511
Fax:      +886 2 395 2518



[Figure: How WisdomPen Works]

[Screen: InfoRay Interface. The interface of the InfoRay PD-96 from Palmax Technology in Taiwan.]


Mark LaPedus is a BYTE contributing editor based in Taipei. You can reach him on MCI mail at 591-6955.
