riosities. But Chinese handwriting recognition could get a major boost when Apple and Motorola enter the market later this year.
Apple, which last year demonstrated a Chinese voice-recognition product, is planning to move into the Chinese-handwriting segment. At the last Comdex, Motorola's Lexicus division showed a pen-based digitizer tablet that attaches to a PC via an RS-232 connection for Chinese-language input. This product, which is called WisdomPen, is now available (see the figure "How WisdomPen Works").
Other Asian companies are shipping similar products. Taiwan-based Pen Power Technology has a new pen-based digitizer tablet for Windows 95 that offers faster recognition of Chinese, Japanese, and English than previous models. Taipei's GoTop Information has a small touchscreen digitizer pad for Chinese input. Han Wang Science and Technology in Beijing has a new digitizer tablet that supports Chinese, Japanese, and Korean.
Meanwhile, Japan's Casio, Taiwan's Palmax, and others are selling Chinese handwriting-recognition PDAs. Palmax's new 68000-based PDA, the InfoRay PD-96, combines an electronic organizer, calculator, and English-to-Chinese and Chinese-to-English dictionaries. It can recognize more than 13,000 Chinese characters. Future PDAs from Palmax will include built-in pager capabilities and, down the road, a cellular phone based on Global System for Mobile Communications (GSM).
Different Character Sets
There are five different, and complex, methods for inputting Chinese into a PC. One popular method is called Cang Jie. It breaks Chinese characters down into 26 building blocks, or radicals, that are entered from a PC keyboard. A popular method in Taiwan, dubbed Zhuyin, uses a set of 37 phonetic symbols on the PC keyboard for input. Another system used in China, Pinyin, uses standard romanized letters. China also has a five-stroke method called Wu Bi, while Hong Kong-based Ziran has come up with a 10-stroke system.
Compounding the complex keyboard input problem are two basic but somewhat different Chinese character sets: traditional and simplified. Traditional characters are used in Hong Kong and Taiwan. There are 13,052 traditional characters. Hong Kong also has 4000 distinct characters.
Simplified characters (the GB character set) are used in China. This set has 6700 symbols. Japanese kanji characters overlap traditional and, to a lesser extent, simplified characters.
Recognition Engines
Despite the complexity of the Chinese language, many Chinese-based handwriting systems exceed the recognition rates of their English-based counterparts, says Derek Ling, Asia-Pacific business development manager for Lexicus. Lexicus develops and sells English and Chinese handwriting-recognition products. Lexicus's digitizer tablet for Chinese input has a recognition rate of 96 percent, the company claims. The company's English-language handwriting software has a recognition rate in the mid-80-percent range, Ling says.
Palm Computing's Graffiti handwriting-recognition software, used in Hewlett-Packard's OmniGo 100 and other PDAs, offers a high degree of accuracy. But Graffiti's technology is based on a predefined unistroke alphabet system, which must be learned by end users.
"In many cases, it's actually harder to recognize English than Chinese," Ling says. "In English, the recognition engine is trying to decipher a different alphabet, because people have different handwriting characteristics. In Chinese, the characters are distinct. Each stroke in a Chinese character provides more information to the recognition engine."
Unlike English, which you can scribble in a willy-nilly fashion, Chinese is written as a series of distinct strokes, from one to 17 per character. Chinese people learn a set stroke sequence for each character, but there are always variations.
The concept of cursive writing exists in Chinese characters, but it is quite different from cursive in English and other romanized languages, in which letters within a word run together, and it is sometimes easier for a PDA or other system to recognize. Consequently, English- and Chinese-based handwriting-recognition systems take different approaches to reach their goals. In Apple's Newton, for example, the system takes more of an intuitive approach.
The original (and much-maligned) handwriting-recognition engine in the Newton, which was written by ParaGraph International (Sunnyvale, CA), was supposed to match scrawls of ink against a 10,000-word dictionary. With Apple's most recent Newton OS 2.0 software for the PDA, which was introduced earlier this year, the PDA's handwriting engine is broken up into two engines.
An enhanced engine translates connected, cursive text using a 30,000-word dictionary and improved recognition algorithms. A new engine in Newton OS 2.0 converts unconnected, printed text using artificial neural-network technology. This technology uses stroke information to classify characters and can learn your handwriting over time.
In contrast, products with Chinese handwriting-recognition capabilities use different software-based algorithms that are less intuitive than the English-language systems. The three approaches to Chinese handwriting recognition are statistical, structural, and hybrid.
Statistical approaches use a set of measurements, or selected features, taken from a Chinese character for identification purposes. In other words, the software extracts 2-D features from a written symbol and tries to match them, through a pixel-by-pixel comparison, to character templates that the program holds in main memory. Templates capture the geometrical and topological features of a Chinese character. This approach offers superior recognition rates, but the number of features required in main memory or the database can become large, opening the door to noise and distortions.
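To make the template idea concrete, here is a minimal Python sketch of pixel-by-pixel matching. Everything in it is an assumption made for illustration: the 5-by-5 bitmaps, the three template labels, and the function names `pixel_distance` and `classify` are hypothetical and do not come from any of the products described here. A commercial engine would work with much larger grids and richer features.

```python
# Toy 5x5 bitmaps standing in for stored character templates (illustrative only).
TEMPLATES = {
    "yi (one)": ["00000",
                 "00000",
                 "11111",
                 "00000",
                 "00000"],
    "shi (ten)": ["00100",
                  "00100",
                  "11111",
                  "00100",
                  "00100"],
    "kou (mouth)": ["11111",
                    "10001",
                    "10001",
                    "10001",
                    "11111"],
}


def pixel_distance(a, b):
    """Count the pixels at which two equal-sized bitmaps differ."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def classify(bitmap, templates=TEMPLATES):
    """Return (label, distance) for the template closest to the input bitmap."""
    scored = ((label, pixel_distance(bitmap, t)) for label, t in templates.items())
    return min(scored, key=lambda pair: pair[1])


# A slightly damaged cross: two pixels are missing compared with the "ten" template.
noisy_ten = ["00100",
             "00100",
             "11110",
             "00100",
             "00000"]

best_label, best_distance = classify(noisy_ten)
```

The damaged cross still lands on the "ten" template because it differs from that template in the fewest pixels. The sketch also hints at the cost noted above: every stored template must be compared against the input, so memory and matching time grow with the size of the character set.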
The use of structural algorithms is a top-down approach that expresses characters in three categories: segments, strokes, and radicals. You write Chinese characters in terms of segments and strokes. In on-line character recognition, a pattern of these segments and strokes can be strung together and matched in a database to identify the Chinese symbol.
The number of features in memory can be reduced by breaking down characters in terms of radicals. About 250 radicals are required to make up all Chinese characters. Structural approaches are sometimes more accurate in terms of recognition rates than the statistical algorithms, but there are possible problems with this system in terms of variations of stroke order and stroke numbers.
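The stroke-order problem can be illustrated with a short Python sketch. It assumes a hypothetical four-letter stroke code (H for horizontal, V for vertical, L for left-falling, R for right-falling) and a tiny made-up database, `STROKE_DB`; real structural recognizers use far more stroke types plus positional information, so treat this only as an outline of the idea.

```python
from collections import Counter

# Hypothetical database: standard stroke sequence -> character (romanized label).
STROKE_DB = {
    ("H", "V"): "shi (ten)",
    ("L", "R"): "ren (person)",
    ("H", "L", "R"): "da (big)",
    ("H", "V", "L", "R"): "mu (wood)",
}


def match_strict(strokes):
    """Look up a character only if the strokes arrive in the standard order."""
    return STROKE_DB.get(tuple(strokes))


def match_any_order(strokes):
    """Return every character built from the same strokes, ignoring their order."""
    wanted = Counter(strokes)
    return [name for seq, name in STROKE_DB.items() if Counter(seq) == wanted]


# A writer draws the vertical bar of "ten" before the horizontal one.
strict_result = match_strict(["V", "H"])
relaxed_result = match_any_order(["V", "H"])
```

The strict lookup fails on the reordered input, while the order-insensitive lookup recovers the character. Relaxing the order comes at a price: the more freedom the matcher allows, the more candidates can share the same set of strokes, which is one reason stroke-order and stroke-count variations trouble purely structural systems.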
Today's Chinese handwriting-recognition products use a hybrid approach that combines the strengths of both the statistical and structural algorithms.
Next-Generation PDAs
Palmax's first PDA, introduced in 1994, used an Intel-compatible, 8-bit 8086 microprocessor to produce fair to decent recognition rates. Palmax's new InfoRay PD-96 uses a Motorola 68000 chip, combined with a specialized ASIC, to produce Chinese handwriting-recognition rates of approximately 96 percent, according to Santus Lin, vice president.
There are other improvements. Stroke order in Chinese handwriting in the PD-96 can be flexible, as opposed to very strict, he says. The PD-96 also recognizes both printed and semicursive Chinese handwriting. This PDA, which measures 128 by 80 by 18 mm and weighs just 150 grams, comes with 8 MB of mask ROM, 512 KB of memory, and a 128- by 160-dot LCD. Running a proprietary OS, Palmax's product sells for less than $200.
However, the PD-96 does not have built-in fax/modem capabilities or PC Card slots -- yet. It does have an IrDA-compatible module capable of sending data at speeds of 9600 bps to 115 Kbps, but only over a distance of 40 inches. It also supports Windows applications, including English and Chinese versions.
The PD-96 uses a recognition kernel licensed from Pen Power. It is split into Chinese- and English-language kernels, both of which are written in C.
Japan's Casio takes a different approach to the same problem. Casio, which sells a PDA with Chinese handwriting-recognition capabilities in China, Hong Kong, and Taiwan, has a pair of products, the DV-5000 and DV-8000.
Based on Motorola's 68HC05 CPU, the more powerful DV-8000 combines an electronic organizer, calculator, and English-to-Chinese and Chinese-to-English dictionaries. It also has a simple speech-recognition chip, which can repeat a word in Chinese and English that is being looked up in the dictionary. The DV-8000 sells for a suggested retail price of $430.
Casio's Chinese recognition engine is licensed from GoTop. GoTop's system is written in assembly language code, which is more compact and faster than C code. GoTop's engine also takes up only 16 KB of memory, compared to 70 KB in most C-coded kernels.
Multilanguage Digitizer Tablets
Chinese PDAs are designed for the growing mobile markets, while pen-based digitizer tablets are targeted at desktop applications. Chinese input tablets have been around since the early 1990s, but they have neither moved into the mainstream nor dared to replace the keyboard.
This is a simple technology. By pressing a stylus or electronic pen on a tablet, you write a character. The tablet digitizes each point, assigns it x and y coordinates, and searches its database to match the character. Finally, the character appears on the PC's screen.
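The front end of that pipeline can be sketched in a few lines of Python. The sample format used here, a stream of (x, y, pen_down) tuples, and the helper names `split_strokes` and `stroke_direction` are assumptions for illustration; the article does not describe the data format any particular tablet uses.

```python
def split_strokes(samples):
    """Group (x, y, pen_down) samples into strokes; a pen lift ends a stroke."""
    strokes, current = [], []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # the stream ended while the pen was still down
        strokes.append(current)
    return strokes


def stroke_direction(stroke):
    """Label a stroke 'H' or 'V' by its larger end-to-end displacement."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return "H" if abs(x1 - x0) >= abs(y1 - y0) else "V"


# Hypothetical samples for a cross shape: one horizontal bar, then one vertical bar.
samples = [
    (10, 50, True), (50, 50, True), (90, 52, True),
    (90, 52, False),  # pen lifted
    (50, 10, True), (51, 50, True), (50, 90, True),
    (50, 90, False),  # pen lifted
]

strokes = split_strokes(samples)
labels = [stroke_direction(s) for s in strokes]
```

Here the two strokes come out labeled "H" and "V". A recognizer would then pass a sequence like this, along with the raw coordinates, to its matching stage.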
First-generation tablets were limited. Writing was restricted to a small portion of the PC's screen. They were also template-oriented: Different areas of the board were assigned functions.
Today's tablets, including WisdomPen, are more general-purpose devices that run under Windows 95 and popular applications. WisdomPen recognizes traditional and simplified characters. The product is available for under $170.
Pen Power offers a similar solution, but it also supports English and Japanese character sets, including kanji, hiragana, katakana, romaji, and symbols. Pen Power's new Chinese input product, the Pen Power Pen-Based Environment 4.0, also offers new and improved capabilities. These include character segmentation and ink processing. Character segmentation lets you write on the full screen on a PC, while ink processing lets you write in a personalized fashion.
GoTop offers a similar product, but instead of using a pen, the GoGoPen Touch Pad lets you use your finger as an alternative pointing device. Future versions of this product will support Japanese and Korean.
Work in Progress
Motorola and the Chinese Academy of Sciences of Beijing have opened a research laboratory to develop computer and communications technologies. Both have invested $1 million in the Joint Development Laboratory for Advanced Computer and Communications Technologies (JDL), which will develop Chinese-based speech, handwriting, and OCR technologies, as well as MPEG-2-based systems for video compression.
JDL scientists are working on improved cursive-based handwriting recognition. "We can get high accuracy rates in printed Chinese handwriting recognition, but cursive is another matter," says one Motorola official.
To get to this next level, Chinese recognition engines will likely employ neural-network technology that learns to recognize your handwriting. Lexicus is already moving down this path with its Chinese recognition systems.
Where to Find
Apple Computer South Asia Pte. Ltd.
Singapore
Phone: +65 486 6176
Fax: +65 489 1975
Dataquest Japan
Tokyo, Japan
Phone: +81 3 3481 3670
Fax: +81 3 3481 3645
Internet: http://www.gartner.co.jp