Japanese Voice Recognition Software from IBM

If you're in the habit of talking to (or cursing at) your computer, take care: it soon may be able to understand you.

by Noriko Takezaki

There's good news for those who can speak Japanese, but are not confident about writing it. Now you can dictate to your computer, which will transcribe your spoken words into a kanji document.

IBM Japan has announced a new voice recognition software package that runs on Windows 95. VoiceType Dictation 3.0 (Japanese version) will be released around April as a stand-alone product for less than JPY20,000. It will also be pre-installed on the T models of IBM's Aptiva series desktop computers.

This software is the first Asian- language version of VoiceType Dictation, following the debut of localized versions in such countries as France, Germany, Italy, Spain, the US, and the UK. The initial Japanese-language version of VoiceType Dictation 3.0 runs only on the Windows 95 platform. (The English-language version, in contrast, is incorporated into English OS/2, version 4.0.)

Japanese VoiceType Dictation 3.0 has been developed jointly by IBM's TJ Watson Research Center (in New York) and Tokyo Research Laboratory. The Watson Research Center launched R&D work on acoustic technologies and English-language voice recognition in the early 1970s. Tokyo Research Laboratory started working on Japanese-language voice recognition technology about a decade ago. With VoiceType Dictation 3.0, the experience and achievements of these two labs have been successfully combined to introduce affordable voice recognition software to the Japanese market.

Speak freely
No special hardware is required to use IBM's new Japanese voice recognition software. What you do need is a relatively powerful system: a 133-MHz or faster Pentium-based PC running Japanese Windows 95, 32MB of RAM, and a SoundBlaster 16 (or equivalent) sound card. You also need a microphone, but IBM Japan plans to supply that as a standard accessory.

The software incorporates a 40,000-word Japanese dictionary, which can be increased to a maximum of 65,000 words. According to Hideki Gohhara, advisory systems engineer in charge of desktop software marketing at IBM Japan, "it can recognize 90% to 95% of natural Japanese voice input." Gohhara cautions, however, that the input accuracy for non-native speakers may be slightly lower, since the software's voice recognition technology was developed based on the vocalization patterns of native Japanese. Whether native or non-native speaker, though, the conversion accuracy of VoiceType Dictation 3.0 is said to be higher than that of any other existing voice recognition software.

The program's automatic kanji conversion is made through sentence-based analogical inference. If the initial conversion is not appropriate, you can make changes by choosing the right kanji from the displayed vocabulary group via key input.

Navigate or dictate
VoiceType Dictation has two basic capabilities: navigation and dictation. Navigation mode is used to access PC functions, such as telling the computer to launch an application program and open a file. Continuous, natural-speed voice input, such as "sutato yudora" ("Start Eudora") and "fairu hiraku" ("File open"), is accepted in navigation mode - as long as you articulate clearly.

When you have opened a target file for input, you can activate dictation mode by telling the computer to "onsei nyuryoku kaishi" ("Start voice input"). In dictation mode, your voice input is transformed into text data with automatic kanji conversion. During dictation, you need to pause slightly between words, and you must explicitly specify any desired punctuation, such as "ten" ("comma") or "maru" ("period").

Dictation mode works best with application software that was designed with voice recognition capabilities in mind. For use with software not so designed (and this includes most existing software), IBM provides a mediation tool called VoicePad to properly transmit your voice input data into the target application.

A wrench, not a hammer
As part of its pre-market research for the application, IBM Japan has been shipping beta versions of VoiceType Dictation pre-installed on Aptiva's H5F, H6F, and H7G models since December 1996. IBM Japan acknowledges that a few early users have experienced difficulties, mainly because of inappropriate expectations.

"Some users expected voice recognition technology to eliminate the need for conventional input tools, such as a keyboard and mouse. But this is not our intent," says Gohhara. "We are positioning this voice recognition software as an input-assistance tool, one to be used appropriately in conjunction with conventional input methods. Although users may find the software's voice recognition function convenient, it is important they understand its limitations." To help promote the software, IBM Japan intends to provide some form of tutorial, such as offering a visual operation guide on CD-ROM or through demonstration campaigns.

The Japanese application was developed for general use, targeting the individual market, with its vocabulary database (particularly its kanji conversion list) drawn from daily newspapers. Accordingly, it is not suited to specialized or technical usage.

The English version of the software targets business users and offers specialized optional dictionaries for such technical fields as radiology, medicine, law, and journalism. "Like in the US, we eventually plan to target business users in Japan. But not at this moment," says Gohhara. "Preparation of purpose-specific dictionaries will take time. Rather than delay the product release, we thought it would be better to introduce this epoch-making software as early as possible to a broader market in Japan."

In the US, meanwhile, IBM has been working with Netscape to develop a voice-driven version of Netscape Navigator. If that effort is successful, it's likely that a Japanese version won't be far behind.

In addition to packaged sales of VoiceType Dictation to end users, IBM Japan intends to license the software technology to other computer hardware and software firms, and electronic appliance companies. Several Japanese electronics manufacturers have already shown a keen interest in incorporating the technology into their product, according to Gohhara.

For IBM Japan, one hurdle to promoting the product's OEM business here is management of the product's intellectual property rights. Since the software technology was developed jointly by IBM's US and Japan operations, currently licensing agreements must be made in English among IBM US, IBM Japan, and the licensing party. This may hinder the swift licensing of the technology to Japanese companies.

Another brick in the wall
In 1996, IBM Japan increased its PC market share in Japan to 11.9%, according to a survey by Dataquest. This was 1.2 times its share in 1995, and it enabled IBM Japan to wrest 3rd place from Apple Japan.

IBM still has a long way to go, though, if it hopes to close the long lead enjoyed by top runners NEC and Fujitsu, who held a 32.0% and 21.9% market share, respectively. The new release of VoiceType Dictation is seen as playing an important role in closing that distance.

Voice input could well become the biggest advance in input technology since the mouse. The immediate task for IBM Japan in promoting this innovative voice recognition product will be to develop a clear marketing strategy. A second, but no less vital task, will be to continue refining the product's capabilities and increasing the available dictionaries for VoiceType Dictation tailored for use in selected fields.

For information about the Japanese version of VoiceType Dictation, contact IBM Japan at 0120-04-1992.

For more information in English about IBM's VoiceType speech recognition products, see http://www.software.ibm.com/is/voicetype/.

If you want to get the English version VoiceType in Japan, IBM Japan recommends that you contact: Tokyo System Trade Co., Ltd. 7-7-8 Roppongi, Minato-ku, Tokyo Phone: 03-5411-1805 Fax: 03-5411-2358

Back to table of contents