Just Say What You Want

Issue: March 2001


Millions of Japanese can tell their cellphones what to do, thanks to Digital Media's voice recognition software. Next step: voice portals.

by Michael Thuresson

Venture Spotlight
Company: Digital Media Inc.
Location: Tokyo
Phone: +81-3-3476-4852
URL: www.digitalmedia.co.jp
Ownership: Private
Founded: September 1994
Employees: 11
Products: Voice-recognition software; systems integration; content provider for EZ-Web and J-Sky wireless Net services
Partners: Intervoice-Brite (Nasdaq: INTV), IT Networks
Competitors: Converse, Technovox, Panache, others for SI business; numerous others for content business
Financing: ¥10 million
Investors: Founders

DIGITAL MEDIA INC. has the look of an ideal Japanese New Economy company, and not just because it's a savvy tech venture based in Shibuya. DMI maintains a dual focus by developing voice-recognition software and content for wireless network operators and providing systems integration for clients, paying obeisance to the theory that post-Net bubble success on Japan's Web requires both online and offline business savvy.

DMI's president and CEO, Kagoshima-born Hiroshi Sakurai, is a seasoned wireless industry professional credited, together with two American engineers, with a patent relating (somewhat cryptically) to "universal connection mobile server communications." In practice, the patent is the basis for a hands-free, in-vehicle cellular platform that was commercialized and marketed in the US by Cellport Systems, a Colorado-based company Sakurai co-founded in 1993. After returning to Japan in 1994, Sakurai founded DMI at its current location -- long before Shibuya became associated with Bit Valley -- and last April he became the majority shareholder of Cellport Systems Japan. The new company aims to introduce Cellport's hands-free product to Japanese car manufacturers.

"Friends tell me I'm a rare Japanese. Starting small businesses is my hobby," says Sakurai. His past experience with successful ventures has helped provide the funding for DMI; he owns 90 percent, while colleague and friend Pat Kennedy, CEO at Cellport USA, owns the remainder.

DMI started out as a local distributor for Voice Control Systems, a speech recognition software house based in Dallas that is now part of Philips. DMI has since pursued other business interests, but voice application development remains its core: the company still derives some 95 percent of its business from voice-related systems integration. Intervoice-Brite, a world leader in interactive voice response systems, develops the technology platform that DMI now uses in its applications.

In 1999, the company deployed its Voice Square voice-activated dialing service for users of DDI Cellular Group's mobile phones (with well over 3.5 million PHS users nationwide). Voice Square boasts a 95 percent recognition rate for voice-prompted phone numbers, and it also provides a rather nifty keyword-based, voice-activated directory service (which the firm claims is 85 percent accurate). "If you're hungry for Chinese food after, say, an interview in Shibuya, simply say, 'Chinese food' and 'Shibuya,' or 'two-thousand yen,' and restaurants that match those keywords will be retrieved from our servers via the DDI network," says Sakurai. The appropriate listings are served up onscreen and can then be voice dialed.
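To make the directory idea concrete, here is a minimal Python sketch of keyword-based lookup. It is an illustration only, not DMI's implementation: the listings, phone numbers, field names, and the `find_listings` function are all hypothetical, and it assumes that a listing must match every keyword the caller speaks. Speech recognition itself is out of scope; the sketch starts from keywords that have already been recognized.

```python
# Hypothetical sample data: the article does not describe DMI's real
# listings or database schema.
LISTINGS = [
    {"name": "Restaurant A", "phone": "03-0000-0001",
     "keywords": {"chinese food", "shibuya", "two-thousand yen"}},
    {"name": "Restaurant B", "phone": "03-0000-0002",
     "keywords": {"chinese food", "shinjuku"}},
    {"name": "Restaurant C", "phone": "03-0000-0003",
     "keywords": {"sushi", "shibuya"}},
]


def find_listings(spoken_keywords, listings=LISTINGS):
    """Return the listings whose keyword set contains every spoken keyword.

    Assumption: keywords are combined with AND. The article does not say
    how the real service combines them.
    """
    wanted = {kw.strip().lower() for kw in spoken_keywords}
    return [entry for entry in listings if wanted <= entry["keywords"]]


# Keywords as they might arrive from the recognizer.
for entry in find_listings(["Chinese food", "Shibuya"]):
    print(entry["name"], entry["phone"])
```

Storing each listing's keywords as a set keeps the match to a single subset test, which is enough for a sketch. A service at carrier scale would presumably need an index rather than a linear scan.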

Voice Square can serve as a voice-enabled, always-accessible yellow pages. DMI hopes merchants will see this as a valuable way to directly access mobile phone users. Also, the fact that the mobile user is on the go means time and location are of the essence. If voice apps make for quicker, easier data retrieval, they may just become the killer app in this environment. "The fact that the user interface is so limited -- no keyboard and a very small screen -- combined with situations where we use mobile devices, like while driving, make voice-based technologies very critical," says Punnamas Vichitkulwongsa, founder of Tokyo wireless solutions provider Arriya Solutions. "Mobile users want to be able to interact with data services in many different ways."

To date, DMI's defining accomplishment has been customizing and fine-tuning Voice Square to be highly accurate, dependable, and useful for DDI Cellular customers. That the software can handle a large number of keywords, vocabulary, names, pronunciations, and dialects, while filtering out inherent background noise, is attributable to DMI's months of exhaustive data collection, testing, and tweaking. This fieldwork began in 1997 when the company won the contract from DDI Cellular. Data collection extended throughout 1997, followed by beta testing in 1998 and commercial deployment in 1999.

CEO Hiroshi Sakurai
This year, DMI is again collecting data and refining the software's voice and keyword dictionary with the aim of convincing other mobile carriers that Voice Square is a sound product (the pun was irresistible). "We have to show the Japanese voice-recognition capabilities through demos and convince carriers this is solid wireless technology," says Sakurai.

Voice portals are a big part of DMI's future, he adds, and he expects voice activation to be a hot technology for mobile carriers and content providers alike: "What we're providing is a voice-recognition platform they can implement into their content portals."

He won't disclose which carriers DMI is targeting, but says, "We are in negotiations. A ready-to-go voice-content-enabling platform will be highly valuable to them."

Others agree. Rob Hellstrom, principal at the wireless investment practice of VC Whitney & Company's Tokyo office, says, "Network-based voice-recognition services provide scope for keeping the value of the network, either the carrier's or service provider's, and also offer new revenue streams from making services easier to access and deliver."

In addition to new functions like text-to-speech and voice-activated email (in a recent survey, the Mobile Contents Forum found that email voice read-back was desired by almost half of keitai-using respondents -- see Statistics, February 2001), Sakurai is also setting his sights on VoiceXML, a new XML-based markup language for voice interfaces that is intended to let HTML Web sites be voice-enabled without modification to the source content. (The same approach is meant to extend to the other markup languages, such as HDML, cHTML, and MML, that are presently used to develop wireless Web sites in Japan.) But this plethora of acronyms serves to emphasize the field's lack of agreed-upon standards and still-developing platforms. "The challenge for DMI is not to get stuck with any single technology," warns Vichitkulwongsa, adding, "It must offer an integrated voice-based solution that encompasses [all] the key voice technologies."

While only 5 percent of DMI's current business is generated from providing content to Japanese carriers, Sakurai expects that to grow to between 20 and 30 percent in 2001. Last year, through its news content provider, IT Networks (US), DMI bid to become one of DoCoMo's international news providers for i-mode, but lost out because CNN already occupied the slot.

The company has won two content contracts since. Last August, again in partnership with IT Networks, DMI started providing English and Japanese news content for J-Phone's J-Sky service under the Jap@n Connection brand. In December, the firm started providing IT Networks' sports content, in a product called Live.sports@USA, for the EZ-Web service of Au, the KDDI mobile operator. The news covers American sports in both Japanese and English, a content niche, according to Sakurai, in which many Japanese are interested for both news and English education purposes.

And DMI has its eyes on the broadband wireless Web as well. "I just went to our US content partner to talk about how we can serve Japan's 3G network," he says. "Obviously, streaming audio and video will become a part of our new content business."
