September 2001
by Alex Stewart
Dr. Nick Campbell is a very British researcher in a very green corner of Japan. Research director of the Expressive Speech Processing Project at the ATR Information Sciences Division, he first came to Japan in 1975 as an English language instructor. Before joining the ATR labs in 1991, he was a research fellow at the IBM (UK) Scientific Center, and then a Senior Linguist at Edinburgh University's Centre for Speech Technology Research. He became the first foreign head of an ATR department in 1997.
His research field is one of those "over-the-horizon" areas that could shape the future of telecommunications. His goal, broadly speaking, is to get computers to talk like humans, and to understand how intonation and voice quality express subtleties of meaning. Alex Stewart interviewed Dr. Campbell at the ATR Labs.

Can you start by giving some background on what kind of research you're doing at ATR? What is CHATR? How can a company use it?

We now have good enough technology to capture the nuances of a person's speech in their native language and retain those same nuances when the speech is translated into another language.

What are some examples of applications for your technology?

With CHATR, we used the voice of a famous dancer for the weather forecaster's personality voice on Hankyu Railway's Web page. We can synthesize a person's voice so that it is instantly recognizable. We can't quite get the exact flow of natural speech yet, as the sounds get swallowed occasionally, but we are very close. I can see, at a popular level, how kids who like to surf the mobile Web and listen to music or audio messages could enjoy hearing synthesized voices. For example, you can use the voices of well-known TV personalities and have them talk in different contexts, such as the weather. If you can create conversations that are cute or familiar, it's likely to be a big success with young people. Another example is voice synthesis in car navigation systems, which is going to be really big.

You are clearly very enthusiastic about your work: what drives you? What do you expect to be able to show in five years' time?

I think that if we are going to live with Web-based information services, then some kind of voice access to that information will be essential -- and a friendly conversational style of synthesis will be needed if the technology isn't going to drive people crazy.
My Expressive Speech Processing project will have come to an end in five years' time, so we'll have some interesting demos and prototypes to show you, but it will be a lot longer before the technology is able to perform as well as the people on the street expect. But if you look at the explosion in portable phone use, and at the rapid growth of the Internet as a source of information, and then think how many of those people would be happier using their phone than using a computer keyboard to access the information, you'll see how important it is that we crack the code of speech communication, and get some friendly and fun technology out there soon. There are a lot of people waiting for it!