New Value Created by an Advanced Natural and Human-like Speech Synthesis Technology

AI, Inc.


High-Quality Speech Synthesis Engine “AITalk”

  • Artificial speech so natural that it cannot be distinguished from a human voice has begun to reach every corner of society. This article profiles a company exploring the potential of speech synthesis technology.

Deciding to start a business to develop a revolutionary speech synthesis technology

AI, Inc., established in 2003, develops software systems specializing in speech synthesis technology. Daisuke Yoshida, Representative Director of AI, first encountered corpus-based speech synthesis technology while working at Advanced Telecommunications Research Institute International (ATR). This technology uses a database built from human voice recordings (a speech dictionary) to convert text into natural speech. Yoshida saw great potential in a technology that clearly departed from conventional ones, which produced unnatural, machine-like sounds. He visited many companies to sell licenses for the technology and took part in its commercialization at ATR, but the business did not take off, partly because public awareness of speech synthesis was low at the time.

Given this environment, Yoshida decided to start his own business rather than give up. Encouraged by companies that wanted to invest in his startup, he established AI, Inc. as an independent company and began by selling AIVoice®, which he had commercialized while working at ATR. As the number of companies adopting the technology gradually increased, users sent in various customization requests, which led the company to set up its own development division to handle them.

Corpus-based speech synthesis technology

Supports 15 voices and 36 languages with a newly developed speech synthesis engine

In 2007, AI launched AITalk®, a newly developed speech synthesis engine that eliminates the differences in intonation that posed a problem in earlier products and synthesizes more natural, human-like speech. It can also create a speech dictionary in a short time, which has widened the variety of available voices. Currently, 15 voices are available (7 female, 4 male, 2 young female, 2 young male), and a voice can be chosen to suit the setting. The intonation and speed of speech are adjustable, and some of the voices can even express emotions such as joy, anger, and sorrow. AITalk also supports foreign languages, synthesizing speech in 36 languages, and it is used for tourism information aimed at foreign travelers. Beyond the packaged product, the company has expanded its product line to include server installation, equipment integration, cloud, and web service types, spreading the use of AITalk's synthesized speech across many areas of society. “Almost everyone most likely hears the speech of AITalk somewhere in their daily lives without noticing that it is synthesized,” says Yoshida. AITalk, implemented by over 500 companies, received the Grand Prize at the Tokyo Venture Technology Award 2014.

AITalk has a wide-ranging record of applications, including wireless equipment for disaster prevention administration, road traffic information, tourist information, facility information, train announcements, automated phone response, games, e-learning, and spoken dialogue. The voices of NTT Docomo's Shabette characters, Softbank Robotics' Pepper, and Matsuko-Roid are not voice recordings; they were all created with AITalk.

AITalk supports 15 voice types

The spread of speech utilization has led to the creation of new services that contribute to society

One unique user of AITalk is Rie Saito, a deaf member of the Tokyo Kita-ward Assembly, who speaks in assembly meetings using AITalk. Other assembly members have praised AITalk, saying that it is easy to understand. Oita Godo Shimbun Inc. uses AITalk for its audio newspaper service for the elderly and people with poor eyesight. Combining audio functions with e-books, which have become widely popular in recent years, will most likely bring the joy of reading to blind people as well.

Creating voice services used to be costly, because professional voice actors and announcers had to record their voices in a studio in advance. With speech synthesis technology, however, text can be read aloud by a PC and the content can be revised whenever something changes. As speech technology becomes more familiar to the general public and reaches every corner of society, many new and previously unseen applications are likely to emerge.

Audio sample created by AITalk

AITalk® is a high-quality speech synthesis solution capable of flexible, natural, human-like speech synthesis, built from human voices using corpus-based speech synthesis.

