Xiaomi has unveiled its latest application of advanced algorithms and self-developed speech technology in the accessibility field. Spontaneous style Text-To-Speech technology, developed by Xiaomi AI Lab, was used to generate a unique, customized voice for a user with speech disorders. Part of the ‘Own My Voice’ pre-research project led by the Xiaomi Technical Committee, this successful attempt demonstrates Xiaomi’s commitment to ‘Tech for Good’ and to its mission of “letting everyone in the world enjoy a better life through innovative technology”.
Xiaomi cares about people and endeavours to fulfil their diverse needs through technological innovation. Having learned that many users with speech disorders wish to have a unique voice of their own for daily communication, the company established the ‘Own My Voice’ project team and invited a user with speech disorders to be the voice recipient.
“We are excited to explore the multiple values that technological innovation brings us, such as responding to users’ demands for self-identity and the construction of identity,” said Zhu Xi, Technology Committee topic convener on Tech for Good, Xiaomi Corporation.
To generate the most suitable, personalized voice for the recipient, the project team recruited more than 200 volunteers within Xiaomi to donate their voices. The team used a voiceprint matching algorithm to compare the features of the donated voices with those of the recipient’s own voice, and selected the closest match as the base reference voice for the recipient. For the sake of personalization and privacy protection, the chosen real voice was then put through complex acoustic modification to form a new, original voice.
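The matching step described above can be illustrated with a minimal sketch. Xiaomi has not published how its voiceprint matching algorithm works, so everything here is an assumption for illustration: each voice is assumed to be already reduced to a fixed-length numeric feature vector (a "voiceprint" embedding), similarity is assumed to be cosine similarity, and the names `cosine_similarity`, `rank_donors`, and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length vector has no direction")
    return dot / (norm_a * norm_b)


def rank_donors(
    recipient: Sequence[float],
    donors: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (donor_id, similarity) pairs, most similar donor first."""
    scored = [
        (donor_id, cosine_similarity(recipient, embedding))
        for donor_id, embedding in donors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: real voiceprints would have far more dimensions
# and would come from a speaker-embedding model (an assumption here).
recipient_voiceprint = [0.9, 0.1, 0.3]
donor_voiceprints = {
    "volunteer_001": [0.1, 0.9, 0.2],
    "volunteer_002": [0.8, 0.2, 0.4],
    "volunteer_003": [0.5, 0.5, 0.5],
}

ranking = rank_donors(recipient_voiceprint, donor_voiceprints)
best_donor_id, best_score = ranking[0]
```

In this sketch the top-ranked donor would serve as the base reference voice; the subsequent acoustic modification that the article describes is not modelled here.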
Next, the team used spontaneous style Text-To-Speech technology to train AI models so that the new voice gradually acquired a natural rhythm and intonation that can faithfully express human emotion and tone. The ‘Own My Voice’ project combines a variety of advanced algorithms with Xiaomi’s self-developed speech technology to ensure that the synthesized voice is distinctive, safe, and highly realistic, offering a new approach to customized speech synthesis for users with speech disorders.
The backbone of this project is a group of speech technology experts from Xiaomi AI Lab. Since 2017, they have published 37 papers on speech in the proceedings of top international conferences, such as the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). The success of “Own My Voice” depends mainly on the spontaneous style Text-To-Speech technology they developed.
In essence, spontaneous style Text-To-Speech technology makes a synthesized voice resemble a real human voice in its intonation, pauses, speed, and other features, replacing the monotonous, unnatural feel of an electronic voice with a more natural one. The technology is currently used in many smart devices equipped with Xiaoai, Xiaomi’s AI voice assistant. The “Own My Voice” project shows that spontaneous style Text-To-Speech technology can also be widely adopted in accessibility applications to improve user experience.
Moving forward, Xiaomi will continue to gather feedback from the voice recipient and will further study the feasibility of applying this project on a wider scale. Xiaomi will keep empowering accessibility through cutting-edge technology, endeavouring to fulfil people’s diverse needs through technological innovation.