I am non-verbal, and one of the reasons that I haven’t been fond of voice synthesizers is that the synthesized voice never sounds like mine. Yes, although I cannot verbalize, I still can vocalize. Just ask my IA colleagues when I grunt to signal affirmative responses during our Skype conferences. My words spoken by a computerized voice, no matter how feminine it is, sound disembodied. Moreover, the same synthesized voice may be shared with many of the 2.5 million people in America who cannot talk. Imagine a 47-year-old petite woman using the same voice as that of an 85-year-old tall, robust woman. The voice just wouldn’t be me.
VocalID, a project headed by Professor Rupal Patel and Dr. Tim Bunnell, has recognized the desire of non-verbal individuals to use their own voices to communicate. The VocalID team has developed a system where the voice of a person who can talk is matched with that of someone who cannot. The idea came to Professor Patel as she was attending an assistive technology conference several years ago. She heard hundreds of people using various voice synthesizers, all emitting the same voices. She could not differentiate who was saying what. So Professor Patel wondered why they couldn’t have individual voices.
To get a customized voice, the process starts with the VocalID team recording whatever sounds the non-verbal person can utter. These sounds are the vocal source. Sound characteristics can include pitch, tone, and timbre. The age, sex, size, and ethnicity of the non-verbal communicator also influence the vocal source. Although I cannot say “Please, can you come here” to my mom who is in the next room, I can use one aspect of my vocal source—loud timbre, otherwise known as yelling—to let her know I need her.
Next, a donor voice must be found as a filter for the vocal source: in other words, someone to add the consonants and vowels. The person who donates his or her voice must be of a similar age, size, gender, and ethnicity to the individual seeking a customized voice. The donor spends about three or four hours recording thousands of sentences like “I want to go home” or “The dog is running into the street.” The idea is to capture every vowel-consonant combination. Then these recordings are spliced to isolate each specific combination. For instance, from “The dog is running into the street”, the possible combinations are “un,” “ing” and “str”.
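For readers who like to see an idea in code, here is a rough sketch of the "capture every combination" step. It is only an illustration and not the VocalID team's method: a real system works on recorded sounds, while this toy works on spelling, and the function name `letter_combinations` is made up for this example.

```python
def letter_combinations(sentence, min_len=2, max_len=3):
    """Collect every run of 2 to 3 letters found inside each word.

    Assumption: spelling stands in for sound here, purely for illustration.
    """
    combos = set()
    for word in sentence.lower().split():
        # Keep letters only, so punctuation does not end up in a combination.
        letters = "".join(ch for ch in word if ch.isalpha())
        for size in range(min_len, max_len + 1):
            for start in range(len(letters) - size + 1):
                combos.add(letters[start:start + size])
    return combos


combos = letter_combinations("The dog is running into the street")
# Check that the three combinations mentioned above are covered.
print({"un", "ing", "str"} <= combos)  # prints True
```

Running the same function over every recorded sentence and merging the results would show which combinations are still missing, which may be one reason a donor has to record so many sentences.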
Finally, the customized sounds are programmed into a speech synthesizer. Currently, VocalID personalized voices are compatible with Windows-based assistive technology. However, versions for Android and iOS will be available in the future.
As Professor Patel says in her TED (Technology, Entertainment, Design) presentation, “When you give blood, you save a life. When you give a voice, you change a life.” Find out more about the VocalID project and how to donate your own voice.