-Karthik Gurumurthy
So speech synthesizers have this surprisingly long history – way longer than computers! People have been trying to create talking machines for centuries, stemming from this deep human desire to build machines that can mimic us.
The earliest attempts at “talking machines” actually got their inventors in hot water with the Church! There’s this account about a talking head created by philosopher Albertus Magnus in the 13th century that supposedly got destroyed by Thomas Aquinas because it promoted ideas that clashed with Church teachings.
During the 1500s and 1600s, Europe saw a bunch of fake talking heads popping up. Most were just clever hoaxes with hidden tubes where the “inventor” was actually speaking from some underground chamber!
By the mid-1700s, inventors started getting more scientific about it. In 1779, a mysterious guy named Kratzenstein created what’s considered the first genuine talking machine after being inspired by a contest held by the Imperial Academy of Science in St. Petersburg. His contraption had five differently shaped tubes that, when air was blown through them, could produce sounds similar to the five vowels.
Around the same time, Wolfgang Ritter von Kempelen built a pneumatic device with a bell-shaped cavity divided into two chambers. By adjusting the opening between chambers, it could produce vowels, consonants, and even some simple words.
The real breakthrough came in 1835 when Professor Joseph Faber unveiled his “Euphonia” after years of perfecting it. This impressive machine had a rubber chamber mimicking human lungs, larynx, and mouth, with a bellows pushing air through differently shaped tubes and metal plates. Using a keyboard and foot pedals, Faber could make his mannequin laugh, whisper, and even recite entire sentences in a creepy monotone voice. Euphonia toured Europe and America, delighting audiences everywhere.
The first electronic speech synthesizers emerged in the 1930s. The Voice Operation Demonstrator (Voder) used vibrating electrical circuits to mimic human vocal cords. Then in the 1950s, research into computerized speech synthesizers began in earnest.
MIT developed the MITalk System in the late 1960s as a reading machine for blind people. Then in 1976, Raymond Kurzweil invented his famous Kurzweil reading machine that could read anything from menus to novels.
By 1978, Texas Instruments had created the Speak & Spell, that calculator-sized learning toy for kids that could recognize and pronounce 165 words. Two years later, IBM produced a computerized speech synthesizer with a 1,000-word vocabulary that displayed words on screen while reading them aloud.
Modern speech synthesizers work by analyzing text either typed or scanned, consulting a pronunciation dictionary, converting the text into sounds, and adding symbols for appropriate length, pitch, and volume. These patterns get translated into mathematical equations that approximate human speech and drive loudspeakers to generate sound waves resembling a human voice.
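To make the first few steps concrete, here is a toy sketch in Python of the "look it up, turn it into sounds, tag it with length, pitch, and volume" stages. It is only an illustration, not how any real synthesizer is built: the two-word dictionary, the phoneme symbols, the function names, and the default numbers are all invented for this example, and the final step of turning these patterns into actual sound waves is left out.

```python
import re

# Hypothetical mini pronunciation dictionary (symbols are illustrative only).
PRONUNCIATIONS = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text):
    """Split text into words and look each one up in the dictionary."""
    words = re.findall(r"[a-z']+", text.lower())
    phonemes = []
    for word in words:
        # Assumption: unknown words are simply spelled out letter by letter.
        phonemes.extend(PRONUNCIATIONS.get(word, list(word.upper())))
    return phonemes


def add_prosody(phonemes, duration_ms=80, pitch_hz=120, volume=0.7):
    """Attach length, pitch, and volume markers to every sound."""
    return [
        {"phoneme": p, "duration_ms": duration_ms,
         "pitch_hz": pitch_hz, "volume": volume}
        for p in phonemes
    ]


if __name__ == "__main__":
    for unit in add_prosody(text_to_phonemes("Hello, world!")):
        print(unit)
```

A real system would use a far larger dictionary plus rules for words it has never seen, and would vary the length, pitch, and volume from sound to sound instead of using one flat setting.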
More recently, Janet Cahn at MIT created software called AffectEditor that puts emotion into synthetic speech. Her program can make computer-generated speech sound annoyed, friendly, angry, upset, impatient, or sad by altering volume, speed, rhythm, and pitch. For instance, angry speech is loud, high-pitched, and has irregular rhythm, while sad speech is soft, low-pitched, slow, and hesitating.
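The idea of turning a few dials to change the mood can be sketched in a few lines of Python. To be clear, this is not Cahn's actual software: the `Prosody` fields, the preset numbers, and the `apply_emotion` helper are assumptions made up for illustration, following only the angry and sad descriptions above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prosody:
    """Neutral voice settings; 1.0 means 'unchanged' (hypothetical scale)."""
    volume: float = 1.0
    pitch: float = 1.0
    speed: float = 1.0
    rhythm_irregularity: float = 0.0  # 0.0 = perfectly even rhythm
    pause_scale: float = 1.0          # above 1.0 = longer, hesitant pauses


# Invented presets: only the direction of each change comes from the text.
EMOTION_PRESETS = {
    "angry": {"volume": 1.5, "pitch": 1.3, "rhythm_irregularity": 0.4},
    "sad": {"volume": 0.6, "pitch": 0.8, "speed": 0.7, "pause_scale": 1.5},
}


def apply_emotion(base, emotion):
    """Return a copy of the base settings adjusted for the given emotion."""
    # Unknown emotions leave the voice neutral.
    return replace(base, **EMOTION_PRESETS.get(emotion, {}))


if __name__ == "__main__":
    neutral = Prosody()
    print(apply_emotion(neutral, "angry"))
    print(apply_emotion(neutral, "sad"))
```

The resulting settings would then be handed to whatever stage actually produces the audio.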
Cahn designed this to help people with speech impairments communicate better. In a 1997 Technology Review interview, she explained that “most nonspeaking people are frustrated by the technical options currently available to them. They don’t want to sound like a machine.” She also believes her software can improve reading machines for blind people.
Today, speech synthesizers are everywhere – they help blind people read, assist those with speech impairments, give instructions on educational computers, warn pilots of dangerous conditions in aircraft, and are built into everyday items like cars (reminding you to fasten your seatbelt), elevators (announcing floors), talking scales, and even washing machines that tell you when your laundry’s done!
Pretty amazing how we went from fake talking heads to the voice assistants we have today, right?