Being eloquent about Eloquence: a formant toast
The Eloquence software speech synthesizer has, for many years, been a staple not only of my life but of the lives of innumerable other screen reader users as well.
I wanted to write something of a tribute to it here, as a sort of opening for the more personal entries on this site.
I can’t truly remember when I first started using the JAWS screen reader - not specifically. I was too young, I suppose - and even had I taken the
time to fix the date in my mind, I had neither the technical expertise nor the years of experience to appreciate the difference between a
screen reader and a synthesizer. I grew up running Hal (from Dolphin Systems) on a DOS machine, and when Windows came along and I found JAWS, I simply accepted
that they had different voices.
It occurred to me yesterday, whilst playing Jumble on my Franklin Language Master, that I was truly getting old - the speech was too fast for me to distinguish
between “c” and “z”, and I made a mistake. So when I came back to my laptop and started reading some e-mail, I took a moment to appreciate the clarity
that Eloquence offers. Of course, you could argue that I hadn’t used the Franklin device for several years, the volume was low, I wasn’t paying a great deal
of attention… yet, whilst all of those things are true, the moment still made me realise how much Eloquence means to me.
The text-to-speech market has changed tremendously over the last decade. Big companies now offer big voices, and it’s even possible to get one of
the ScanSoft voices on a mobile phone or portable DAISY player, with files that were hundreds of megabytes only months ago downsized for the mobile market.
My problem with all these voices is that they are not 100% artificial - they use, in case you were unaware, segments of human speech. Clever processing
rules and translations tell the engine which segments of speech should be played to simulate words, and thus we have a “human-like” voice.
This seems remarkably desirable to many people. I concede that in some industries it’s incontrovertibly useful - and for sighted people, the more human
the better. But from a very young age, I have always liked the fact that my screen reader wasn’t a person. People don’t always read precisely what you tell
them to, in the way you want, pausing on cue and adjusting their pronunciation at your slightest whim. Despite a longstanding image of a little man sat
inside my computer, I never associated the speech I hear whenever I’m logged on with the speech I hear elsewhere. The two are distinct and separate, and their only
relation is that I can understand them.
Even so, there are plenty of formant synthesizers (i.e. ones not using human samples) in use today. I would venture to say that Eloquence (or its IBM variants),
DECtalk, DoubleTalk and eSpeak are today’s leaders. Between them, they nicely cover both hardware and software, although not every one is available in both forms
(I’d love an Eloquence microchip, for instance)!
The question I have to ask myself, then, is why I’m so enamoured of Eloquence. It’s almost like a built-in prejudice, and it can be frustrating at times.
Firstly, I have used it more than perhaps all other synthesizers combined. Not only that, but I’ve used it for a wider variety of purposes: yes, all involve
written text, but the form of that text has varied from recipes to poetry, from novels to reference works. Whether I’m relaxing with a book or chatting online,
Eloquence has always been there.
There are downsides to this attachment, of course. Eloquence is old - tried and tested, certainly, but there are issues that could be addressed (such as
certain combinations of letters crashing the entire synthesizer). Also, whilst it offers extraordinary powers of customisation as regards pronunciation,
a screen reader doesn’t always pass that power on. In JAWS, for instance, there’s no way to change Eloquence’s own pronunciation directly. The
JAWS dictionary manager lets you shuffle letters around and coax the best pronunciation you can out of it, but the synthesizer itself allows direct phoneme entry,
which would make things much easier. With that, you could change not only the pronunciation of a word but also its stress, and so on - and for a heavy fiction reader
such as myself, some mispronunciations niggle mercilessly.
Another problem is that I’ve been pampered too much by JAWS. Since the introduction of the Speech and Sounds Manager, I read all my fiction with quoted speech
in a different voice. The voice I use is only a slight modification of the default (the pitch lowered slightly), but it is a useful distinguishing characteristic
for me. Unfortunately, due to licensing restrictions, Eloquence doesn’t make an appearance in NVDA or other free screen readers, so that setup doesn’t travel with me.
So what’s the future of the synthesizer? Well, what do I know? I’m just an average user. But it seems to me that despite the undeniable popularity of the
more human-sounding voices, Eloquence (and its formant brothers) has a long life ahead. As to the particular voice that is Eloquence, it’s hard to say. It’s
not only found in desktop screen readers - mobile phones and notetakers are carrying the torch as well. If it were to vanish, I am sure I wouldn’t be alone
in my sadness.
I do love the quirks it exhibits. The aplomb of the programmers astonishes me - the way in which Eloquence can say the word “voyage” so naturally and yet
utter “bon voyage” in such fine French style. It is a remarkably subtle synthesizer, with many a nuance (if you will pardon the pun). Yes, it mispronounces
many things. From Harry Potter alone there are terms like “Azkaban”, “Cruciatus”, “Firenze” and “Hermione”. The science-fiction world offers plenty more:
“Jedi” ends with an “eye” sound, and Star Trek names of places and people (such as “Nerys”, “Jem’Hadar”, “Dukat” and “Cardassia”) are all off. The power
of this synthesizer means that all of these (and countless others) are easily fixable, simply by substituting letters and showing a little creativity.
So: a powerful, impressive, long-lasting product. It has weathered changes of ownership and is used by thousands. There aren’t many things it can’t say properly,
given the right ASCII tickle. So a toast is in order, I suggest. To Eloquence. To Dr. Hertz and all the other great people behind it, and to the JAWS developers
who bridged the technologies. As I lift my metaphorical glass, I’d like to thank you all: I’d never enjoy a good book in the same way if
it weren’t for you.
February 13th, 2009 at 12:47 am
Hear, hear.