Vocalizer Expressive

Oh boy, answering this one could require a medium-length book, but let’s see what I can do with it.

To start with, you are a normal person; situations that require you to press multiple keys simultaneously are not a big deal. Those same situations are anathema to me, and they are usually impossible for me to perform. The infamous “three-finger salute,” Control-Alt-Delete, is the classic example, but there are numerous others. Modern operating systems offer a way around this problem as part of what are loosely called “accessibility tools” (or features). The tool that allows me to handle multi-key combinations by pressing the keys in sequence is referred to as “sticky keys.”
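
If you're curious what that looks like under the hood on Windows, here is a minimal sketch, in C++ against the Win32 API, of a program querying the Sticky Keys state and switching the feature on. It's only an illustration of the mechanism (nothing Xpress-It actually does), with the error handling pared down to essentials.

    #include <windows.h>
    #include <iostream>

    int main() {
        STICKYKEYS sk = {};
        sk.cbSize = sizeof(STICKYKEYS);

        // Ask the OS for the current Sticky Keys state.
        if (!SystemParametersInfo(SPI_GETSTICKYKEYS, sizeof(sk), &sk, 0)) {
            std::cerr << "Could not query Sticky Keys state\n";
            return 1;
        }

        // Turn the feature on so modifier keys (Shift, Ctrl, Alt) latch when
        // pressed one at a time instead of having to be held simultaneously.
        sk.dwFlags |= SKF_STICKYKEYSON;
        if (!SystemParametersInfo(SPI_SETSTICKYKEYS, sizeof(sk), &sk, SPIF_SENDCHANGE)) {
            std::cerr << "Could not enable Sticky Keys\n";
            return 1;
        }

        std::cout << "Sticky Keys is now on\n";
        return 0;
    }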

Are you with me so far?

Modern operating systems are essentially layers of programming, each layer responsible for a specific set of tasks that together make up whatever the user is asking the computer to do. The layered analogy is apt because what a given layer is doing is directly determined by what the layer immediately below it has just passed along. If operating systems are layered cakes, it's important that those accessibility tools sit as close to the bottom layer as possible.

We are all painfully familiar with what happens when an operating system becomes unstable. Programs start acting squirrelly, and the computer may eventually lock up. IBM's OS/2 was the only operating system I've ever encountered that put Sticky Keys down at the correct layer, at the very bottom against the hardware. That meant that, no matter what else got nuked in the software, you always kept at least enough keyboard control to reboot. (And a running OS/2 instance was generally as robust as the Terminator.) Windows can't match that claim; Microsoft didn't even begin building accessibility into the operating system until Windows 95, and there are still some applications, mostly games, that blithely ignore the accessibility settings. Still, as a general rule, Windows is far more accommodating of my accessibility needs than I expect Linux ever will be.

The other consideration is simply what I use computers for, and the most basic answer is: everything I do. Oh, I realize there's a huge library of software available for Linux, but the bulk of it is Open Source. Proponents of Open Source are quick to point out that anybody can vet the source code. That is true, but how many of us have the time, energy, and qualifications to do it properly? Virtually nobody. Instead, we automatically assume someone else has done the drudgery of checking every line of code and cross our fingers. Let's not forget that the various flavors of Linux are also Open Source projects, meaning that they are written and maintained by hobbyists. No doubt those people are well-intentioned, but we have plenty of experience with hobbyist efforts. Some are great, but most aren't. Putting Linux on a computer to play with is fine, but very few people are willing to commit tasks that affect their personal welfare to client machines running Linux. It's a question of accountability: a Linux screw-up isn't going to hurt its “sales.”

More thoughts on the uneven nature of Open Source projects can be heard on this week's episode of the Security Now podcast. The hosts are usually quite bullish on Open Source, but this week they are compelled to acknowledge some weaknesses inherent to hobbyist efforts. The introspection was prompted by the release of the initial results of a professional security audit of portions of the source code for TrueCrypt, a highly respected Open Source drive-encryption system.

From: kb5ziv [mailto:kb5ziv@rionet.coop]
Sent: Thursday, April 17, 2014 20:45
To: royall@conchbbs.com
Subject: Re: Vocalizer Expressive

Now I see why you are having to push for a new system to work with. Hope they don't stonewall you. Have you looked around on Linux? Jim

Vocalizer Expressive

Stefan,

Well yes, just getting any response from you is something. Getting anyone at Nuance to respond at all has been an ongoing battle.

Seeing the source code for VEDEMO is logically the next step for me. I’ve made no secret of the fact that my own application, Xpress-It, was based on the demo app for Eloquence. Of course, Xpress-It quickly became much more intelligent, with adaptive word prediction and so on, but the UI was intentionally kept very simple to look at. The job of Xpress-It is literally to speak for me.
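
To give a flavor of what that prediction amounts to, here is a minimal sketch of frequency-based adaptive word prediction in C++. The class and method names are hypothetical illustrations for this letter, not code lifted from Xpress-It.

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    class WordPredictor {
        std::map<std::string, int> freq_;   // how often each word has been spoken

    public:
        // Record one more use of a word so future suggestions adapt to it.
        void learn(const std::string& word) { ++freq_[word]; }

        // Return up to `limit` previously used words starting with `prefix`,
        // most frequently used first.
        std::vector<std::string> suggest(const std::string& prefix,
                                         std::size_t limit = 5) const {
            std::vector<std::pair<std::string, int>> hits;
            for (const auto& entry : freq_)
                if (entry.first.compare(0, prefix.size(), prefix) == 0)
                    hits.push_back(entry);
            std::sort(hits.begin(), hits.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
            std::vector<std::string> out;
            for (std::size_t i = 0; i < hits.size() && i < limit; ++i)
                out.push_back(hits[i].first);
            return out;
        }
    };

    int main() {
        WordPredictor predictor;
        for (const char* w : {"radio", "radio", "roger", "repeater"})
            predictor.learn(w);
        for (const auto& word : predictor.suggest("r"))
            std::cout << word << '\n';   // "radio" comes first (used twice)
    }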

When will the source code for VEDEMO be available? That’s a question any project manager should be able to answer, at least within the vagaries of any project.

As for getting access to the SDK again, I seem to dimly recall that Rachel has something to do with sales. If so, I expect her hand to pop out momentarily for more money, since I'm going to be moving from an “evaluator” to a “developer.” Sigh, it is what it is. I seriously doubt I'll ever sell a single copy of Xpress-It or whatever the new version becomes, but SDK vendors tend to assume that application developers have plenty of money. Again, it is what it is. (Hi Rachel! :-) )

Stefan, you'll learn more about who I am, and how I utilize Eloquence (and eventually Vocalizer Expressive), in subsequent emails. For now, I'll close with this factoid: the reason I'm phasing out Eloquence is simply that Nuance end-of-lifed it without ever releasing a 64-bit version. The Windows environment is increasingly 64-bit, to the point that there's no more 32-bit ODBC, and Xpress-It is heavily ODBC. Yeah. Oh.

Scott

From: Hamerich, Stefan [mailto:Stefan.Hamerich@nuance.com]
Sent: Thursday, April 17, 2014 08:57
To: royall@conchbbs.com; Elias, Rachel
Cc: De Moortel, Jan
Subject: RE: Vocalizer Expressive

Hi,

Here is a hello from me then. :-)

VEDEMO is a vehicle to demonstrate what can be done with our TTS engine.

But we are going to provide the source code of it.

We do provide sample code and documentation which allow you to implement all of the functionality.

Creating a UI around it should not be too complicated.

Sorry, but there is nothing more I can do here.

Best regards

Stefan

From: Scott Royall [mailto:royall]
Sent: Wednesday, April 16, 2014 04:59
To: Elias, Rachel
Cc: De Moortel, Jan; Hamerich, Stefan
Subject: RE: Vocalizer Expressive

Well yeah, let’s see if they even say “hello” to me first! They can at least see what I’m generally doing and what direction I need to head.

From: Elias, Rachel [mailto:Rachel.Elias]
Sent: Tuesday, April 15, 2014 21:04
To: royall
Cc: De Moortel, Jan; Hamerich, Stefan
Subject: RE: Vocalizer Expressive

Hi Scott – I would like to introduce you to product managers for Vocalizer Expressive. Maybe they can help you? –Rachel

Vocalizer Expressive

Rachel,

Let's start with some positive news, shall we? Last Friday, I was finally able to create enough of a software lash-up to determine whether Vocalizer Expressive would be viable for use on-air. While I wouldn't have called the arrangement pretty, it was sufficient to prove the point. Yes, VE can be made to function over the radio fairly well. Some of the people listening found it quite intelligible, in fact. That's good news because it says there's hope that VE can do what Eloquence has done for years, and that it is therefore worth working with.

Yes, that's the good news. One of the guys did say it will need some EQ adjustments, not realizing how few controls are really available with VE; speed and tone sliders are all there are. On my side, though, I did notice some aspects that will really need to be addressed, and whether we can get a Nuance engineer to even give me the time of day is going to be the biggest issue. I know Nuance's position is going to be that they are not interested in increasing their presence in the augmented and assisted communications market, and I certainly understand how incestuous that market is. However, flexible, high-quality voice synthesis is critical to my daily living, and it is important that you, and someone else in a position of some influence at Nuance, begin to fully appreciate that need, both intellectually and emotionally. Nuance is currently more interested in simpler markets, like being a component in automated phone systems: places where flexibility isn't a factor because everything you'll ever need to synthesize is already known. That's not my reality at all.

Naturally, companies exist to generate profit, and convincing them to revise products out of altruism is generally a non-starter. Yet Nuance should start paying more attention to what's going on, because voice synthesis of open-ended vocabularies is becoming more common. For example, several metropolitan fire departments, including Houston's, are using it on their main fire and EMS channels, and what they're using now would frighten you. The National Weather Service at least uses Eloquence or its equivalent on its transmitters. So there's real money to be had in meeting needs similar to mine.

One reason Eloquence has been so damned hard to beat is its history. Apologies if you already know this, but Eloquence was created by Cornell University for the DoD. It was very much a military project, and it was quite well aware of radio procedures. Eloquence knows the International Telecommunication Union phonetic alphabet, also used by hams and pilots. It understands how to recognize and pronounce ham callsigns. It even knows about arcana like “Q-signals.” VE has none of that. True, some of that can be migrated to my application, Xpress-It, but performance is also a major factor.
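
To give a sense of the kind of pre-processing that migration would involve, here is a minimal sketch that expands a ham callsign into its phonetic words before the text ever reaches the synthesizer. The function and lookup table are illustrative only, not code from Xpress-It or the Vocalizer SDK.

    #include <cctype>
    #include <iostream>
    #include <map>
    #include <string>

    // Expand each letter/digit of a callsign such as "KB5ZIV" into its spoken form.
    std::string expandCallsign(const std::string& callsign) {
        static const std::map<char, std::string> phonetic = {
            {'A',"alpha"},{'B',"bravo"},{'C',"charlie"},{'D',"delta"},{'E',"echo"},
            {'F',"foxtrot"},{'G',"golf"},{'H',"hotel"},{'I',"india"},{'J',"juliett"},
            {'K',"kilo"},{'L',"lima"},{'M',"mike"},{'N',"november"},{'O',"oscar"},
            {'P',"papa"},{'Q',"quebec"},{'R',"romeo"},{'S',"sierra"},{'T',"tango"},
            {'U',"uniform"},{'V',"victor"},{'W',"whiskey"},{'X',"x-ray"},
            {'Y',"yankee"},{'Z',"zulu"},
            {'0',"zero"},{'1',"one"},{'2',"two"},{'3',"three"},{'4',"four"},
            {'5',"five"},{'6',"six"},{'7',"seven"},{'8',"eight"},{'9',"nine"}
        };
        std::string spoken;
        for (char c : callsign) {
            auto it = phonetic.find(static_cast<char>(
                std::toupper(static_cast<unsigned char>(c))));
            if (it != phonetic.end()) {
                if (!spoken.empty()) spoken += ' ';
                spoken += it->second;
            }
        }
        return spoken;
    }

    int main() {
        // Prints: kilo bravo five zulu india victor
        std::cout << expandCallsign("KB5ZIV") << '\n';
    }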

I cannot overstate the importance of minimizing latency; how quickly people tire of waiting for me to respond is scary. Of course I understand that using the highest-resolution model, Premium Tom at 22 kHz, constitutes the toughest test, but please understand that audio quality is also critical, so I didn't risk a lower-resolution model.

Rachel, the reality is that I now really need a look at the source code for VEDEMO. I haven't yet decided whether I am going to try replacing Eloquence with VE in the existing application or start fresh, and VEDEMO will help me choose. I'm also going to need access to the SDK again. That shouldn't surprise you, as we aren't in evaluation mode any longer. Back during the evaluation, I was scrupulous about not keeping the installation package. However, we're now entering the real world of development, and I actually use two quite similar laptops. I understand that you're not overly interested in my situation, but the bottom line is that Nuance has something I'm going to need, and you're my contact. If we can get someone in Engineering to care, or at least smell an opportunity for additional profit, I think we would all be happier.

Scott

Expressive Speech Engine

Rachel,

Might I possibly see the source code for VEDEMO? There are some things in it that I really need to see to fully understand. I may be able to tease the information out of the other examples, but VEDEMO is closest to my real-life scenario.

I am developing an overall impression of how Expressive and Eloquence compare in what is daily use for me. Of course I recognize that any observations and conclusions I may offer are largely moot, as Nuance has already decided that Expressive is the future. Still, I would think you might want to see the thoughts of someone whose life will be largely affected by how well your product performs. That may only be my hubris, however.

Scott Royall