ShortTalk: Dictation Made Rewarding


Is ShortTalk easy to learn?

There are no studies of the acquisition of command languages sophisticated enough to replace the keyboard. ShortTalk rewards the beginner through its superior efficiency: several times that of the natural language technology in commercially available systems. Most command names are whimsical and easy to remember. Our hypothesis is that the strong reinforcement provided by ShortTalk makes it easier for the brain to adapt to the communicative significance of the command language syntax. In other words, the alternative of using natural language to convey the chosen editing concepts, with its ensuing syntactic verbosity, bland command phrases, and mode confusion, likely results in a less learnable command language.

Will ShortTalk be valuable for input on tablets?

Yes. With ShortTalk, a few keys will still be necessary for some repetitive tasks that cannot be accomplished effectively by speech recognition. But in the main, editing is very efficiently accomplished by speech alone, complemented with some pointing.

I always felt that the computer human interfaces were about adapting the computer to the way the human worked, not the other way around. So your approach that rejects natural language must be misguided?

This argument is instinctively put forward by many people, including researchers in human-computer interaction. Indeed, it is a valid one for many applications. But applied to the activity of editing, the argument speciously assumes that our language is inherently so meaningful that it is an effective substitute for skills acquired through adaptation. Sadly, there are no such miracles, and natural language may in fact be a barrier to skill development because of its inefficiency and vagueness. Using the keyboard requires extensive training, and this adaptation is unavoidable. There is no inherently “natural” keyboard design requiring no training, just as there is no natural way of creating alphabets and writing systems. Why would the complex task of editing not require significant human adaptation, whatever the means of communication: typing Control-C for “copy current selection into clipboard”, saying “<pause> copy that <pause>” (natural language technology), or saying “copy tat” with no pauses (ShortTalk)?

The belief that natural language is well suited to the task of editing is understandable, but it is wishful thinking, given the verbosity and vagueness of editing commands expressed in natural language. And by the way, how effective is natural language when you sit next to somebody who is editing text on a computer? Do you succinctly, fluently, and precisely get your editing suggestions across? Or do you stumble for words, say “no, no, not there”, gesture, and point?

ShortTalk is in fact aimed precisely at the way humans like to work: with as little effort as possible!

Still, ShortTalk sounds weird; there must be easier ways to edit by speech?

I strongly believe that any usable command language must be constructed according to the principles of ShortTalk. Such a language is easy in the following sense: it resolves almost any editing situation in a very few words. A more verbose language would be ineffective, and most users would resist learning an ineffective tool. And, yes, ShortTalk sounds weird. But it should; otherwise it would not solve the mode problem of separating dictation from commands. ShortTalk allows the user to fluently mix dictation and commands; commercial systems, with their natural language approach, do not.

Does ShortTalk rely on writing macros?

Professional-grade dictation systems offer programming facilities known as macros. The ShortTalk philosophy is to offer a complete solution from the outset, so that the user is not forced to develop patches for an inherently insufficient command-and-control system. However, whenever an editing situation calls for repeating a sequence of commands, ShortTalk allows the sequence to be easily recorded and played back. (EmacsListen itself offers a context-free grammar format that allows s-expressions to be bound to syntactic categories of the command grammar.)
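To make the parenthetical above concrete, here is a purely illustrative sketch of what binding an s-expression to a syntactic category might look like. The rule syntax and names below are hypothetical, invented for this example; they do not reproduce EmacsListen's actual grammar format:

```lisp
;; Hypothetical sketch only -- not EmacsListen's real syntax.
;; A rule binds a spoken phrase, parsed against a syntactic
;; category of the command grammar, to an s-expression that
;; performs the corresponding edit in Emacs.
(define-command-rule
  :category 'editing-command           ; syntactic category in the grammar
  :phrase   '("sorch" <word>)          ; e.g. "sorch printf" searches for "printf"
  :action   '(search-forward <word>))  ; s-expression executed on a match
```

The point is only that the grammar is extensible: a new spoken command is a new rule, not an opaque recorded macro.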

Is ShortTalk available?

Yes. Carnegie Mellon University has accepted a donation from AT&T Labs, which comprises ShortTalk and the EmacsListen prototype. However, the current implementation only works with GNU Emacs, a text editor used mainly by programmers and other technical professionals.

Could ShortTalk be connected to other speech engines?

Yes, that should be relatively straightforward.

Why did it take six years to develop ShortTalk?

It was not obvious to me that using speech recognition for editing was even a feasible task. In fact, for the first four years I believed the opposite. I did not know that editing by voice could become a fluent, automatic activity once a systematic conceptual framework had been formulated.

ShortTalk is a renegade approach that ignores established research in speech user interfaces. Doesn't it deserve universal rejection and condemnation?

Virtually all research in spoken computer interfaces concerns non-expert applications: call processing, dialogue systems, and multimodal interfaces for portable devices. The use of natural language is essential in these areas (although divergent views have been proposed such as the Universal Speech Interface, promoted by Roni Rosenfeld and his collaborators at CMU). ShortTalk addresses an entirely different scenario and is therefore not at odds with most established research. Editing is a complex domain that innately requires considerable skill and training. Our philosophy and results probably have no bearing on traditional speech user interfaces, and vice versa.

The idea of using syllables to encode concepts is a weak one. Are single syllables not more difficult to recognize than polysyllabic words?

There are between 15,000 and 30,000 different syllables in English. By using unusual syllables, or even foreign ones, that are phonetically distinct from common words, superior accuracy can be achieved. Not only are words like “sorch” for “search” easily distinguished from real words; they are also easy to remember, as any four-year-old knows from listening to the enticingly strange but meaningful universe of Dr. Seuss.

Speech recognition in the office will never make it because of privacy concerns.

This is a real issue. Standard cubicle environments are not conducive to the use of speech recognition for dictation of sensitive documents. For people affected by cumulative trauma disorders, the employer should, in my opinion, be obliged to offer a private office or a better insulated cubicle.

Interestingly, the use of ShortTalk itself presents much less of a problem: very little information about the document is revealed through the spoken commands. And, since much keyboard work, such as programming, mostly involves editing and repetitive tasks, the use of ShortTalk may still be a significant part of reducing the strain of using a computer.

Talking to your computer all day will ruin your voice?

According to anecdotal evidence, the use of dictation systems has been associated with voice strain. Early dictation system users complained about the strain of disjointed speech, which resulted from the need to separate each word from the next by a small pause. The informal consensus seems to be that modern systems, which transcribe continuous speech, are less stressful. For the command-and-control part, modern dictation systems still require pauses, a deficiency that ShortTalk has solved. A CNN article, “Is voice recognition dangerous for your health?”, discusses the problem.