ShortTalk: Dictation Made Rewarding


Our work shows that editing by voice can be made significantly more efficient than is possible with current dictation systems. In fact, analyses of editing situations and empirical measurements indicate that editing by speech has the potential to beat the keyboard and mouse in efficiency.

Our assumption that editing by speech demands a substantial learning effort is contrary to conventional wisdom about the role of speech recognition. Editing is so complicated that, in our opinion, no user interface for it can be innately natural. The rational approach is to let efficiency, the amount of editing information that can be transmitted per second, drive the development of a spoken interface. For the user, efficiency is the strongest motivation for learning the complex tool that any unfamiliar command language is. We also argued that natural language, being verbose, ambiguous, and impoverished for the task, may be a poor underpinning for such a tool (even if it could be understood by a very intelligent machine).

Our perspective and results demonstrate that the natural match between human and machine may be the one that recognizes the superiority of the human mind over the computational capabilities of machines. Consequently, the potential of speech recognition is dramatically amplified by abandoning the use of natural language for commands. (This statement does not in any way contradict the importance or usefulness of natural language understanding for help systems and for interactive applications.)

The design of the keyboard in the 19th century was not derailed by false analogies with “natural” human activities. But speech recognition for editing may have been fundamentally misunderstood owing to the tantalizing, but for this purpose fruitless, idea that computers may understand human language. Our perspective also brings to the fore some other general issues about human cognition and linguistic performance:

  • For a simple set of commands, how much training is needed to achieve the same performance when spoken as when activated by keys?
  • For a given set of commands, do mnemonic and very short utterances shorten learning time compared with more verbose natural language formulations? Or, to what extent is some pre-existing, presumed communicative significance of a phrase (such as “page up”) important to acquiring the reflex that employs the phrase for achieving a particular goal?
  • How many concepts in a command language can be assimilated by average computer users when the physical limitations of expressing them on a keyboard are removed?