ShortTalk: Dictation Made Rewarding

Tutorial

ShortTalk is a collection of editing concepts that can be strung together in “phrases.” A phrase is usually made of one or two, sometimes three, concepts. Some concepts may stand alone; others may occur only as part of phrases. The resulting language is probably not much more complex than that which can be learned by some non-human primates, who compose phrases made out of signs for food, objects, and simple actions. We call ShortTalk an editing language although it is not a language in the sense that ordinary people or linguists use the term. In fact, we have taken the opposite “primate,” proto-language point of view since 1) there is staggering evidence that humans are masters of sequencing formal symbols, as when they tap away on their keyboards, play instruments, etc., and 2) there is no reason to believe that this ability does not translate into the spoken domain, just as you can type digits at about the same speed you can enunciate them. And, evidently, humans can sequence words in real languages that are infinitely more complex than proto-languages.

It's about learning and learning is about rewards

Using your voice to control your computer effectively is a matter of training. This holds whatever the syntactic clothing of the utterances, be they mnemonic words, stilted natural language, or diffuse and hard-to-delineate “real” natural language. The greatest motivation is obviously usability: efficiency, systematism, and simplicity. These three factors will be argued below as we introduce the main concepts of ShortTalk.

The forward and backward distinction

Editing actions are very often relative to position. For example, search or identification of nearby words or lines is always backwards or forwards relative to the current position. Thus direction is a primary piece of knowledge implicit in our cognition about editing situations. It would be a waste of effort not to systematically represent direction in editing concepts. The ShortTalk solution is extremely simple and terse: the vowel denotes direction. For example, “go aift hello” means place the cursor after the occurrence of “hello” following the current position; and “go ooft hello” means place the cursor after the occurrence of “hello” preceding the cursor. So, the distinction is: “oo” means backward and “ai” means forward. (Both are short vowels; after all this is ShortTalk.) This system applies to any concept where direction makes sense.

Actions that may stand alone

Pressing the space bar is “spooce” for half the syllabic effort of saying “space bar.” The same applies to “loon” for return, usually called “new line” in Natural Language Systems. It saves the user the offense of having dictation such as “the new line is that” misinterpreted. (And should the subject exceptionally be a certain aquatic bird, the user may use the phrasing “l-rall loon” to type loon.) Keys like up and left arrow have similar mnemonic names: go up becomes “goop”, that is “go oop” (the “oo” sound for backwards motion alters the vowel of “up”) and left arrow becomes “gloof” for “go left” in a similar manner. Then we derive “graif” for “go right” (not “grait” because it sounds the same as “great”). For a step down towards the netherworld, “go nether” becomes “gnaith”. This part, which concerns the mapping of common symbols, is the most foreign part of ShortTalk, although it is indeed conceptually trivial. It is about finding effective, unambiguous names for common symbols and indispensable control keys. Fortunately, there are not that many of them.

Beginning/end

How do you go to the beginning or end of something which you are at? That's easy: “ghin” for “beginning” and “ex” for exit or ending. Thus, to go to the beginning of the word, say “ghin word.” To go to the end of the paragraph, say “ex para.”

Numbers

Scottish “ane” is for one, “twain” for two, “traio” for three, “fairn” for four, and “faif” for five. It stops here because it seems that the eye can quickly identify only four or five items. Commercial systems offer commands like “move cursor down 17 lines”. In ShortTalk, you would say “line faif”, then “goink”, for repeating the last command, and then one more “goink” to end up very close to the destination. Then you'll be able to immediately see that “line twain” will bring you to where you want to be. The point is that you never needed to know that the precise count is 17. ShortTalk is about getting your work done, not about implementing voice commands that may be “natural” but barely usable.
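The arithmetic of that example can be checked with a small sketch. It is only an illustration: the cursor is modeled as a bare line number, and the function name `run` is hypothetical rather than part of ShortTalk. The only behavior taken from the text is that “goink” repeats the last command.

```python
# Forward numerals as given in the text.
NUMERALS = {"ane": 1, "twain": 2, "traio": 3, "fairn": 4, "faif": 5}


def run(utterances, line=0):
    """Apply 'line <numeral>' and 'goink' utterances; return the final line.

    Hypothetical model: the cursor is just a line number, and only the
    'line' structure is handled.
    """
    last = None  # most recent command that was not 'goink'
    for utterance in utterances:
        if utterance == "goink":
            if last is None:
                raise ValueError("nothing to repeat")
            command = last
        else:
            command = utterance
            last = utterance
        unit, numeral = command.split()
        if unit != "line":
            raise ValueError(f"unsupported unit: {unit}")
        line += NUMERALS[numeral]
    return line


# Three steps of five lines, then two more: 5 + 5 + 5 + 2 = 17.
print(run(["line faif", "goink", "goink", "line twain"]))  # 17
```

Four short utterances reach line 17 without the speaker ever counting to 17.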

Often a number is used where direction makes sense: if “line faif” means “go down five lines,” then does “line foof” mean “go up five lines”? Yes! So, now we have ten useful and efficient numerals that eliminate the offensive guesswork about the meaning of “to”, “2”, “two”,... The numerals are also crucial to the disambiguation of commands from dictation. In fact, they can never appear by themselves. That is why “line twain” can be embedded in continuous dictation. And, by the way, “Mark Twain” still comes out as “Mark Twain” because there is no ShortTalk concept named “mark.”

Characters, words, lines, paragraphs,...

Structural concepts for various kinds of pieces of text are all there: “char” (as in charcoal) for characters, “word” for words, “line” for lines and “para” for paragraphs. A word with hyphens is “eed” (for identifier). A “ting” is a thing that is any stretch of characters that are not spaces (useful for email addresses), a “tier” is the line without the newline character, an “inner” is everything inside of quotes or parentheses, “term” is a quotation or the whole parenthesized expression, and a “senten” is a sentence. There are a couple more such structure concepts, and together they cover most imaginable characterizations of pieces of text, whether in technical writing or programming. Now combine them with numerals and you already have a very powerful set of tools for just moving the cursor.

For example, “ting ane” skips over all whitespace to put the cursor at the first visible character (letter, parenthesis, whatever) after the cursor. The command “word twoon” puts the cursor at the second word before the current word. Now we have 10 numerals times 10 concepts for moving the cursor locally. Note that the effort is pretty minimal: the ai/oo principle, the numerals “ane”, “twain”, ..., “faif”, and ten mostly obvious and known terms for pieces of text yield a hundred commands. Human affinity for combining symbols means that the utility of this little grammar is exponential over time: hesitancy is soon replaced by “automatic” utterances that reflect your intentions. Contrast this situation to Natural Language Technology, where you will struggle with questions such as “is it 'move right' or 'go right'?” and with persistent misrecognitions of your intentions as to whether you meant dictation or commands (because of the forced pauses that must be inserted between commands).
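To make the combinatorics concrete, here is a minimal sketch of how a two-word movement phrase could be read as a structure plus a signed count. The function `parse_move` and the table names are hypothetical. The backward table holds only the two backward numerals that this tutorial spells out (“twoon” and “foof”); the others are deliberately left out rather than guessed.

```python
# The ten structure concepts named in the text.
STRUCTURES = {
    "char", "word", "line", "para", "eed",
    "ting", "tier", "inner", "term", "senten",
}

# Forward numerals as given in the text.
FORWARD = {"ane": 1, "twain": 2, "traio": 3, "fairn": 4, "faif": 5}

# Only these two backward numerals appear in the text; the remaining
# three are omitted here instead of being invented.
BACKWARD = {"twoon": 2, "foof": 5}


def parse_move(phrase):
    """Return (structure, signed_count); a negative count means backward."""
    structure, numeral = phrase.split()
    if structure not in STRUCTURES:
        raise ValueError(f"unknown structure: {structure}")
    if numeral in FORWARD:
        return structure, FORWARD[numeral]
    if numeral in BACKWARD:
        return structure, -BACKWARD[numeral]
    raise ValueError(f"unknown numeral: {numeral}")


print(parse_move("ting ane"))    # ('ting', 1)
print(parse_move("word twoon"))  # ('word', -2)
print(len(STRUCTURES) * 10)      # 100 phrases from ten structures and ten numerals
```

Two small tables are all a speaker has to memorize, yet every pairing of a structure with a numeral is a valid command.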

Common places

So when you say “this paragraph” with commercial systems, does it refer to the paragraph where the mouse pointer is or where the text cursor is? ShortTalk rejects such ambiguity for human reasons: no user should tolerate the whims and moods of programmers who try to interpret natural languages. So again there is a simple system at work: “hare” is “here,” for where the cursor is, and “tair” is “there,” for where the pointer is. But in ShortTalk there are even more useful positional concepts not made available in most editors. The reason for the relative poverty of editing by keys is simple: there are not enough keys on the keyboard to express them conveniently, even if the concepts are latent in our perception of editing.

For example, ShortTalk keeps track of where the cursor was before the last cursor excursion. So, if you begin moving the cursor around after typing something, this position called “mairk” marks the end of what you typed even as the cursor is no longer there. Mairk is denoted visually (by a brown square). This position is really useful—indeed it is a part of our “where was I?” reasoning about editing. To go to the mairk, say “gairk” for “go to mairk.” To insert a space at mairk say “spooce lairk” and to capitalize the word at mairk say “caip lairk.” The concept of mairk is borrowed from the Emacs text editor; in Emacs however, mairk is expressed in only a couple of composite commands that are bound to seemingly random keys.

Another essential concept is that of the last position where something changed: it is often the position at the start of the last inserted text. Often you forget a space at that place, or maybe the capitalization is wrong. This position is called “loost.” Naturally, one goes to “loost”, which is marked green, by simply saying “goost.”

Actions

Actions have concise mnemonics: to capitalize is “caip,” to uppercase is “aipper”, to fix spacing and capitalization (after e.g. “.”) is “fix”, to simply insert a space is “spooce,” etc. When editing, we combine actions and places as the situation calls for. For example, after we dictated “we helped it going” into existing text and the screen now displays “the most we had.we helped it going|” with the cursor “|” now being at the end, we say “fix loost” to repair “we” right after the period. This operation does not move the cursor.
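The effect of “fix loost” on that example can be sketched as follows. This is an assumption about what “fix” does, based only on the description above: it puts a single space after sentence-ending punctuation and capitalizes the first letter that follows. The function `fix_at` is hypothetical, and the cursor itself is not modeled.

```python
def fix_at(text, pos):
    """Repair spacing and capitalization at pos and return the new text.

    Assumed behavior: if the text before pos ends a sentence, make sure
    exactly one space follows the punctuation and the next letter is
    uppercase. Otherwise the text is returned unchanged.
    """
    before, after = text[:pos], text[pos:]
    if before.rstrip().endswith((".", "!", "?")):
        before = before.rstrip() + " "
        after = after.lstrip()
        after = after[:1].upper() + after[1:]
    return before + after


text = "the most we had.we helped it going"
loost = text.index("we helped")  # start of the last inserted text
print(fix_at(text, loost))  # the most we had. We helped it going
```

Because the cursor sits at the end of the text, it remains at the end after the repair, which matches the remark that the operation does not move the cursor.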

Compare this to reaching for the mouse, moving it to locate the period, clicking, finding the keyboard again to insert a space, deleting the wrongly-cased letter, inserting the uppercased one, then reaching for the mouse again to reposition the cursor... This example indicates why ShortTalk is much faster than traditional mechanical interfaces in many common situations.

Grabbing and smacking

ShortTalk integrates mouse and cursor positions in commands that greatly amplify the power of a pointing device. For example, the command “grab ting” copies the e-mail address at the mouse pointer to where the cursor is. Thus, to insert an e-mail address in the middle of the text, you can say “please write to grab ting as soon as possible” (without any pauses) while your hand at the same time pushes the mouse so that it is placed somewhere over the e-mail address.

In order to delete something, you use the concept “smack.” So, “smack senten” deletes the sentence where the cursor is. And “smack senten tair” deletes the sentence where the mouse pointer is. If in addition you want to move the cursor to where deletion happens, then you say “smack senten gook”. To delete something while copying it to the clipboard, you use “rem” for “remove.” So, “rem twoon” removes the word at the cursor and the one preceding it. We just illustrated another ShortTalk principle: concepts can be omitted as long as the resulting phrase is not something that is part of the natural language. There are always appropriate defaults. In this case, “rem twoon” means “rem eed twoon.”

Searching for stuff

Again the principles are very simple: “baif” or “boof” identify the position before words to look for and “aift” or “ooft” identify positions after. The vowels determine the search direction. So, above we might also have said “fix boof we” to fix the problem around the period. Of course, if we just wanted to insert a white space at this position, we would just put the “spooce” word together with “boof we”: “spooce boof we” does the job. Generally, you can easily fix capitalization and spacing issues in a second or two in this way without using mouse or keyboard. Because they are so much more efficient, these commands become ingrained quickly.
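The four search words can be read as a two-by-two table: the vowel gives the direction and the consonant frame gives the side of the match. The sketch below is a hypothetical illustration of that table using plain substring search. The function `locate` is not part of ShortTalk, and matching on whole words is ignored for simplicity.

```python
# Vowel: "ai" searches forward, "oo" searches backward.
# Frame: "b..f" gives the position before the match, "..ft" the position after.
SEARCH = {
    "baif": ("forward", "before"),
    "boof": ("backward", "before"),
    "aift": ("forward", "after"),
    "ooft": ("backward", "after"),
}


def locate(text, cursor, concept, target):
    """Return the offset named by e.g. 'boof we', or None if there is no match."""
    direction, side = SEARCH[concept]
    if direction == "forward":
        start = text.find(target, cursor)
    else:
        # Only matches lying entirely before the cursor are considered.
        start = text.rfind(target, 0, cursor)
    if start == -1:
        return None
    return start if side == "before" else start + len(target)


text = "the most we had.we helped it going"
cursor = len(text)  # cursor at the end, as in the earlier example
print(locate(text, cursor, "boof", "we"))  # 16, just before the second "we"
print(locate(text, cursor, "ooft", "we"))  # 18, just after it
print(locate(text, 0, "baif", "we"))       # 9, before the first "we"
```

With the cursor at the end of the text, “boof we” lands right after the period, which is the position that “fix boof we” and “spooce boof we” act on.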

Symbols

You do not need to learn the shorthand for symbols, but some will be so convenient that you may long for them. For example, the ShortTalk name for “!” is “clam” (as in “exCLAMation mark”). So, “clam traio” inserts three exclamation marks. (Instead of “exclamation mark, exclamation mark, exclamation mark”.)

What is more?

We have already covered all the essential aspects of ShortTalk. There is more of course: window manipulation, insertion of markup, and formatting commands. In the sidebar, you'll find a link to a complete overview of the ShortTalk syntax.