Skip to content

Coding by Voice

Coding by voice command requires two kinds of software: a speech-recognition engine and a platform for voice coding. Dragon from Nuance, a speech-recognition software developer in Burlington, Massachusetts, is an advanced engine and is widely used for programming by voice. On the platform side, VoiceCode by Ben Meyer and Talon by Ryan Hileman (both are for Mac OS only) are popular.

Coding by voice platforms

Two platforms for voice programming are Caster and Aenea, the latter of which runs on Linux. Both are free and open source, and enable voice-programming functionality in Dragonfly, which is an open-source Python framework that links actions with voice commands detected by a speech-recognition engine. Saphra tried Dragonfly, but found that setup required more use of her hands than she could tolerate.

All of these platforms for voice command work independently of coding language and text editor, and so can also be used for tasks outside programming. Pimentel, for instance, uses voice recognition to write e-mails, which he finds easier, faster and more natural than typing.

To the untrained ear, coding by voice command sounds like staccato bursts of a secret language. Rudd’s video is full of terms like ‘slap’ (hit return), ‘sup’ (search up) and ‘mara’ (mark paragraph).

Unlike virtual personal assistants such as Apple’s Siri or Google’s Alexa, VoiceCode and Talon don’t do natural-language processing, so spoken instructions have to precisely match the commands that the system already knows. But both platforms use continuous command recognition, so users needn’t pause between commands, as Siri and Alexa require.

VoiceCode commands typically use words not in the English language, because if you use an English word as a command, such as ‘return’, it means you can never type out that word. By contrast, Talon, Aenea and Caster feature dynamic grammar, a tool that constantly updates which words the software can recognize on the basis of which applications are open. This means users can give English words as commands without causing confusion.

In addition to voice recognition, Talon can also replace a computer mouse with eye tracking, which requires a Tobii 4c eye tracker (US$150). Other eye-mousing systems generally require both the eye tracker and head-tracking hardware, such as the TrackIR from NaturalPoint. “I want to make fully hands-free use of every part of a desktop computer a thing,” says Hileman. Other mouse replacements also exist; Pimentel uses one called SmartNav.

Voice command requires at least a decent headset or microphone. Many users choose a unidirectional microphone so that others can talk to them while they are dictating code. One such mic, a cardioid mic, requires special equipment to supply power, and hardware costs can reach $400, says Pimentel.

The software can cost several hundred dollars too. The speech-recognition engine Dragon Professional costs $300, as does VoiceCode. Caster and Aenea are free and open source. Talon is available for free, but requires a separate speech-recognition engine. A beta version of Talon that includes a built-in speech-recognition engine is currently available to Hileman’s Patreon supporters for $15 per month. 

Whether or not users have RSI, it can be difficult and frustrating to start programming by voice. It took a month and a half for Pimentel to get up to speed, he says, and there were days when he was ready to throw in the towel. He printed out 40 pages of commands and forced himself to look at them until he learnt them. Saphra needed two months of coding, a little every day, before she felt that it was a “perfectly enjoyable experience and I could see myself doing this for a living”.

After the initial learning curve, users often create custom prompts for commonly used commands as the need arises. 

References