Angry Robot

VOICE

Two things recently. #1 was EndWar, a console real-time strategy game by Ubisoft that allowed for voice control of your units. It worked near-perfectly, and made a hell of a lot of sense. RTSes have always had trouble on consoles because they're complicated, mouse-and-keyboard-centered PC games that don't translate well to controllers. EndWar just routed the fuck around that. After all, RTSes are games about barking orders at soldiers, and a mouse is a pretty arbitrary substitute for actual barking.

The second thing is the Google Mobile app on the iPhone. Despite being developed by Quicksilver master Alcor, the first version wasn’t all that thrilling. The next major revision added voice search – hold the phone up to your ear, wait for the beep, and speak. I put it aside after a few failed queries, though. I’ve tried it again since, and I think they’ve improved things on the server side, because it actually works. It even managed to get “CRTC” right, which surprised me. Like the Shazam audio recognition app, it’s one of those head-turner iPhone features, but unlike Shazam, I actually use it all the time.

Both of these things are examples of using voice instead of a keyboard in contexts where a keyboard is pretty sucky. No one wants a keyboard lying on their lap when they’re sitting on the couch, and, similarly, why mash tiny fake buttons on your iPhone screen when you can use your face to say things? Obviously both are hugely dependent on audio pattern recognition algorithms and AI having advanced to a sufficient state. But what they signal is that those have indeed advanced: we can now talk to our computers, and more often than not, they will actually understand.

On the other side, computers have gotten better at talking to us. Take the new Shuffle, released yesterday, which will tell you the song you’re listening to. Or, the Kindle 2, which will read your book to you. Sure, text-to-speech has been around a while; ask any Mac owner. But the new voice in Leopard beats the hell out of all the old ones, and judging by how computer technology has been progressing, I’d wager the voices will only get better.
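
As a toy illustration of the computers-talking-back half – not anything Apple or Amazon actually exposes this way – here’s a minimal Python sketch, assuming a Mac with the built-in say command and the Alex voice that shipped with Leopard:

```python
# Minimal sketch: making the computer talk back.
# Assumes macOS, which ships a `say` command-line tool;
# "Alex" is the voice introduced in Leopard.
import subprocess

def speak(text, voice="Alex"):
    """Hand a string to the OS text-to-speech engine."""
    subprocess.run(["say", "-v", voice, text], check=True)

if __name__ == "__main__":
    speak("You are listening to track one.")
```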

I’m convinced that this means big things. If voice can easily be turned into text, it can easily be searched. The new Google Voice will allow transcription and then tagging and searching of voicemails. Now imagine recording everything you say, and everything anyone says to you, and being able to search it. It’s not so far off – borderline achievable today with an iPhone, a 3G connection and something like Jott (unfortunately, the way-cool Jott has gone pay-only).
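
To make the talk-to-text-to-search idea concrete, here’s a minimal sketch in Python. It’s not how Google Voice or Jott work internally; it just leans on the third-party SpeechRecognition package and a hosted recognizer, and the voicemail file names and keyword are made up for illustration:

```python
# Rough sketch of the transcribe-then-search idea for voicemails.
# Assumes: pip install SpeechRecognition, plus some WAV files on disk.
# "voicemail1.wav" etc. are hypothetical file names.
import speech_recognition as sr

def transcribe(path):
    """Turn one audio file into plain text via a hosted recognizer."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # the recognizer couldn't make anything out

def search(paths, keyword):
    """Return the files whose transcripts mention the keyword."""
    return [p for p in paths if keyword.lower() in transcribe(p).lower()]

if __name__ == "__main__":
    print(search(["voicemail1.wav", "voicemail2.wav"], "CRTC"))
```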

I can’t help but think about the storytelling potential of such technology, namely in games. It’s not just bossing units around. Imagine no more dialogue trees and menus – just natural language conversations with AI characters. It wouldn’t only be for RPGs; suddenly a game could exist where the central ‘gameplay’ is simply conversation. Conversation is obviously central to human life, and it’s something film, TV, novels and every other sequential art form can render in a manner befitting its medium, which is not true of games right now.

Finally, this shift is ultimately about the disembodiment of computers, paralleling the rise of cloud computing. Sure, we interact with plenty of disembodied computers already, like when we get up in the internets. But we do that through our desktops and laptops. As our computers get smaller – phones, pens, etc. – and more ubiquitous, it’s increasingly archaic to interact with them only through screens and keyboards. They will become magic ghost butlers, like HAL (except hopefully less killy).

Next: a smell-based operating system.