Hey Google, what will it mean to be human?
By now, I hope you've seen Google's demo of its voice assistant making calls on its user's behalf. Emotive voice AI is still in its infancy, but Google put on display the potential for its products to replace humans in daily interactions. What I found most interesting, though, was that the demo also revealed the significant gap that remains in voice AI's mimicry of humanity. This has me thinking about the ramifications of what's to come.
While Google Assistant successfully maneuvers through an attempt to book a reservation at a restaurant (linked above), it does so fairly awkwardly - it perceives nuance and misunderstanding in the conversation, but can't fully emote to save face as a real human would, or seem flustered in a human way. It's not awkward enough to be a person. It pushes on like a good soldier, seeking a satisfying end to the encounter for the user it represents.
The example of the restaurant reservation is telling. Google CEO Sundar Pichai frames this demo in terms of value added for businesses: "We’re still developing this technology and we actually want to work hard to get this right, get the user experience and the expectation right for both businesses and users. But done correctly, it will save time for people and generate a lot of value for businesses." In this example, though, the 'value' very clearly accrues to the user who deployed Google Assistant on his behalf. You, the user, don't have to have that conversation.
Pichai described this call as going "a bit differently than expected." What he and Google can't say overtly is that the conversation was challenging for the AI because, quite simply, it's with an ethnically undefined - presumably Asian - woman, whose accent and presumable ESL status make the conversation difficult for Google's white-bread AI (representing Google's composite user, and those who would be most excited about this technological advancement), which handles the shifts in conversation as deftly as a teenager learning to drive stick.
The AI forges ahead, brushing off confusion with a measured patience and indifference that isn't human insofar as it's not how your typical white American (the Assistant's voice, and Google's primary audience, even at Google I/O) would feel in the moment. Can you picture the reactions and sentiments? Flustered, anxious, even angry: from far-righters seething for feeling undeservedly embarrassed and out of place because this immigrant can't speak English to their expected standards, to progressives tripping over themselves for failing to live up to their own progressive standards and potentially making the restaurateur feel undeservedly foreign.
Alas, no. Google's AI is emotionally crotchless, like a Ken Doll. Plastic, human in imitation but distinctly and recognizably other. And that's what gets me: that otherness. For now, it's a technical shortcoming. Google Assistant can't perfectly mimic a human caller and interact in an indistinguishably human fashion. But will that always be the case?
Pichai suggests that Google's intention is to achieve mimicry: "Our vision for the perfect assistant is that it’s naturally conversational; it’s there when you need it so that you can get things done in the real world. And we’re working to make it even better. We want the assistant to be something that’s natural and comfortable to talk to."
To me, that's telling. We'll see where Google draws the line, but my sense is that their intention is to fully mimic a human interactional voice experience with its AI. But this opens a line of questions that has tremendous ramifications for how we interact with technology, and where the human/AI divide becomes either situationally insignificant or all-consuming. And that matters a lot. How much do we want to bridge that gap? How human is enough for voice AIs and personified, emoting technology? What are the design and cultural ramifications of maintaining a distinction between AI & human, or of developing emotive expression enough that AI & humans become verbally indistinguishable? Most importantly: what are the human consequences? I don't know, but I'd like to speculate.
Let's get back to the demo: the audience marvels, murmurs, and cheers in reaction. They also breathe a collective sigh of relief, and that's exactly why Google picked this scenario to demonstrate the technology's abilities. In the near future, we won't have to endure the uncomfortable anxieties that come with situational incongruence: misunderstandings, misread cues, not hearing someone clearly and laughing politely to ameliorate the situation, and all the other missteps that come with awkward conversations and having to save face. Failing to comprehend, and being awkward. Responding to, "Can I help you?" with, "I'm fine." before you've had your morning coffee, or having to ask the person on the other end several times to repeat what they said and still failing to understand.
That sounds great and all, but what about the other end of the call? Let's assume that this technology achieves moderately widespread adoption at some point in the relatively near future. From this scenario, you have the potential for a swathe of people (early adopter technocrats, introverts, and the shy, at the very least) to opt out of an entire suite of situations where they'd speak to service workers. You're effectively ending a moment of human interaction. Hear me out, this is important.
On the flip side, you're subjecting a class of workers, at the least, to frequent interactions with human-like computers instead of humans. What are the ramifications of that? How dehumanizing is it to replace interactions with people with interactions with AI, because those people would prefer not to interact with you? It seems profoundly dehumanizing.
But maybe that's preferred? Perhaps that says an awful lot about how awful people are. The lack of "I'd like to speak to your manager" Carolbots would probably be a welcome change of pace, at least for a bit. What happens when we remove instances and opportunities for social interaction? I can imagine that many folks who work in the service industry would prefer to interact with AIs than with people (because people are collectively awful). But for how long? What significance does this bear? What would it mean to be of the AI-interaction class: a swathe of people tasked to interact and interface with robo-voices because the people whom they serve are too busy or too uninterested to talk to them?
And that even assumes that AIs and humans are distinguishable, which we know isn't Google's goal. What happens when they aren't? We advance artificial cognition and emotive expression to the point that they become every bit as awkward, weird, and unpleasant as real people? Is there room or a desire for human idiosyncrasies in personified AI? How human is too human? Will there be backlash against AIs and their representative users? There are serious ethical considerations for designing personified AIs that act distinguishably human-like vs. indistinguishably human.
Perhaps 100% human mimicry is unachievable, or there's a logical collective endpoint. I don't think that there's a benefit to creating AIs that have, no matter the minute chance, the possibility to engage in a racist tirade during an interaction. There's no room for a Michael Richards rant in the algorithm. So maybe it's about designing for interactions that are human in the micro: with a banded set of potential responses and interactional capabilities that doesn't include more edge, extreme reactions. Still, I'm wary of denying or subjecting categories of people to interactions with AI. I'm sensing a crack about to develop in what it is to be human.
Perhaps that's just as well, and we're due for a reset of what it means to be appropriately human...but that's also why I'm concerned. I'm most concerned about what happens when we remove entire categories of situational interactions from existence, because we don't need to intensify the echo chambers and bubbles any more than they already are. There's the Mark Twain quote about travel that I think applies just as well to simple human interaction. Interaction is key to what it is to be human. Interaction is good. Subbing in nonhumans to mimic humans in interaction will be weird, and will have weird ramifications. There. That's my thesis. This isn't a pearl-clutching, slippery-slope argument so much as my assertion that those driving this AI development are about to challenge a core tenet of what it means to be human, whether they like it (or care) or not.
And that doesn't even cover the topic I'm most fascinated by on personified AI! Want to talk about the future it holds for children, and how it will redefine the human/computer power dynamic? Let's talk about voice natives. I'm contributing a section on this topic for a white paper that the agency I work for is producing on voice-first technology, so I won't scoop them (or myself). But I'm curious to see what happens with voice natives - children who will grow up having always asked voice assistants and personified AIs for the answers to their questions. This orientation to technology is key. Even digital natives imagine themselves in control of technology: it's a tool; it's subordinate to us. Technology makes our lives better, easier. But voice assistants and AI alter that dynamic. It's not using the computer as a tool to find the answer; it's verbally asking a personified technology for an answer it automatically knows. The technology is dominant in that position, which has an interesting effect on perceptions of power and intelligence. Pause. It's not like we aren't already dominated by technology. But it's at least part of our ingrained social and cultural imaginations otherwise.
Early research on kids and personified technologies points to what's to come. The aptly and adorably named "Hey Google, is it okay if I eat you?" study of children's interactions with these AIs, from MIT, reveals the already murky relationship and understanding that children have with voice assistants. One key piece of the research focuses on children's perceptions of these AIs' intelligence:
That feels like a dangerous notion to imprint in children, and a fallacy that comes with personified AI's seeming sentience. To children, these personal assistants are a sort of interstitial humanity, but one that may soon always have the right answer and "know" more than they do. That interstitiality surfaces in my favorite (and sneakily horrifying) detail in the study, and in the paper's title.
The child sensed that this Google voice entity was not human yet not classically inanimate, and subjected it to a 6-year-old's Turing test - "Are you edible?" - probing its capacity for being. And for the absurd, as only a 6-year-old can.
Google isn't unaware of the implications that come with children's interactions with this new mode of intelligence:
But I think there's a bigger concern: are kids learning to be subordinate to technology when they can just say 'Hey Google' to ask for anything they need?
Hey Google, what will it mean to be human?