Conversational Interfaces: the Good, the Ugly & the Billion-Dollar Opportunity
The five biggest problems with today's conversational chatbot design
i.
OpenAI launched ChatGPT on November 30, 2022, and in two short months it broke consumer app records like a bull in a china shop. It felt to many like magic.
But what was really so magical about ChatGPT? Certainly much has been said about its intelligence, aka the quality of its answers. It performs at the level of a high schooler, then a college student, and now a junior employee! It generates an endless stream of bedtime stories and marketing copy! It offers a sympathetic ear.
But the true magic of ChatGPT as a consumer product goes beyond its encyclopedic knowledge.
Its interface — a conversational text box that’s instantly usable by the entire world — quietly ushered in the next era of user experience design.
Alas, the conversational interface is also where AI design innovation is stuck today. Let us deconstruct and examine closely its strengths and weaknesses.
ii.
When I was a kid, I ate novels for breakfast, lunch and dinner. But the absolute worst thing you could have given me to read was an instruction manual on how to program a VCR or write a BASIC function.
Now, mind you, I was plenty technical. As far as my parents could tell, I was an electronics wizard who spoke incantations into our devices. Presto! My mom would get her daily fix of soap operas, and my dad his cherished vacation video footage.
But I accomplished this by puzzling over the device and trying random things. If that didn’t work, I’d peer at examples. My eyes glazed over at the technical manual; it might as well have been gobbledygook.
Years later, I felt the same haze when I was introduced to linear algebra. I frowned and toiled, my mind trying valiantly to do mental gymnastics. It wasn’t that the concepts were too difficult; it was something about the representation, all letters and symbols and jargon. Then one day a professor explained state machines by drawing a series of diagrams with arrows, and suddenly my mind lit up. What do you know, I was a visual learner!
Once I learned this, life became easier. I began to see equations as scales, trees as branching arrows, ideas as clusters of constellations in the sky. Ikea manuals are my jam. Legos are my spirit toy.
Our brains are quirky, distinct things. My partner learns better through his ears than his eyes. My co-founder is ruthless about stripping complex topics down to essential math equations. I love myself a good metaphor, but one colleague finds all my best attempts confusing.
The bittersweet reality is that we humans are desperate to express ourselves and be understood, yet we speak different languages even if we share a native tongue. Too bad we haven’t invented mind-reading yet.
The next best thing? Effective translation.
Effective translation is the designer’s Holy Grail.
The entire design discipline can be distilled into the craft of translating a creator’s intent into a user experience that fulfills the desired intent.
This is precisely where AI technology shines.
iii.
What makes for an exceptional user interface?
The most powerful rule of thumb is merely this: it feels obvious to use.
Something obvious needs no explanation. It’s as if the instruction manual has already been downloaded into the user’s brain.
And in a way it has, because the most obvious user interfaces are the ones that tap into what users already know.
When tabs for navigation were first introduced, they felt obvious to use because they mimicked physical file folders.
Digital buttons feel obvious to click because they’re rendered with depth and shadow, like a real physical button.
The iPhone is more obvious to use than previous generations of phones (with up/down buttons or click wheels) and computers (with mouse inputs) because we humans are used to direct manipulation following the laws of real-world physics.
And the AI chat box interface, popularized by ChatGPT, feels obvious to use because it banks on two interactions every digital citizen already knows:
1. Conversing in natural language — something we’ve practiced since we were two years old.
2. Using an SMS / messaging interface — 25 billion texts are sent a day. A DAY. So yeah, this pattern is already well ingrained in our minds.
Sometimes designers will go to great lengths to make a user interface novel, or minimal, or simple. This is the wrong goal. Novelty, minimalism or simplicity are good heuristics for obviousness, but do not mistake the means for the end.
The technology powering ChatGPT had already launched prior to November 30th, 2022. It was just wrapped in a different interface. Very few people paid attention.
Why did ChatGPT explode in popularity? Because of its obvious chat interface, which everyone intuitively already knew how to use.
iv.
"If you have a hammer, everything looks like a nail."
The problem with any innovation that succeeds, of course, is that it lights up a neon sign (“HEY THIS WORKS!”) that quickly grows into a neon building that then explodes into an entire Vegas Strip beckoning every dream-chaser to fast follow.
The bandwagon becomes a caravan. Our brains become like poor data models that overfit: Conversational chat interfaces for design! Conversational chat interfaces for images and videos! Conversational interfaces for coding! For news! For games!
There are five major problem areas today with conversational chat interfaces. Let’s examine each closely.
1. The blank page problem
Gartner famously noted when it comes to data that “80% of the value is in asking the right questions.”
Even before the era of ChatGPT, the Internet was chock full of ways to get answers.
YouTube, Khan Academy, Coursera, and Wikipedia offer free knowledge in nearly every subject. Yet average adult learning has barely increased — OECD surveys show that less than 10% of adults in many developed countries engage in any structured learning annually.
The problem with a blank chat box on a blank page is that it violates the first rule of a high-quality user experience: it isn’t obvious what I can do.
A blank chat box puts the burden on the user to learn what to use it for.
This is not a problem for early adopters and high-agency people (aka everyone reading this), who love the game of exploration and the thrill of discovery.
But for the broader world, a blank chat box is intimidating. A blank chat box is lazy.
A blank chat box reminds the user of Google Search, which famously designed itself to be an efficient router to other destinations. The founders felt people should spend as little time on Google Search as possible, which I doubt is the goal of today’s AI companies.
The user experience of a conversational chat interface should help people understand how to get the most out of it.
Where are the templates for key use cases? Where are “trending prompts” and highlighted examples so users can learn from the community? Where are suggestions to continue past conversations?
Today, Twitter operates as the user manual for how to use AI services; there is enormous opportunity for the services themselves to own this and grow engagement and discovery.
If the success of social media platforms has taught us anything, it’s that starting with something to react to works far better than showing a blank page.
2. The iteration problem
You know what’s awesome? Asking an AI agent to create a raccoon battle game and getting something that works in <10 minutes.
You know what sucks? Trying to refine that raccoon game to match the vision in your head.
If you want to swap out that raccoon image with something that looks cuter, or try out different game titles (would “Raccoon Rodeo” or “Battle of the Bandits” look better on the title screen?) or experiment with whether the game should start with character or battlefield selection, a conversational interface is extremely cumbersome.
Nothing wonderful comes out fully formed. A good creator’s journey is one through the dark ravines and jungles of refinement. Conversational UIs are great at quickly getting to the first 70%, but suck at offering easy controls over narrower areas for iteration.
If I want to play with different titles for my game, for example, the obvious thing I’d want to do is select the title and start typing out my various ideas, or click on the raccoon character and swap out the image with different ones.
Typing instructions to my agent like “Can you change the title from X to Y?” or “Can you change the raccoon to be cuter?” and then waiting while it executes the change is a frustratingly slow experience. (Not to mention—sometimes it changes other elements of the game I didn’t even intend!)
There’s a reason we’ve invented WYSIWYG buttons and selectors and inputs. Let’s not throw that away. Sometimes, it’s faster to click on a button and rapidly change the button radius from 10 to 12 to 16 just to see how it’ll look.
I’m heartened to see editable canvases with documents and code emerging as a pattern (although I do hate “modes”). I expect we’ll see more adoption of AI-gives-multiple-variations for creative refinement.
There’s plenty more opportunity for conversational UIs to get married to traditional UIs and make beautiful babies that enable faster iteration across the full gamut of creative exploration.
3. The input-output problem
Text is an awesome medium because using written language is obvious to most of us, and because there are thousands of years of prior art invested in making language rich and expressive and clear.
Text is a limited medium because typing and tapping sucks. And a picture is worth a thousand words. And “I’ll know it when I see it” was the best a Supreme Court justice could come up with to define obscenity.
It’s faster for humans to speak instructions than to type them. It’s faster for eyes to scan a response than to listen to a voice read the same content.
And yet, the status quo is to assume that input and output modes should be the same. This assumption holds if I’m driving, or there are multiple people in a room, and I want voice in and out. But if I’m doing solo productive work (which is most of the time), why not default me to the more efficient input / output mode?
If AI services can understand the user’s intent, it becomes even more obvious what the ideal input and output modes should be.
Want to get a team aligned on what we should build? Skip the PRDs and essays and go straight to prototype.
Want to help a user redecorate their room? Let the user express what they want via moodboards rather than sentences.
Want to cheer a user up? A warm voice does far more to show support and connection than words on a screen.
Before we slap a conversational interface on everything, let’s ask ourselves: what are we trying to accomplish? And what is the most obvious method of input and output to achieve that goal?
4. The scoping problem
Ever had an oblivious friend? Someone who is well-meaning, but has an opinion on everything and has no real clue what they’re actually good or bad at?
Ask them about politics and they’ll confidently rattle off what really needs to be done. Complain about a problem and they’ll tell you how to fix it.
Wise people know their scope. They can predict with great accuracy what they know better than you, and what they’re ignorant about.
Conversational AI today feels more oblivious than wise; it doesn’t know what it doesn’t know.
When you ask an AI agent to produce something outside its zone of competence, it doesn’t inform the user: “This is beyond my current capabilities.” After repeated attempts and corrections, the agent does not suggest: “Let’s take a step back and try a different approach; this one doesn’t seem to be working.”
The AI doesn’t know whether it’s 90% or 60% or 20% confident; it doesn’t yet analyze the trustworthiness of its sources; it doesn’t fess up: “I’m not sure. My opinion leans X because… but I’m uncertain about…”
The best-performing human teams operate with clarity on one another’s scope. Because of my engineering strength, I’m the best person to select which database we should use. Ask me a question about the design of the registration page, and I’ll refer you to the designer.
In the human world, we know that good feedback mechanisms not only improve an individual’s work, they also reflect back to that person what their strengths and weaknesses are. With that knowledge, the person can do a better job of selecting the appropriate scope for their work and asking for help when a task is beyond their level.
One day, if AI gets to superintelligence, there will be no topic where it remains more ignorant than us puny humans. But that day isn’t here yet.
Before then, why not design for awareness and clarity of scope? After all, authenticity breeds trust.
5. The personalization problem (really, a golden opportunity)
This is the one I get giddy over because it’s the next step function change for user experience.
“I contain multitudes,” noted Walt Whitman, and its obvious truth strikes a chord. All of us present different facets of ourselves in different contexts—we don’t greet our partners the same way we greet our cousins or colleagues or strangers on the street.
Human interactions are slippery, dynamic things, the alchemy of not just each participant’s unique brain but also the group’s shared history and the moment’s specific context.
The tech world has already embraced personalization, having built an era on recommendation engines, from TikTok’s “For You” to Instagram’s (eerily) relevant ads to Netflix’s “Up Next.”
The next era of personalization moves beyond what content we show to how we shape its presentation.
If a service knows that I am a visual learner, who likes metaphors, who prefers an honest, no-bullshit style, then it should translate the message to best connect with my brain.
If I’m asking for an explanation of fluid dynamics, give me an interactive diagram. If I’m struggling through a decision, relate it to a plot point from my favorite movie. If I’m asking for an essay critique, for the love of god do not give me a compliment sandwich.
Get to know me! Check in with me on how I like things! Ask me questions not because the algorithm is optimizing for more of my time spent, but because it’s trying to get smarter about delivering me a superior experience.
One particular area of low-hanging fruit that I’m surprised hasn’t gotten more attention is AI-assisted onboarding. I’m not talking about heavy-handed wizards that everyone races to skip past; I’m imagining the feeling of a great first meeting with a new team member.
Sure, an AI service can quietly observe a user’s reactions and behavior over time to learn how to personalize—we humans do that too, but our most obvious method is asking relevant situational questions. If our boss says, “Hey, I need you to do X,” we might follow up with, “Why is X important?” or “What does success for X look like?” Knowing this makes us more likely to do work that matches or exceeds our boss’s expectations.
Today, many AI chatbots feel interchangeable. Without active direction, the answers I get are nearly indistinguishable from the answers the next person gets. The opportunity here is vast.
Through better questions and better listening, through a deeper understanding of our quirky, distinct brains, the next generation of products and services can 10x their translation efficacy, whether for learning, productivity, entertainment or support.
v.
Conversational interfaces are magical, yes, but let’s not get stuck at the hype station and forget that the train of quality continues on in its pursuit of ever more obvious interactions enabled by AI’s technological breakthroughs.
The holy grail of effective translation is there for the taking. Yes, it’s a wonderful time to invent.
Other articles in the Looking Glass AI series
Our Souls Need Proof of Work
Today, hustling competes with YOLO. The search for spirituality wrestles with the quest for commercial success. Slow down and meditate! Hurry up and embrace AI! Technology will reduce our toil. Technology will kill our spirit.
At the heart of this debate lies the ever-pulsing question: what is the merit of hard work?
The Death of Product Development as We Know it
Much has been written about what AI’s magical powers can enable—whether engineering, design or documentation. But I believe something just as interesting is happening with how teams build in the era of AI.
The AI Quality Coup
What is quality work nowadays, in the era of AI? Traditional definitions are being upended, and it’s up to us to find the new precipice.