The AI Product Paradox
The rapid advancement of AI, particularly LLMs, presents entrepreneurs and multinationals alike with an increasingly urgent conundrum: do we want AI to be a seamless, effortless assistant that anticipates our needs and smooths our path, making life undeniably easier? Or do we need AI to be a partner that challenges our assumptions, pushes our boundaries, and helps us grow, even if that process is effortful and, at times, uncomfortable? This tension, I believe, lies at the heart of what I call "The AI Product Paradox": the best AI products challenge their human users, but most human users fundamentally desire products that simply make their lives easier.
We see this paradox playing out across different domains, illuminated by recent research. The very idea that an LLM can pass a Turing Test, leading humans to judge it as human even when compared to a real person, speaks to our susceptibility to seamless, human-like interaction. If AI can mimic conversation so convincingly that it fools us, it caters directly to our desire for effortless, naturalistic interfaces. The easier and more familiar the interaction, the more likely we are to accept and perhaps even prefer it. This push for AI that feels less like a tool and more like a companion taps directly into our preference for ease and comfort.
Yet, when we examine how humans collaborate with AI on tasks requiring judgment, a different dynamic emerges. We humans tend to reject AI's recommendations, often defaulting to our own judgment out of a sense of overconfidence. We want the AI to make our job easier—automating tasks and confirming existing beliefs—but we resist the kind of challenging engagement where the AI's perspective forces us to re-evaluate our own. Our preference for ease clashes with the need for critical interaction, leading us to ignore valuable AI insights when they don't neatly fit our pre-existing mental models or require extra cognitive effort to process.
Further compounding this paradox is the finding that our perception of AI's competence is easily manipulated by superficial cues. Longer LLM explanations are perceived as more accurate, even when they contain no additional correct information. Verbosity isn’t knowledge, but it’s an easy heuristic in a confusing world…a heuristic that AI can effortlessly exploit. An AI product designed purely for user satisfaction might prioritize generating lengthy, plausible-sounding text over concise, rigorously accurate information because it feels more authoritative and makes the user feel more confident, thus making the interaction feel "easier" in a subjective sense. This caters to our preference for perceived ease and confidence, even if it sacrifices objective accuracy and genuine understanding.
If we design AI purely to satisfy the immediate human desire for ease, we risk creating systems that reinforce biases, discourage critical thinking, and hinder genuine learning and growth. Products that seamlessly automate tasks de-skill and deprofessionalize their users. Products that provide effortless answers might reduce our capacity for independent problem-solving. Products that always agree trap us in echo chambers of artificial certainty. The most valuable AI, for long-term human development and societal progress, constructively challenges us, making our work harder but in ways that make us better—offering alternative perspectives, highlighting nuances we've missed, or forcing us to articulate our reasoning more clearly.
Consider an AI tutor that always gives the right answer immediately versus one that asks guiding questions, prompting the student to discover the solution themselves. The first is easier in the short term, but the second is likely far better for fostering meta-learning and deeper understanding. Consider an AI design tool that simply completes tasks versus one that suggests unconventional approaches, requiring the human designer to stretch their creativity. The first is easier, but the second might lead to truly novel outcomes.
The AI Product Paradox, then, reveals a fundamental tension: deep AI products, those designed to challenge us and foster genuine growth, are better for users in the long run, cultivating critical thinking, resilience, and deeper understanding. Yet, the immediate human preference often leans towards shallow products that prioritize ease, comfort, and perceived confidence, even at the expense of accuracy or critical engagement. The critical challenge for anyone building transformative AI is how to design and market deep, challenging products in a market flooded with shallow competitors that cater to immediate user desires. How do you convince users to invest the necessary cognitive effort for long-term gain when the market is optimized for effortless consumption? Navigating this paradox requires not just technical ingenuity in building powerful AI, but also a deep understanding of human psychology and a strategic approach to design and messaging that can guide users towards the path of genuine augmentation, even when the easier, shallower option is just a click away.
Follow me on LinkedIn or join my growing Bluesky!
Research Roundup
Humans Are Failing the Turing Test
How’s this for a headline-grabbing result: at long last an LLM (specifically GPT-4.5) has passed a standard three-party Turing Test!
…or did it?
In the randomized, controlled, and pre-registered study, participants had 5 minutes of simultaneous conversation with an LLM and a real person to figure out which was the real human. “When prompted to adopt a human-like persona, GPT-4.5 was judged to be the human 73% of the time”, while LLaMa-3.1 reached “56%”. That’s it, Turing Test passed, right?
But humans were still able to distinguish the machines from the humans 73% of the time; we just thought the machines were the humans. If GPT-4.5 were truly indistinguishable from a person, judges would be reduced to guessing, and the result should come out 50/50. GPT didn’t pass the Turing Test; we failed it.
Even more evidence that humans are failing the Turing Test comes from the baseline control model, ELIZA. The oldest of old-school chatbots is cut-and-paste thought salad from decades ago, and yet participants thought it was the human “23%” of the time. If you’ve ever interacted with ELIZA, that score should make you fear for the profoundly eroded social and language skills of 23% of university students.
If nearly a quarter of participants mistook the very early ELIZA for a human, it suggests our criteria for "human-like" might be surprisingly thin, easily triggered by conversational fluency rather than genuine understanding or consciousness. And the fact that GPT-4.5 was judged human more often than the real human highlights a potential bias in how we perceive and categorize conversational partners—over-indexing on certain stylistic features the AI excels at, while overlooking authentic human messiness.
What is a human decision?
So many AI products fail to understand how best to leverage humans and machines: when to rely on one vs the other. I propose the following split: the known vs the unknown.
A new study explores this idea, finding that humans tend to “under-respond to AI predictions” and “reduce effort when presented with confident AI” recommendations. This isn’t a trust issue about AIs; instead, “human overconfidence” in our own ability to assess information drives our rejection of AI recommendations. The AI's contribution, even when objectively valuable, is filtered through our self-assured, potentially flawed, internal assessments.
But we can rectify this bias by splitting problems into the known vs the unknown. Using an AI’s own confidence model, the study shows that AIs should “automate” decision making when the model is confident but “delegate uncertain cases to humans”.
Well-posed is an AI job. Ill-posed is a Human job.
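For the builders in the audience, here’s one way that delegation rule can look in code. This is a minimal sketch of confidence-based routing, not the study’s actual system: the 0.9 threshold and the `predict_with_confidence` and `ask_human` interfaces are hypothetical stand-ins.

```python
# Minimal sketch of confidence-based delegation: automate the well-posed
# (high-confidence) cases; route the ill-posed (uncertain) ones to a human.
# The threshold and both callables are hypothetical, not from the study.
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    answer: str
    confidence: float
    decided_by: str  # "ai" or "human"


def route_decision(
    case: str,
    predict_with_confidence: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str], str],
    threshold: float = 0.9,
) -> Decision:
    """Automate confident predictions; delegate uncertain cases to humans."""
    answer, confidence = predict_with_confidence(case)
    if confidence >= threshold:
        # Well-posed for the model: automate.
        return Decision(answer, confidence, decided_by="ai")
    # Ill-posed for the model: a human makes the call, with the AI's
    # low-confidence guess available as context rather than a verdict.
    return Decision(ask_human(case), confidence, decided_by="human")
```

The design choice matters: the human isn’t a fallback for every case, only the uncertain ones, which keeps our overconfidence from overriding the AI where it is demonstrably strong.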
Longer Is Righter
Years ago I published a study showing that longer essays tended to get higher grades at university, regardless of substance. It turns out that same “longer is righter” bias exists in our judgment of LLMs as well as students.
When people read LLM responses to their prompts “longer explanations increased user confidence, even when the extra length did not improve answer accuracy”. The "Longer Is Righter" heuristic is one of many that LLMs can exploit, and explains why LLM-aided essays are judged as “better” and “more creative” in many studies even as actual creativity decreases.
LLMs don't just mimic human communication; they can leverage superficial cues like explanation length to manipulate our perception of their confidence and knowledge, even if the content is questionable.
<<Support my work: book a keynote or briefing!>>
Want to support my work but don't need a keynote from a mad scientist? Become a paid subscriber to this newsletter and recommend it to friends!
SciFi, Fantasy, & Me
Here is the set of Locus Fantasy Nominees for 2025:
- I’m Afraid You’ve Got Dragons, Peter S. Beagle
- The Tainted Cup, Robert Jackson Bennett
- The Dead Cat Tail Assassins, P. Djèlí Clark
- The Bright Sword, Lev Grossman
- Asunder, Kerstin Hall
- A Sorceress Comes to Call, T. Kingfisher
- Somewhere Beyond the Sea, TJ Klune
- The Siege of Burning Grass, Premee Mohamed
- Long Live Evil, Sarah Rees Brennan
- The City in Glass, Nghi Vo
As with the scifi list, I’ve read many of these but not all, and clearly some books took some people’s hearts but not mine. I need to read more of these, but for now I recommend The Tainted Cup.
Stage & Screen
- May 13, London: join a breakfast conversation with InterLaw to explore all the scary ways the world has changed of late.
- May 14, London: it’s time for my semi-annual lecture at UCL.
- May 15, London: join me in the morning for the FT Chairs’ Forum to talk about "The Economics of AI"
- May 15, London: private dinner with DukeCE and London executives to answer specific questions about AI in business (hint: don't just have it write your emails). (Local London execs, let me know if you'd like to join!)
- May 15, Virtual: Plenary for the 24th Annual Health Literacy Conference: "AI, Innovation, and Equity: Transforming Health Literacy for the Future"
- May 23, Berkeley: I'll be talking "Responsible AI" at the Dutch Consulate.
- June 9, Philadelphia: "How to Robot-Proof Your Kids" with Big Brothers, Big Sisters!
- Late June, South Africa: Finally I can return. Are you in SA? Book me!
- September 18, Oakland: Reactive Conference
- October, UK: More med school education
If your company, university, or conference just happens to be in one of the above locations and wants the "best keynote I've ever heard" (shockingly spoken by multiple audiences last year), reach out!
Vivienne L'Ecuyer Ming
| Follow more of my work at | |
|---|---|
| Socos Labs | The Human Trust |
| Dionysus Health | Optoceutics |
| RFK Human Rights | GenderCool |
| Crisis Venture Studios | Inclusion Impact Index |
| Neurotech Collider Hub at UC Berkeley | UCL Business School of Global Health |