Seeking Augniscience

Disclaimer: I hate the term “GenAI”. Generative models are already a real, technical thing, and plenty of non-generative models generate output like writing and images. I refuse to let marketing lay out the conceptual framing for machine learning. So, although there are a number of different model architectures in GPT, Gemini, Claude, and other platforms (transformers, diffusion models, and more), I’m just going to call all of them “AIs” or “LLMs”, since chatbots are how most people interact with them.

I am a true believer in AI’s potential to revolutionize teamwork. I’m also a deep skeptic that we are approaching this immense potential in a way that will either prove out its true value or even live up to the Hippocratic Oath (which every AI developer should hold dear in their heart). Looming large in my cognitive dissonance is the new paper, “The Cybernetic Teammate: A Field Experiment on Generative AI Reshaping Teamwork and Expertise”. It is from a team that has done excellent work exploring the many different ways talented individuals engage with modern AI and how this impacts their productivity and creativity. Now they tackle the even more complex question of how teams respond when LLMs and diffusion models enter the mix. Their answers are quite informative, but there are also some profound shortcomings that need illumination.

The study, conducted with professionals at Procter & Gamble during a company-wide "Product Innovation Challenge," randomly assigned employees to work individually or in pairs (one R&D, one Commercial), either with or without access to AI tools. The authors report compelling results: "individuals with AI matched the performance of teams without AI," AI supposedly "breaks down functional silos" leading to more "balanced solutions," and AI even fulfills "part of the social and motivational role traditionally offered by human teammates." On the surface, this suggests AI is not just a tool, but a functional teammate, capable of leveling the playing field between individuals and groups.

My principal frustration begins with a fundamental question: what the fuck did the teams actually produce? It must surely be stated somewhere in the paper, but all of the discussion swirls around "solutions" evaluated for "Quality," "Novelty," and "Feasibility." But these evaluations were based on submissions likely taking the form of written reports, slide decks, or presentations. Given that the AI intervention involved tools like ChatGPT after only a one-hour training session, how much of the measured "Quality" improvement reflects genuinely better ideas versus simply better written reports? Did the AI primarily help participants, particularly those less comfortable with formal business writing (perhaps the R&D professionals), articulate their ideas more clearly and professionally, thus boosting the perceived quality of the output document rather than the underlying innovation? The relatively modest improvement in "Quality" (measured in standard deviations) hints that the efficiency gains might be more superficial than transformative.

Furthermore, the experimental setup itself raises concerns about generalizability. The "teams" consisted of two professionals, often strangers, from different functional areas, collaborating for a single day in a hackathon/workshop format. This hardly reflects the deep, nuanced, often messy process of genuine team innovation built on trust, shared history, and diverse skill sets interacting over time. Optimal team sizes for complex problem-solving are rarely just two, especially two strangers operating under time pressure. Concluding that AI makes individuals "match" these artificial dyads overlooks the richer dynamics of established, larger, and more complex human teams. (It’s also unclear if the control groups received any matched training in creative interaction or brainstorming techniques, comparable to the AI training time, which could bias the results.)

The authors’ claim that AI "breaks down functional silos" also warrants closer scrutiny. The data suggests that AI didn't necessarily help R&D professionals think more commercially or vice-versa. Instead, it seems AI primarily boosted the use of technical language across the board, especially among commercial participants. This isn't fostering balanced ideation; it's potentially just improving the articulation of existing ideas, and, given everything we know about AI-assisted creativity (including from this very paper), the process likely homogenized the language used rather than diversifying the thinking.

For me, the most intriguing, yet inexplicably downplayed, finding failed to appear in the abstract at all. While individuals with AI matched the performance of two-person teams without AI, teams using AI were nearly 3 times more likely to produce "Top 10% Solutions" than individuals with AI or teams without. This suggests that AI's greatest potential may lie not in replacing human teammates, but in augmenting the capabilities of functional human teams. The synergy wasn't Individual + AI ≈ Team; it was Team + AI > Team ≈ Individual + AI. (And that's just math; how could I be wrong?)

This augmentation, however, comes at a cost highlighted elsewhere in the study: AI-mediated solutions were substantially less original. This agrees with every other paper on LLMs and creativity: they optimize for readability and common patterns, converging diverse thinkers on less novel ideas. If AI helps teams write better reports about more conventional solutions, is that truly progress in innovation?

Finally, the study of corporate innovation exercises (holy shit do I hate hackathons) relies on a constrained, standardized process unlikely to foster truly disruptive thinking. While valuable in demonstrating AI's ability to enhance specific outputs within a defined framework, we should be cautious about extrapolating these findings to the broader landscape of human collaboration and innovation.

The deeper lesson resonates with a core theme: AI as augmentation versus substitution. This experiment suggests AI-as-substitute is currently better at polishing outputs and bridging linguistic gaps than replicating the complex interplay of deep expertise, diverse perspectives, and creative friction found in high-functioning human teams. It can make reports clearer and perhaps quicker to produce, but it doesn't necessarily make the underlying thinking better or more original, and it certainly doesn’t replace the synergistic potential of genuine human collaboration.

If an AI can make anyone articulate your job's core concepts as well as you can, perhaps the value lies less in the articulation and more in the unarticulated nuance, the tacit knowledge, the creative leap—the parts of you that aren't just the job description and a list of skills on a resume. This may sound perverse to many, but if you are not your job… it won't be your job much longer.

Our focus should be on understanding how AI can augment those uniquely human capabilities, not just on celebrating when it helps us write better reports about slightly less original ideas. I’m happy to say there is amazing work in AI-as-augmenter and innovation accelerator, including our Matchmaker AI at The Human Trust. But these new systems don’t just require novel approaches to AI development; they depend on dramatic changes to human development…even among “elite professionals”.

Follow me on LinkedIn or join my growing Bluesky!

Research Roundup

JARVIS Lives!

Iron Man is one of the best representations of a human and an AI working together. More specifically, the scene in Endgame in which Tony Stark talks through possible solutions to time-travel physics based on Ant-Man’s data. Tony throws out ideas and explores alternatives while JARVIS instantiates them and aids the exploration.

But what if JARVIS wanted to build a better Tony? Too often the solutions discovered by deep models are a mystery hidden across millions of parameters. My goal has always been to build better people—better machines are always in service of that goal. How do we make sure actual humans can learn from these cyborg discoveries?

New research from DeepMind points in an exciting direction. The new method “extracts new chess concepts from AlphaZero”, first extracting vectors of its internal solutions and then filtering them based on “teachability” and “novelty”. That mutual constraint, that the “concepts” can be “taught” to another AI yet are not obvious from human games, is what drives discovery.

https://www.pnas.org/doi/10.1073/pnas.2406675122

Rather than resulting in textual lessons, the system’s final vectors represent puzzle-solution pairs of chess board arrangements and best actions (very RL state-spacy). But human grandmasters who studied these prototype cases improved their chess performance. The AI taught the human masters something new.
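
To make the “teachability” and “novelty” filter concrete, here is a minimal, purely illustrative sketch in Python. None of these names or numbers come from the paper; the teachability score is a dummy stand-in for probing a weaker “student” AI, and the human-game embeddings and thresholds are made up.

```python
import numpy as np

# Illustrative sketch only, not the paper's code. Candidate "concepts" are
# directions in AlphaZero's latent space; we keep those a student model could
# learn from (teachability) and that sit far from anything common in
# embedded human games (novelty).

rng = np.random.default_rng(0)
candidate_concepts = rng.normal(size=(200, 64))      # stand-in latent directions
human_game_embeddings = rng.normal(size=(5000, 64))  # stand-in human positions

def teachability(concept: np.ndarray) -> float:
    # Dummy score; in the paper this is whether a student AI's play improves
    # after being steered toward the concept.
    return float(concept @ concept) / concept.size

def novelty(concept: np.ndarray) -> float:
    # Minimum distance from the concept to any embedded human-game position.
    return float(np.linalg.norm(human_game_embeddings - concept, axis=1).min())

selected = [
    c for c in candidate_concepts
    if teachability(c) > 0.9 and novelty(c) > 6.0    # arbitrary thresholds
]
print(f"kept {len(selected)} of {len(candidate_concepts)} candidate concepts")
```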

My inevitable self-insertion into the story is Juxapedia, a project my wife and I created that launched my first company. Juxapedia aimed to improve student learning by capturing and organizing the experiences of students at the edge of mastery. These experiences were embedded in a sophisticated network (for its time) and ordered according to the progression of concept development. When new students encountered difficulties, Juxapedia would identify and present them with the most relevant past experiences that were likely to help them achieve a conceptual breakthrough.

https://academy.socos.org/seeking-augniscience/
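
For the curious, the core retrieval idea behind Juxapedia can be sketched in a few lines. This is a toy reconstruction from memory, not the actual system; every name, embedding, and number here is made up for illustration.

```python
import numpy as np

# Toy reconstruction of the idea: embed past "edge of mastery" experiences,
# tag each with its place in a concept progression, and surface the nearest
# experiences sitting at or just ahead of where the struggling student is.

rng = np.random.default_rng(1)
past_experiences = [
    {"text": f"experience {i}", "embedding": rng.normal(size=32), "level": i % 10}
    for i in range(500)
]

def recommend(struggle_embedding: np.ndarray, student_level: int, k: int = 3):
    # Only consider experiences at or just above the student's concept level.
    candidates = [e for e in past_experiences
                  if student_level <= e["level"] <= student_level + 1]
    candidates.sort(key=lambda e: float(np.linalg.norm(e["embedding"] - struggle_embedding)))
    return candidates[:k]

for e in recommend(rng.normal(size=32), student_level=4):
    print(e["text"], e["level"])
```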

Backprop or Bust?

Here’s a paper I’m just not sure about: “Optimizing generative AI by backpropagating language model feedback”. It could be an exciting breakthrough: the TextGrad algorithm automates optimization of complex, multi-component AI systems, which could dramatically accelerate progress in building more powerful and capable AI. The authors claim the results are highly generalizable, applying natural language feedback to many parts of a system (prompts, parameters, outputs like molecules or plans) and across diverse domains (science, medicine, coding, agents).

My concern (beyond boilerplate reliability issues with LLM-generated material) is the metaphor of backpropagation. How does TextGrad actually translate natural language critique into concrete system adjustments across potentially non-differentiable components like prompts or external tool calls? Is it true optimization in a mathematical sense or a more sophisticated, feedback-driven heuristic search? It would be disappointing to learn that the implementation might boil down to a very sophisticated form of automated prompt engineering.
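
For what it’s worth, here is the most deflationary reading of the metaphor: a feedback-driven rewrite loop. This is a minimal sketch of my own assumption about the flavor of the mechanism, not the authors’ implementation, and `llm()` is a stand-in for a call to whatever chat model you like.

```python
# Minimal sketch of "backpropagating" natural-language feedback as a
# feedback-driven prompt rewrite loop. Not the TextGrad implementation.

def llm(prompt: str) -> str:
    # Stand-in: wire in your model provider of choice here.
    return "stub response"

def critique(answer: str, task: str) -> str:
    # The "loss": ask the model to describe the answer's weaknesses in words.
    return llm(f"Task: {task}\nAnswer: {answer}\nCritique this answer's weaknesses.")

def update_prompt(prompt: str, feedback: str) -> str:
    # The "gradient step": rewrite the upstream prompt to address the critique.
    return llm(f"Current prompt: {prompt}\nCritique of its output: {feedback}\n"
               "Rewrite the prompt to fix these weaknesses. Return only the new prompt.")

def optimize(prompt: str, task: str, iterations: int = 5) -> str:
    for _ in range(iterations):
        answer = llm(f"{prompt}\n\n{task}")
        feedback = critique(answer, task)
        prompt = update_prompt(prompt, feedback)
    return prompt
```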

Tutor Tutor

We need more of this in AI: a large-scale Randomized Controlled Trial (RCT) testing an AI system in a real-world setting. Well, a group at Stanford did just that in a domain near to my heart: education. More specifically, they developed and evaluated a tutor for tutors.

Tutor CoPilot [come on people—points will be deducted for lazy names] “leverages a model of expert thinking to provide expert-like guidance to tutors as they tutor”. The idea is to assist human tutors, making them more effective, rather than trying to replace them. Clearly I agree so far!

After developing the AI, they ran a large-scale, real-world RCT. It showed that “students working on mathematics with tutors randomly assigned to have access to Tutor CoPilot are 4 percentage points (p.p.) more likely to master topics (p<0.01).” In keeping with virtually all previous research on LLM-based productivity boosts, “lower-rated tutors” showed substantially higher gains in their students, indicating that the value of Tutor CoPilot lies in substituting for the missing skills of more experienced tutors.

So I ask the meta-tutoring question: do the lower skilled tutors show sustained improvement in their performance without Tutor CoPilot? The analysis did reveal that they were more likely to engage in active learning techniques such as asking guiding questions rather than giving away answers, but not whether they internalized this for future students.

Reservations aside, I’m thrilled to see external validity and rigorous assessment being applied to AI research!

<<Support my work: book a keynote or briefing!>>

Want to support my work but don't need a keynote from a mad scientist? Become a paid subscriber to this newsletter and recommend it to friends!

SciFi, Fantasy, & Me

I just watched the greatest film of all time, though admittedly only within a very nuanced category of Keaton-Chaplin-Looney Tunes spoof old-timey semi-animated fever dream: Hundreds of Beavers. Holy crap…this was amazing! My son and I giggled at the brilliant stupidity of it all for what felt like 3 hours, but in a good way: it just kept developing and deranging and developing.

(“Vivienne,” you may ask, “Does this really count as scifi or fantasy?” Frankly, it doesn’t matter, but it kinda undeniably is both.)

Stage & Screen

  • April 16, New Delhi & Online: "Economic Transformation, Political Vision, and Human Values" – David Danks and I will beam in for this panel.
  • May 7, Chicago: Innovation, Collective Intelligence, and the Information-Exploration Paradox
  • May 8, Porto: Talking about entrepreneurship at the SIM conference in Portugal
  • May 14, London: it’s time for my semi-annual lecture at UCL.
    • And more is in the works for London, including talks, interviews, and ...standup?
  • June 9, Philadelphia: "How to Robot-Proof Your Kids" with Big Brothers, Big Sisters!
  • June 12, SF: Golden Angels
  • June 18, Cannes: Cannes Lions
  • Late June, South Africa: Finally I can return. Are you in SA? Book me!
  • October, UK: More med school education

If your company, university, or conference just happens to be in one of the above locations and wants the "best keynote I've ever heard" (shockingly spoken by multiple audiences last year), reach out and book me!


Vivienne L'Ecuyer Ming

Follow more of my work at
Socos Labs The Human Trust
Dionysus Health Optoceutics
RFK Human Rights GenderCool
Crisis Venture Studios Inclusion Impact Index
Neurotech Collider Hub at UC Berkeley UCL Business School of Global Health