Replicants Are Replicating

In this week's Research Roundup let's simulate humanity.

💡
Follow me on LinkedIn or join my growing Bluesky! Or even... hey, what's this... Instagram?

Means, Medians, & Modes...Oh My!

What happens when you use an LLM to model 216 different humans? As it turns out, you get a surprisingly good model of the collective behavior of a market.

Endowed with hundreds of “distinct, real-world investor personas”, LLM agents end up disagreeing about “S&P500 news headlines” along predictable lines: political divides spiked around elections, income divides around tax cuts, and race-based divides around the George Floyd protests.

This simulated disagreement wasn't just a party trick. It correlated “strongly with abnormal trading volume” and predicted market returns, outperforming existing measures.

While LLMs may not capture the messy, dynamic generator function of a single human mind, they do seem to capture the aggregate distribution of our collective (textual) behavior. By giving it personas, the researchers are sampling from different parts of that distribution—a far better approximation of a complex system than just asking for the "average" opinion.
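
If you want to picture the mechanics, here is a minimal sketch of persona-conditioned sampling, assuming a generic chat-completion call. This is not the paper's code: the personas, the rating scale, and the query_llm helper are all illustrative stand-ins.

```python
from statistics import pstdev

# Hypothetical personas; the paper endows its agents with hundreds of
# distinct, real-world investor profiles.
PERSONAS = [
    "a 62-year-old retired teacher holding mostly index funds",
    "a 29-year-old day trader chasing tech momentum",
    "a small-business owner worried about tax policy",
]

def query_llm(prompt: str) -> str:
    """Stand-in for whatever chat-completion API you actually call."""
    raise NotImplementedError

def persona_ratings(headline: str) -> list[int]:
    """Ask each persona to rate a headline from -2 (very negative) to +2 (very positive)."""
    ratings = []
    for persona in PERSONAS:
        prompt = (
            f"You are {persona}. Rate how this S&P 500 news headline changes "
            f"your outlook, from -2 (very negative) to +2 (very positive). "
            f"Reply with a single integer.\n\nHeadline: {headline}"
        )
        ratings.append(int(query_llm(prompt).strip()))
    return ratings

def disagreement(headline: str) -> float:
    """Dispersion of the persona ratings: a simple simulated-disagreement signal."""
    return pstdev(persona_ratings(headline))
```

The average rating gives you the "consensus" reaction; the dispersion across personas plays the role of the disagreement measure described above.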

Perhaps we can just simulate the next 10 years of market returns on a server farm overnight and wake up in the future.

Quality Is That Which Is Missing

Humans are irrational and angry and tribal. Let’s build a digital world of AIs and let them show us how to get along. Yes, I’m talking about the big dream: let’s simulate Twitter!

Some researchers did just that, creating a “generative social simulation” that “embeds [LLMs] within Agent-Based Models to create socially rich synthetic platforms where agents can post, repost, and follow others.”

The results were...not encouraging. Without any special prompting, the AI agents organically recreated the digital misery Musk knows and loves:

  • “partisan echo chambers”,
  • the concentration of “influence among a tiny elite”, and
  • the “amplification of the polarized voices”.

These aren't bugs; they are emergent features of social networks’ core architecture.

Simulating the failures of social media isn’t the end. If the bots are screaming at each other just like us, the simulation can test interventions to fix the problems. In this case, the researchers tested six commonly proposed interventions. Here’s what they found:

Chronological ordering: “removing engagement-based ranking” reduced inequality (fewer social media celebrities) but it actually made partisanship worse while reducing engagement.

Downplaying dominant voices: promoting “posts with fewer reposts also reduced inequality” but it “had no measurable effect on partisan amplification or homophily.”

Boosting out-partisan content: “had little impact across any outcome dimension.” Disappointing.

Hiding social statistics and hiding biographies: people (and bots) do rely on social signals to pick what to share (lazy, cowardly, and terrible), so hiding these signals does increase the diversity of what gets shared. But unfortunately neither intervention reduces “homophily, inequality, and partisan amplification”.

Bridging attributes: this intervention promotes “high-quality, constructive content”. Only bridging attributes broke the “link between partisanship and engagement”. It also “modestly increased cross-partisan connections”. But you can’t have it all: the intervention also increased inequality as “visibility became concentrated among a narrow set of high-scoring posts”.
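
A toy way to see how most of these interventions differ: in an agent-based simulation they largely reduce to different scoring rules for ordering each agent's feed. The sketch below is not the paper's implementation; the Post fields and the bridging score are illustrative, and the two "hiding" interventions are omitted because they change what agents see rather than how posts are ranked.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Post:
    author_partisanship: float   # -1 (one camp) .. +1 (the other)
    reposts: int
    bridging_score: float = 0.0  # illustrative: rated constructiveness across camps
    created_at: float = field(default_factory=time.time)

def rank_feed(posts: list[Post], intervention: str, viewer_partisanship: float) -> list[Post]:
    """Order one viewer's feed according to the chosen intervention."""
    if intervention == "chronological":
        # Remove engagement-based ranking entirely: newest first.
        return sorted(posts, key=lambda p: p.created_at, reverse=True)
    if intervention == "downplay_dominant":
        # Promote posts with fewer reposts.
        return sorted(posts, key=lambda p: p.reposts)
    if intervention == "boost_out_partisan":
        # Surface posts farthest from the viewer's own partisanship first.
        return sorted(posts,
                      key=lambda p: abs(p.author_partisanship - viewer_partisanship),
                      reverse=True)
    if intervention == "bridging":
        # Promote high-quality, constructive (bridging) content.
        return sorted(posts, key=lambda p: p.bridging_score, reverse=True)
    # Baseline: engagement-based ranking, most-reposted first.
    return sorted(posts, key=lambda p: p.reposts, reverse=True)
```

Running the same agents under each ranking rule and measuring inequality, homophily, and partisan amplification is, in miniature, the experiment described above.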

I choose “bridging attributes”. Inequality of attention isn't great, but it's a different class of problem than a society that can no longer agree on a shared reality. I'm far more concerned about the erosion of pluralism and the rise of toxic polarization than I am about some users getting more retweets than others.

Replicants Are Mean

AI is a perfect student of psychology, and that's a problem.

Using LLMs to simulate research participants, a new study attempted to replicate “156 psychological experiments from top social science journals”. The simulations replicated the main effects of most studies with ease (73–81% success).

From there things get messy because the AIs were too good. They consistently “produced larger effect sizes than humans” and reproduced more complex interaction effects only “46–63% of the time”. And when original studies found no significant effect, the LLM simulations produced one a remarkable 68–83% of the time.
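
Concretely, "larger effect sizes" means comparing a standardized effect like Cohen's d computed on human responses against the same statistic computed on simulated responses. A minimal sketch with made-up numbers (not study data):

```python
from statistics import mean, stdev

def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Standardized mean difference between two experimental conditions."""
    n1, n2 = len(treatment), len(control)
    pooled_sd = (((n1 - 1) * stdev(treatment) ** 2 +
                  (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Made-up illustrative numbers, not study data:
human_d = cohens_d([5.1, 4.8, 5.5, 4.9], [4.4, 4.6, 4.1, 4.3])
llm_d = cohens_d([5.9, 6.1, 5.6, 6.0], [4.0, 4.2, 3.9, 4.1])
print(f"human d = {human_d:.2f}, simulated d = {llm_d:.2f}, "
      f"inflation = {llm_d / human_d:.1f}x")
```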

And then there's the tell: the AI failed spectacularly on the very topics that are most human—nuanced, socially sensitive topics involving race, gender, and ethics. So what's going on here?

This pattern of results suggests an LLM isn't actually simulating a person so much as a mean. It has learned the clean, statistically average signal from the mountain of text it was trained on, but it has none of the noisy, messy, lived experience that defines an individual.

It represents the static, aggregate distribution implied by a function of our collective behavior, not the dynamic reality implied by a collection of our individual functions.

This makes LLMs an incredible new tool for rapidly pilot-testing the "average" human response—a massive value-add for human research of all types. But for understanding the complex, sensitive, and beautifully idiosyncratic reality of individual human lives? For that, we still need humans.

💡
<<Support my work: book a keynote or briefing!>> Want to support my work but don't need a keynote from a mad scientist? Become a paid subscriber to this newsletter and recommend it to friends!

Vivienne L'Ecuyer Ming

Follow more of my work at:

  • Socos Labs
  • The Human Trust
  • Dionysus Health
  • Optoceutics
  • RFK Human Rights
  • UCSD Cognitive Science
  • Crisis Venture Studios
  • Inclusion Impact Index
  • Neurotech Collider Hub at UC Berkeley
  • UCL Business School of Global Health