VZ editorial frame
Read this piece through one operating lens: AI does not automate first; it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.
VZ Lens
Through a VZ lens, this analysis is not content volume; it is operating intelligence for leaders. Synthetic witnesses can reveal latent narrative patterns, but they also hallucinate coherence. Decision value depends on calibration and verification design. The practical edge comes from turning this into repeatable decision rhythms.
Large language models are excellent at simulating human speech. They are far less adept at simulating human behavior.
TL;DR
A large language model (LLM) is capable of generating text that sounds human. This can create the illusion that it also reflects a human perspective. But an LLM is not a model of human behavior—it is a model of human text. There is a big difference between the two. This article shows exactly what you can and cannot expect from an LLM-based synthetic persona—and what additional layers are needed to push the boundaries.
The Silence of the Auditorium
The floorboards creak hoarsely under my weight. Somewhere above the high, vaulted ceiling, a bird chirps, but the sound filters through the stained-glass window as if from another world. The portraits of professors lining the walls stare blindly ahead. The air is thick with dust and the scent of old books. I sit, and the silence is so dense that I can almost hear the roar of my own thoughts. This space is filled with words spoken long ago, and yet they float here, invisible, just as so many new words are being born that have never been here before. I wonder which ones will remain, and which are only echoes: empty, yet perfectly formed.
1. What does an LLM actually do?
Large language models (GPT-4, Claude, Llama, etc.) are text prediction systems. Their training data is a massive collection of text written by humans, and the model learns which text is likely to follow which.
This is an extremely powerful capability. Human texts contain a vast portion of humanity’s knowledge, experience, and way of thinking.
But there is a fundamental limitation: the LLM learns how humans write—not how they think, decide, or feel.
This distinction is crucial from the perspective of synthetic personas.
2. The Strengths of LLMs in the Context of Synthetic Personas
What it does well:
- Simulating a human voice: LLMs are excellent at mimicking how a member of a specific demographic group speaks. The text sounds natural and human.
- Recalling schemas and stereotypes: if you ask it to simulate a typical situation, the LLM reconstructs the most common reactions from its training data. This works as long as the situation is not unusual.
- Perspective-switching: the LLM is capable of speaking from different perspectives; if you ask it to respond “from the perspective of a fifty-year-old rural teacher,” it will try to do so.
- Situation interpretation and context handling: it can handle longer situation descriptions and provide contextually relevant responses.
3. Limitations of the LLM in a synthetic persona context
What it doesn’t do well:
3.1 Overcoherence
The LLM’s texts are extremely coherent. But real people are full of contradictions, ambivalence, and conflicting impulses. If the simulated persona is always coherent, that stems from the nature of the LLM—not from human nature.
3.2 The “average-person collapse”
The LLM reproduces the statistical average of its training data. Even if you ask it to play the role of a persona with “low neuroticism, high intolerance of uncertainty (IoU), and an avoidant coping style,” the response tends to revert toward the average human response. The persona profile gets overwritten.
3.3 Lack of Predictive Validity
An LLM does not simulate a human; it generates text that appears human. It has no internal state model, no BIS/BAS (behavioral inhibition/activation) engine, no IoU sensitivity level. If a persona profile is specified only in the prompt, the LLM will not maintain it consistently across decision-making situations.
3.4 Culture-specific bias
The training data for large LLMs is heavily biased toward Anglo-Saxon, Western, urban middle-class contexts. Hungarian cultural context, CEE norms, and local consumer patterns are underrepresented. Simulating a Hungarian persona is less reliable than, for example, an English one.
3.5 Lack of Stress and Dynamics
An LLM works from a snapshot. It cannot model how a persona would change after six months of chronic stress unless that dynamic is encoded as an explicit layer.
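To make the idea of an explicit dynamics layer concrete, here is a minimal sketch. Everything in it is illustrative: the field names, the 0-to-1 scales, and the update rule are assumptions, not the engine described in this series.

```python
from dataclasses import dataclass

@dataclass
class PersonaState:
    """Illustrative persona state; fields and scales are assumptions."""
    stress: float = 0.2              # 0.0 (calm) .. 1.0 (chronic stress)
    coping_flexibility: float = 0.7  # 1.0 = adapts easily, 0.0 = rigid

def advance_months(state: PersonaState, months: int, monthly_load: float) -> PersonaState:
    """Toy dynamics: sustained load raises stress, and rising stress
    slowly erodes coping flexibility. Returns a new state; the input
    state is left untouched (the snapshot stays comparable)."""
    stress = state.stress
    flexibility = state.coping_flexibility
    for _ in range(months):
        stress = min(1.0, stress + monthly_load * (1.0 - 0.5 * flexibility))
        flexibility = max(0.0, flexibility - 0.02 * stress)
    return PersonaState(stress=round(stress, 3), coping_flexibility=round(flexibility, 3))

baseline = PersonaState()
after_6_months = advance_months(baseline, months=6, monthly_load=0.1)
```

The point is not the formula but the architecture: the persona's trajectory lives outside the LLM, in a layer that can be inspected, tuned, and replayed.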
4. The difference between the LLM and the persona engine
This difference is crucial:
| Dimension | LLM-prompt-based persona | Psychological engine-based persona |
|---|---|---|
| State model | None (resets with every prompt) | Yes (layered, updated) |
| Trigger logic | None (inferred from situation description) | Yes (explicit trigger library) |
| Coping layer | None (simulated based on text average) | Yes (explicit, style + flexibility) |
| Overcoherence | High | Actively controlled |
| Prompt fragility | High (depends on question phrasing) | Low (internal model is stable) |
| Cultural calibration | Weak (Anglo-Saxon bias) | Can be calibrated by design |
| Validation | Difficult (every prompt is different) | Testable, measurable |
5. How can the psychological engine enhance the LLM?
The LLM cannot replace a psychological engine—but it is an excellent output generator when the input is provided by the engine.
The correct architecture:
Psychological engine → State-output → LLM → Natural text
- The engine determines: the persona’s current state (stress level, active trigger, coping mode, dominant emotion)
- It passes this state to the LLM as a structured description
- The LLM does not “play a character”—instead, it generates natural text from a precisely described internal state
This approach combines the engine’s predictive accuracy with the LLM’s text-generation capabilities—without relying on the LLM for state modeling.
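The architecture above can be sketched in a few lines. The state fields mirror the list in section 5 (stress level, active trigger, coping mode, dominant emotion); the serialization format and all example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EngineOutput:
    """State computed by the psychological engine; field names are illustrative."""
    stress_level: str      # e.g. "elevated"
    active_trigger: str    # e.g. "price increase on a staple product"
    coping_mode: str       # e.g. "avoidant"
    dominant_emotion: str  # e.g. "frustration"

def to_llm_prompt(state: EngineOutput, situation: str) -> str:
    """Serialize the engine's state into a structured description for the LLM.
    The LLM only verbalizes this state; it never infers or maintains it."""
    return (
        "You are generating first-person text from a precisely described internal state.\n"
        f"Stress level: {state.stress_level}\n"
        f"Active trigger: {state.active_trigger}\n"
        f"Coping mode: {state.coping_mode}\n"
        f"Dominant emotion: {state.dominant_emotion}\n"
        f"Situation: {situation}\n"
        "Respond naturally, consistent with every field above."
    )

prompt = to_llm_prompt(
    EngineOutput("elevated", "price increase on a staple product",
                 "avoidant", "frustration"),
    situation="A weekly grocery trip where the usual brand costs 20% more.",
)
```

The design choice worth noting: the prompt carries no personality instructions at all, only state. Personality, triggers, and dynamics stay in the engine, where they can be validated.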
6. What can you ask them to do, and what not?
✅ Yes — ask an LLM-based synthetic persona for:
- Natural text that sounds human
- A first approximation of a general reaction to a given situation
- A list of hypotheses that must then be validated
- A quick comparison of different perspectives
- Scenario exploration for exploratory purposes (not for operational decisions)
❌ Do not ask an LLM-based synthetic persona for:
- Decision forecasts requiring predictive validity
- Accurate simulation of stress-sensitive, dynamic reactions
- Culturally calibrated behavioral forecasts for the Hungarian market
- Statistically representative data
- Anything that should be marked with a clear red flag (vulnerable group, high-stakes decision)
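The red-flag rule in the last bullet can be operationalized as a gate in front of the persona pipeline. This is a hypothetical sketch: the category names and keyword lists are placeholders, not an exhaustive policy.

```python
# Hypothetical guardrail: screen a research question before routing it
# to an LLM-based persona. Keyword lists are illustrative, not exhaustive.
RED_FLAGS = {
    "vulnerable_group": ["minor", "patient", "addiction", "debt collection"],
    "high_stakes": ["medical", "legal", "credit decision", "layoff"],
}

def screen_request(question: str) -> list[str]:
    """Return the red-flag categories triggered by the question, if any."""
    q = question.lower()
    return [category for category, terms in RED_FLAGS.items()
            if any(term in q for term in terms)]

flags = screen_request("How would a patient react to a medical cost increase?")
```

In practice such a keyword filter is only a first line of defense; flagged requests should go to a human reviewer rather than being silently blocked.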
7. The validity limit: what the “Perils of Synthetic Replacements” study revealed
In their 2024 study, “The Perils of Synthetic Replacements,” Sharma et al. empirically examined the extent to which real survey respondents can be replaced by LLM simulations.
Key findings:
- LLM responses systematically differ from those of real humans, particularly regarding extreme attitudes and sensitive topics
- This divergence is not random: LLMs compress the distribution of opinions (pulling them toward the mean)
- The deviation varies by demographic group—the bias is significantly greater for certain groups
- The prompting method strongly influences the result—there is no stable, reproducible output
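The compression effect in the second finding is easy to measure once you have paired samples. The sketch below uses only the standard library; the two score lists are fabricated purely to illustrate the calculation, not data from the study.

```python
# Minimal check for distribution compression: synthetic answers cluster
# around the mean even when the average looks right. Data is illustrative.
import statistics

real_scores = [1, 2, 2, 3, 5, 5, 6, 7, 7, 7]       # e.g. a 1-7 attitude scale
synthetic_scores = [3, 4, 4, 4, 5, 4, 5, 4, 4, 5]  # pulled toward the mean

compression_ratio = statistics.pstdev(synthetic_scores) / statistics.pstdev(real_scores)
# A ratio well below 1.0 signals that the synthetic panel lost the
# extremes, even though the two means are nearly identical.
```

Comparing only means would hide exactly the divergence the study found; the spread (and ideally the full distribution shape) is what needs auditing.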
This does not mean that LLM-based personas are useless—it means that their limitations must be clearly understood.
8. Summary
LLMs are excellent text generators—and mediocre persona simulators. They are good at producing text that sounds human. They are poor at simulating psychologically valid behavior.
The correct approach: use the LLM as an output generator, but entrust state modeling, trigger logic, and validation to a psychological engine. This way, we can combine the strengths of both without running into the LLM’s limitations.
This article is the twenty-third installment in the Synthetic Personas series. Next installment: Longitudinal persona — how does a synthetic human age?
Zoltán Varga | vargazoltan.ai — Market research, artificial intelligence, synthetic thinking
Strategic Synthesis
- Identify which current workflow this insight should upgrade first.
- Use explicit criteria for success, not only output volume.
- Use a two-week cadence to update priorities from real outcomes.
Next step
If you want your brand to be represented with context quality and citation strength in AI systems, start with a practical baseline and a priority sequence.