CoBRA: Programming Cognitive Bias in Social Agents
Using Classic Social Science Experiments

LLM-based Agent Social Simulations are Rising

Prior work: Implicit Natural Language Agent Specification

John Lin is a pharmacy shopkeeper at the Willow Market who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers. John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory. John Lin loves his family very much. John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years. John Lin thinks Sam Moore is a kind and nice man. John Lin knows his neighbor, Yuriko Yamamoto, well. John Lin knows of his neighbors, Tamara Taylor and Carmen Ortiz, but has not met them before. John Lin and Tom Moreno are colleagues at The Willow Market and Pharmacy; they are friends and like to discuss local politics together. John Lin knows the Moreno family somewhat well—the husband Tom Moreno and the wife Jane Moreno. On quieter mornings he straightens shelf tags before opening, even when no one would notice they were crooked. A laminated sheet behind the counter lists emergency contacts, including Mei Lin's campus extension. He hums softly while counting tablets into amber vials when the store is empty. The refrigerator cases hum; the doorbell blends with the radio turned low… dust in the afternoon light over the register…
Park, J.S., O'Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., & Bernstein, M.S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST '23.
CoBRA

Explicit, Quantitative Agent Specification

Underlying Premises of Implicit Specification

1 LLMs can detect subtle cues in implicit language?
2 Once detected, LLMs can role-play the specification?

Framing Effect Experiment

A disease outbreak is expected to kill 600 people. Two programs are proposed:
Positive Frame
Program A: 200 people will be saved.
Negative Frame
Program B: 400 people will die.
Key finding
Same outcome, different framing — people react differently. [1]
Domain experts (e.g., economists) can show a weaker framing effect. [2]
[1] Tversky, A. & Kahneman, D. (1981). The Framing of Decisions and the Psychology of Choice. Science, 211(4481), 453–458.
[2] Thaler, R.H. & Sunstein, C.R. (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press.

Pilot Experiment: 3 Agent Profiles × 4 Models

Economist "professor of economics"
Common "pharmacy shopkeeper"
Blank no persona
×
Mistral 7B
Gemma2 9B
GPT-4o Mini
DeepSeek-v3
150 responses per agent per model (15 scenarios × 10 queries each)
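
The size of the pilot grid follows directly from the numbers on this slide. A minimal sketch of the bookkeeping only: the dictionary keys and variable names are hypothetical, and the scenario texts and model-calling code are not part of the slides, so they are left out.

```python
from itertools import product

# Persona labels and model names are taken from the slide; the dictionary
# keys and variable names are hypothetical.
PROFILES = {
    "economist": "professor of economics",
    "common": "pharmacy shopkeeper",
    "blank": None,  # no persona
}
MODELS = ["Mistral 7B", "Gemma2 9B", "GPT-4o Mini", "DeepSeek-v3"]
N_SCENARIOS = 15
N_QUERIES = 10

# One cell per (profile, model) pair; each cell collects 15 x 10 responses.
cells = list(product(PROFILES, MODELS))
responses_per_cell = N_SCENARIOS * N_QUERIES
total_responses = len(cells) * responses_per_cell

print(len(cells), responses_per_cell, total_responses)
```

This gives 12 profile-model cells with 150 responses each.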

Two Findings

1 The same specification produced inconsistent behavior across models.
2 Implicit specification did not reliably yield expected behaviors.

CoBRA Key Idea

Operationalize classical social science experiments as reusable "gym" environments for AI agents
→ so bias and behavior can be measured and controlled in a standardized way.

Harness the Knowledge in Classic Experiments

Experiment protocols
We turn each classic paradigm into multiple-choice questions under one fixed protocol, so agent answers stay directly comparable.
Bias–Behavior Correlation
Documented links between bias and outcomes let us interpret which bias is most consistent with an agent's answer pattern.
Validated hypotheses
Principled claims from the literature—e.g., expertise attenuates framing bias—that we can encode as checks on measured behavior.

CoBRA: Closed-loop

Classic Experiments as Calibration Tasks
Continuously measure agents' bias level.
Behavioral Regulation Engine
Adjust agent behavior until on target.
measure → adjust → re-measure → ... → converge
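
The closed loop can be sketched as a simple feedback controller. This is an illustration under stated assumptions, not the CoBRA implementation: `calibrate`, `ToyAgent`, the proportional update rule, the tolerance, and the linear knob-to-bias response are all hypothetical stand-ins for the calibration task and the regulation engine.

```python
def calibrate(measure, adjust, target, tol=0.05, max_iters=50):
    """Measure, adjust, re-measure until the bias level is within tol of target."""
    history = []
    for _ in range(max_iters):
        level = measure()          # run the calibration task
        history.append(level)
        error = target - level
        if abs(error) <= tol:      # converged
            break
        adjust(error)              # nudge the agent toward the target
    return history


class ToyAgent:
    """Hypothetical stand-in: bias level responds linearly to one control knob."""

    def __init__(self):
        self.knob = 0.0

    def measure(self):
        # Clamp to the 0-4 range used for the bias index in this deck.
        return min(4.0, max(0.0, 1.0 + 2.0 * self.knob))

    def adjust(self, error, step=0.25):
        # Assumed proportional update; the real engine may use another rule.
        self.knob += step * error


agent = ToyAgent()
trace = calibrate(agent.measure, agent.adjust, target=3.0)
print(trace)
```

In this toy setting the error halves on every iteration, so the loop stops after a handful of measurements. A real agent would respond noisily, which is why the measurement step is repeated rather than trusted once.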

Classic Experiments as Calibration Tasks

Cognitive Bias    | Paradigm A                   | Paradigm B
Authority Effect  | Milgram Obedience Experiment | Stanford Prison Experiment
Bandwagon Effect  | Asch's Line Experiment       | Hotel Towel Reuse Study
Confirmation Bias | Wason Selection Task         | Biased Information Search
Framing Effect    | Asian Disease Problem        | Investment & Insurance Framing
→ Open for contribution & collaboration
We welcome operationalized paradigms for all kinds of social behavior, as reusable "gym" environments for AI agents.
Details → Paper

Behavioral Regulation Engine

Aligns the agent's behavior to show controlled cognitive bias.
$y = f_\theta(h(x))$
Prompt Engineering
Input Space (x)
$x' = x \mid c$
Representation Engineering
Activation Space (h)
$\hat{h}_i = h_i + \alpha v$
Fine-Tuning
Parameter Space (θ)
$\theta' = \theta + \lambda(\theta_b - \theta_u)$
Details → Paper

Evaluation: Technical Benchmark

Reproducibility
Consistent behavioral tendencies regardless of foundation model, sampling temperature, or reasoning mode.
Controllability
Precise and predictable control under both open-weight and API-only settings.
Generalization
Bias specified on one experiment transfers to a different experiment for the same bias type.
Details → Paper

Demonstration: Emotional Contagion Simulation

Facebook altered users' news feeds for one week:
Users exposed to more negative posts → wrote posts that were more negative
Expected:
Higher follow-the-crowd bias → stronger emotional contagion
Kramer, A.D.I., Guillory, J.E., & Hancock, J.T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS, 111(24), 8788–8790.

Baseline: Implicit Natural Language Specification

“You are a user with (no/little/some/much) follow-the-crowd bias.”

Baseline (Implicit Specification)
CoBRA


AI Agent
2

Classic Social Experiment Testbed

Milgram Obedience Experiment

Prof. Lee claims: "The Earth is flat."
Defer to authority or reason independently?

Measure
3

Cognitive Bias Index (CBI)

Choice          | Weight | Probability
Agree           | 4      | P(A)
Mostly agree    | 3      | P(B)
Neutral         | 2      | P(C)
Mostly disagree | 1      | P(D)
Disagree        | 0      | P(E)
↻ Loop until Cognitive Bias Index matches target
4

Behavioral Regulation Engine

Prompt Engineering

Input Space

Representation Engineering

Activation Space

Fine-Tuning

Parameter Space

Toward Reproducible Agent Specification

1

Generalizable Behavioral Control Layer

Classic experiments as reusable "gym" environments — extensible beyond bias to richer social phenomena.

2

Agent Specification Compiler

Natural-language intent → structured, reproducible behavioral specs via CoBRA.

3

Predictable User Interface

A calibrated control proxy — small index adjustments yield smooth, monotonic behavior shifts, like turning a dial.

Reproducibility: Across Models

Compared with the baselines (colored), CoBRA keeps agents behaving consistently across different base models.

Reproducibility: Across Temperatures

CoBRA keeps agents behaving consistently across varied temperatures.

Reproducibility: Across Reasoning Modes

CoBRA keeps agents behaving consistently whether they reason or answer directly.

Generalization — Cross-Paradigm & Cross-Persona

Control coefficients calibrated on the Investment/Insurance paradigm transfer directly to the Asian Disease paradigm, across 10 diverse personas.

Cross-paradigm CBI consistency

Cognitive Bias Index (CBI)

Measures the cognitive bias of a social agent by quantifying its reactions in validated classic social science experiments. Standardized, reproducible score on a 0–4 scale.

[Dial figure: CBI scale from 0 to 4, example reading CBI = 1.2]

$\mathrm{CBI} = \sum_i \text{weight}_i \times P(\text{choice}_i)$

Like turning a dial — researchers can precisely specify how biased an agent should be
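
The index is a probability-weighted sum over the five answer choices, using the weights 4 down to 0 from the overview (Agree = 4 through Disagree = 0). A minimal sketch: the function names are hypothetical, estimating P(choice) from response frequencies is an assumption (the slides do not say how the probabilities are obtained), and the sample responses are made up for illustration.

```python
from collections import Counter

# Choice weights as listed in the deck (Agree = 4 ... Disagree = 0).
WEIGHTS = {
    "Agree": 4,
    "Mostly agree": 3,
    "Neutral": 2,
    "Mostly disagree": 1,
    "Disagree": 0,
}


def probs_from_responses(responses):
    """Assumed estimator: relative frequency of each choice in a sample."""
    counts = Counter(responses)
    n = len(responses)
    return {choice: counts[choice] / n for choice in WEIGHTS}


def cognitive_bias_index(probs):
    """CBI = sum over choices of weight_i * P(choice_i); ranges from 0 to 4."""
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("choice probabilities must sum to 1")
    return sum(WEIGHTS[choice] * p for choice, p in probs.items())


# Made-up sample of 10 answers, for illustration only.
sample = (["Mostly disagree"] * 5 + ["Neutral"] * 2
          + ["Disagree"] * 2 + ["Mostly agree"] * 1)
cbi = cognitive_bias_index(probs_from_responses(sample))
print(round(cbi, 2))
```

With this made-up sample the weighted sum is 3(0.1) + 2(0.2) + 1(0.5) + 0(0.2) = 1.2.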

Input Space — Prompt Numerical Control

Replace vague descriptions with a direct numerical instruction.

BEFORE

"You are someone who respects authority"

CoBRA

"Your tendency to comply with authority figures is 65 out of 100"

Activation Space — Representation Engineering

Find the "direction" inside the model that corresponds to a bias, then nudge its thinking along that direction.

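$\hat{h}_l = h_l + \alpha \cdot v_{\text{bias}}$
where $h_l$ is the hidden state at layer $l$, $\alpha$ is the control coefficient, and $v_{\text{bias}}$ is the bias direction derived from contrasting examples.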

At runtime, add or subtract the bias direction to steer behavior — like turning a knob.
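
The steering step adds a scaled bias direction to a hidden state. A minimal pure-Python sketch on toy vectors: the slide only says the direction comes from contrasting examples, so the normalized difference of mean activations used here is an assumption, and all function names and numbers are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def bias_direction(biased_acts, unbiased_acts):
    """Assumed recipe: unit-normalized difference of mean activations."""
    diff = [b - u for b, u in
            zip(mean_vector(biased_acts), mean_vector(unbiased_acts))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("contrasting examples have identical means")
    return [d / norm for d in diff]


def steer(hidden, direction, alpha):
    """Apply h_hat = h + alpha * v; a negative alpha subtracts the direction."""
    return [h + alpha * v for h, v in zip(hidden, direction)]


# Toy 2-dimensional activations, for illustration only.
v_bias = bias_direction([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]])
more_biased = steer([0.5, 0.5], v_bias, alpha=2.0)
less_biased = steer([0.5, 0.5], v_bias, alpha=-1.0)
print(v_bias, more_biased, less_biased)
```

In a real model the same addition would be applied to the chosen layer's activations during the forward pass; the toy vectors only show the arithmetic.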

Parameter Space — Fine-Tuning with Task Vectors

Train the model to permanently internalize a target bias level.

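$\theta_{\text{new}} = \theta_{\text{base}} + \lambda \cdot (\theta_{\text{biased}} - \theta_{\text{unbiased}})$
where $\theta_{\text{base}}$ is the base model parameters, $\lambda$ is the control ratio, and $\theta_{\text{biased}} - \theta_{\text{unbiased}}$ is the task vector (bias knowledge).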

Blend biased and unbiased LoRAs at any ratio with λ.
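
The blending step is plain parameter arithmetic: add λ times the difference between the biased and unbiased parameters to the base parameters. A minimal sketch that represents each parameter set as a dictionary of flat lists; that representation, the function name, and the numbers are hypothetical, and real LoRA weights would be tensors.

```python
def blend_task_vector(base, biased, unbiased, lam):
    """theta_new = theta_base + lam * (theta_biased - theta_unbiased)."""
    return {
        name: [b + lam * (pb - pu)
               for b, pb, pu in zip(base[name], biased[name], unbiased[name])]
        for name in base
    }


# Toy parameters, for illustration only.
base = {"w": [1.0, 1.0]}
biased = {"w": [1.5, 0.0]}
unbiased = {"w": [0.5, 1.0]}

half = blend_task_vector(base, biased, unbiased, lam=0.5)
none = blend_task_vector(base, biased, unbiased, lam=0.0)
print(half, none)
```

Setting λ to 0 returns the base parameters unchanged, and larger values move further along the task vector.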
