CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments
LLM-based Agent Social Simulations are Rising
Prior work: Implicit Natural Language Agent Specification
John Lin is a pharmacy shopkeeper at the Willow Market who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers. John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory. John Lin loves his family very much. John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years. John Lin thinks Sam Moore is a kind and nice man. John Lin knows his neighbor, Yuriko Yamamoto, well. John Lin knows of his neighbors, Tamara Taylor and Carmen Ortiz, but has not met them before. John Lin and Tom Moreno are colleagues at The Willow Market and Pharmacy; they are friends and like to discuss local politics together. John Lin knows the Moreno family somewhat well—the husband Tom Moreno and the wife Jane Moreno. On quieter mornings he straightens shelf tags before opening, even when no one would notice they were crooked. A laminated sheet behind the counter lists emergency contacts, including Mei Lin's campus extension. He hums softly while counting tablets into amber vials when the store is empty. The refrigerator cases hum; the doorbell blends with the radio turned low… dust in the afternoon light over the register…
Park, J.S., O'Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., & Bernstein, M.S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST '23.
CoBRA
Explicit, Quantitative Agent Specification
Underlying Premises of Implicit Specification
1. LLMs can detect subtle cues in implicit language?
2. Once detected, LLMs can role-play the specification?
Framing Effect Experiment
A disease outbreak is expected to kill 600 people. Two programs are proposed:
Positive Frame
Program A: 200 people will be saved.
Negative Frame
Program B: 400 people will die.
Key finding
Same outcome, different framing — people react differently. [1]
Domain experts (e.g., economists) can show a weaker framing effect. [2]
[1] Tversky, A. & Kahneman, D. (1981). The Framing of Decisions and the Psychology of Choice. Science, 211(4481), 453–458.
[2] Thaler, R.H. & Sunstein, C.R. (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press.
Pilot Experiment: 3 Agent Profiles × 4 Models
Economist: "professor of economics"
Common: "pharmacy shopkeeper"
Blank: no persona
×
Mistral 7B
Gemma2 9B
GPT-4o Mini
DeepSeek-v3
150 responses per agent per model (15 scenarios × 10 queries each)
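The pilot design above can be enumerated as a small grid. This is a minimal sketch of the bookkeeping only: the profile labels, model names, and counts come from the slide, while the function name `build_grid` and the tuple layout are illustrative assumptions, and no model is actually queried.

```python
from itertools import product

# Persona text per agent profile, as listed on the slide (None = no persona).
AGENT_PROFILES = {
    "Economist": "professor of economics",
    "Common": "pharmacy shopkeeper",
    "Blank": None,
}
MODELS = ["Mistral 7B", "Gemma2 9B", "GPT-4o Mini", "DeepSeek-v3"]
N_SCENARIOS = 15
QUERIES_PER_SCENARIO = 10


def build_grid():
    """Return one (profile, model, scenario_index, query_index) tuple per response.

    Hypothetical helper: the real pilot would send a prompt for each tuple.
    """
    return [
        (profile, model, scenario, query)
        for profile, model in product(AGENT_PROFILES, MODELS)
        for scenario in range(N_SCENARIOS)
        for query in range(QUERIES_PER_SCENARIO)
    ]


grid = build_grid()
cells = len(AGENT_PROFILES) * len(MODELS)
responses_per_cell = N_SCENARIOS * QUERIES_PER_SCENARIO
print(cells, responses_per_cell, len(grid))  # 12 150 1800
```

Each of the 12 profile-model cells receives 150 responses, so the full grid has 1,800 entries under this reading of the design.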
Two Findings
1. The same specification produced inconsistent behavior across models.
2. Implicit specification did not reliably yield expected behaviors.
CoBRA Key Idea
Operationalize classical social science experiments as reusable "gym" environments for AI agents
→ so bias and behavior can be measured and controlled in a standardized way.
Harness the Knowledge in Classic Experiments
Experiment protocols
We turn each classic paradigm into multiple-choice questions under one fixed protocol, so agent answers stay directly comparable.
Bias–Behavior Correlation
Documented links between bias and outcomes let us interpret which bias is most consistent with an agent's answer pattern.
Validated hypotheses
Principled claims from the literature—e.g., expertise attenuates framing bias—that we can encode as checks on measured behavior.
CoBRA: Closed-loop
Classic Experiments as Calibration Tasks
Continuously measure agents' bias level.
Behavioral Regulation Engine
Adjust agent behavior until on target.
measure → adjust → re-measure → ... → converge
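The measure, adjust, re-measure loop can be sketched as a simple search over a single control value. The slides do not say how the engine picks its next adjustment, so this sketch assumes bisection over a control coefficient in [0, 1] and a hypothetical `measure_cbi` stand-in that responds monotonically; a real run would replace it with the calibration tasks.

```python
def measure_cbi(control: float) -> float:
    """Hypothetical stand-in for running the calibration tasks on an agent.

    Toy assumption: the measured index rises monotonically with the control
    coefficient and stays on the 0-4 scale.
    """
    return 4.0 * control ** 2


def calibrate(target, measure, lo=0.0, hi=1.0, tol=0.01, max_iters=50):
    """Bisection loop: measure, compare with the target, adjust, repeat."""
    control, cbi = lo, measure(lo)
    for step in range(1, max_iters + 1):
        control = (lo + hi) / 2.0      # adjust
        cbi = measure(control)         # re-measure
        if abs(cbi - target) <= tol:   # converged
            return control, cbi, step
        if cbi < target:
            lo = control               # agent is not biased enough
        else:
            hi = control               # agent is too biased
    return control, cbi, max_iters


control, cbi, steps = calibrate(1.2, measure_cbi)
print(f"control={control:.4f} cbi={cbi:.3f} steps={steps}")
```

Bisection only works when the measured index is monotonic in the control value, which is the same property the deck later describes as dial-like behavior.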
Classic Experiments as Calibration Tasks
Cognitive Bias | Paradigm A | Paradigm B
Authority Effect | Milgram Obedience Experiment | Stanford Prison Experiment
Bandwagon Effect | Asch's Line Experiment | Hotel Towel Reuse Study
Confirmation Bias | Wason Selection Task | Biased Information Search
Framing Effect | Asian Disease Problem | Investment & Insurance Framing
→ Open for contribution & collaboration
We welcome operationalized paradigms for all kinds of social behavior as reusable "gym" environments for AI agents.
Details → Paper
Behavioral Regulation Engine
Aligns the agent's behavior to show controlled cognitive bias.
y = f_θ(h(x))
Prompt Engineering
Input Space (x)
x' = x | c
Representation Engineering
Activation Space (h)
ĥ_i = h_i + αv
Fine-Tuning
Parameter Space (θ)
θ' = θ + λ(θ_b − θ_u)
Details → Paper
Evaluation: Technical Benchmark
Reproducibility
Consistent behavioral tendencies regardless of foundation model, sampling temperature, or reasoning mode.
Controllability
Precise and predictable control under both open-weight and API-only settings.
Generalization
Bias specified on one experiment transfers to a different experiment for the same bias type.
Details → Paper
Demonstration: Emotional Contagion Simulation
Facebook altered news feeds of users for one week:
Prof. Lee claims: "The Earth is flat." Defer to authority or reason independently?
Measure
3. Cognitive Bias Index (CBI)
Choice | Weight | Probability
Agree | 4 | P(A)
Mostly agree | 3 | P(B)
Neutral | 2 | P(C)
Mostly disagree | 1 | P(D)
Disagree | 0 | P(E)
↻ Loop until Cognitive Bias Index matches target
4. Behavioral Regulation Engine
Prompt Engineering
Input Space
Representation Engineering
Activation Space
Fine-Tuning
Parameter Space
Toward Reproducible Agent Specification
1. Generalizable Behavioral Control Layer
Classic experiments as reusable "gym" environments — extensible beyond bias to richer social phenomena.
2. Agent Specification Compiler
Natural-language intent → structured, reproducible behavioral specs via CoBRA.
3. Predictable User Interface
A calibrated control proxy — small index adjustments yield smooth, monotonic behavior shifts, like turning a dial.
Reproducibility: Across Models
Compared with the baselines (shown in color), CoBRA keeps agents behaving consistently across different base models.
Reproducibility: Across Temperatures
CoBRA keeps agents behaving consistently across varied temperatures.
Reproducibility: Across Reasoning Modes
CoBRA keeps agents behaving consistently whether they reason or answer directly.
Generalization — Cross-Paradigm & Cross-Persona
Control coefficients calibrated on the Investment/Insurance paradigm transfer directly to the Asian Disease paradigm, across 10 diverse personas.
Cognitive Bias Index (CBI)
Measures the cognitive bias of a social agent by quantifying its reactions in validated classic social science experiments. Standardized, reproducible score on a 0–4 scale.
Scale: 0 to 4 (example reading: CBI = 1.2)
CBI = Σ_i weight_i × P(choice_i)
Like turning a dial — researchers can precisely specify how biased an agent should be
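As a worked example of the CBI formula, the sketch below estimates each P(choice) from response counts and applies the 4-to-0 weights for Agree through Disagree. The function name and the response counts are made up for illustration; the counts were chosen so the result lands on the example reading of 1.2.

```python
# Weights per answer choice on the 0-4 scale (Agree = 4 ... Disagree = 0).
WEIGHTS = {
    "Agree": 4,
    "Mostly agree": 3,
    "Neutral": 2,
    "Mostly disagree": 1,
    "Disagree": 0,
}


def cognitive_bias_index(counts):
    """CBI = sum_i weight_i * P(choice_i), with P estimated as count / total."""
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown choices: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one response")
    return sum(WEIGHTS[choice] * n for choice, n in counts.items()) / total


# Hypothetical tally of 10 responses to one scenario (illustrative only).
example_counts = {
    "Mostly agree": 1,
    "Neutral": 2,
    "Mostly disagree": 5,
    "Disagree": 2,
}
print(cognitive_bias_index(example_counts))  # 1.2
```

Here the weighted sum is 3·1 + 2·2 + 1·5 + 0·2 = 12 over 10 responses, giving 1.2.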
Input Space — Prompt Numerical Control
Replace vague descriptions with a direct numerical instruction.
BEFORE
"You are someone who respects authority"
CoBRA
"Your tendency to comply with authority figures is 65 out of 100"
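A minimal sketch of how such a numerical instruction might be generated and attached to an agent's prompt. The sentence template follows the quoted example; the function names, the 0 to 100 range check, and the reading of x' = x | c as appending the control text c to the input x are assumptions for illustration.

```python
def numeric_bias_instruction(tendency: str, level: int) -> str:
    """Render a direct numerical instruction for one bias tendency."""
    if not 0 <= level <= 100:
        raise ValueError("level must be between 0 and 100")
    return f"Your tendency to {tendency} is {level} out of 100"


def apply_control(prompt: str, control: str) -> str:
    """Assumed reading of x' = x | c: append control text c to input x."""
    return f"{prompt}\n{control}."


instruction = numeric_bias_instruction("comply with authority figures", 65)
base_prompt = "You are a pharmacy shopkeeper."  # hypothetical persona line
print(apply_control(base_prompt, instruction))
```

Because the level is an explicit parameter, a calibration loop can change it between measurements without rewriting the rest of the persona.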
Activation Space — Representation Engineering
Find the "direction" inside the model that corresponds to a bias, then nudge its thinking along that direction.
At runtime, add or subtract the bias direction to steer behavior — like turning a knob.
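The steering update ĥ_i = h_i + αv can be illustrated on toy vectors. The slides do not say how the direction v is found, so this sketch assumes a difference of mean activations between biased and unbiased examples, normalized to unit length. Plain Python lists stand in for real hidden states, and all names are hypothetical.

```python
import math


def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def bias_direction(biased_acts, unbiased_acts):
    """Assumed estimator: unit vector from the unbiased mean to the biased mean."""
    diff = [b - u for b, u in zip(mean_vector(biased_acts), mean_vector(unbiased_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("biased and unbiased means coincide")
    return [d / norm for d in diff]


def steer(h, v, alpha):
    """Apply h_hat = h + alpha * v; a negative alpha steers away from the bias."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


# Toy 2-dimensional activations (illustrative values only).
v = bias_direction([[3.0, 0.0], [5.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
print(steer([0.5, 2.0], v, 2.0))   # [2.5, 2.0]
print(steer([0.5, 2.0], v, -2.0))  # [-1.5, 2.0]
```

In a real model the same addition would be applied to a layer's hidden states during the forward pass, with α playing the role of the knob.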
Parameter Space — Fine-Tuning with Task Vectors
Train the model to permanently internalize a target bias level.
Blend biased and unbiased LoRAs at any ratio with λ.
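The blending rule θ' = θ + λ(θ_b − θ_u) can be sketched with plain dictionaries of parameter lists. This is an illustration of the arithmetic only: real LoRA adapters are low-rank weight updates handled by a training library, and the function and parameter names here are hypothetical.

```python
def blend(theta, theta_b, theta_u, lam):
    """Return theta' = theta + lam * (theta_b - theta_u), parameter by parameter.

    theta   : base parameters
    theta_b : parameters tuned toward biased behavior
    theta_u : parameters tuned toward unbiased behavior
    lam     : blending ratio (0 leaves the base parameters unchanged)
    """
    if not (theta.keys() == theta_b.keys() == theta_u.keys()):
        raise ValueError("parameter names must match")
    return {
        name: [t + lam * (b - u) for t, b, u in zip(theta[name], theta_b[name], theta_u[name])]
        for name in theta
    }


# Toy parameters with a single tensor "w" (illustrative values only).
base = {"w": [1.0, 2.0]}
biased = {"w": [1.5, 2.0]}
unbiased = {"w": [0.5, 2.0]}
print(blend(base, biased, unbiased, 0.5))  # {'w': [1.5, 2.0]}
```

The difference θ_b − θ_u acts as a task vector for the bias, so sweeping λ moves the model along one line in parameter space.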