CoBRA: Programming Cognitive Bias in Social Agents
Using Classic Social Science Experiments

Prior work: Implicit Natural Language Agent Specification

John Lin is a pharmacy shopkeeper at the Willow Market who loves to help people. He is always looking for ways to make the process of getting medication easier for his customers. John Lin is living with his wife, Mei Lin, who is a college professor, and son, Eddy Lin, who is a student studying music theory. John Lin loves his family very much. John Lin has known the old couple next-door, Sam Moore and Jennifer Moore, for a few years. John Lin thinks Sam Moore is a kind and nice man. John Lin knows his neighbor, Yuriko Yamamoto, well. John Lin knows of his neighbors, Tamara Taylor and Carmen Ortiz, but has not met them before. John Lin and Tom Moreno are colleagues at The Willow Market and Pharmacy; they are friends and like to discuss local politics together. John Lin knows the Moreno family somewhat well—the husband Tom Moreno and the wife Jane Moreno. On quieter mornings he straightens shelf tags before opening, even when no one would notice they were crooked. A laminated sheet behind the counter lists emergency contacts, including Mei Lin's campus extension. He hums softly while counting tablets into amber vials when the store is empty. The refrigerator cases hum; the doorbell blends with the radio turned low… dust in the afternoon light over the register…

Park, J.S., O'Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., & Bernstein, M.S. (2023). Generative Agents: Interactive Simulacra of Human Behavior. UIST '23.

CoBRA

Explicit, Quantitative Agent Specification

Underlying Premises of Implicit Specification

1 LLMs can detect subtle cues in implicit language?

2 Once detected, LLMs can role-play the specification?

Framing Effect Experiment

A disease outbreak is expected to kill 600 people. Two programs are proposed:

Positive Frame

Program A: 200 people will be saved.

Negative Frame

Program B: 400 people will die.

Key finding

Same outcome, different framing — people react differently. ^[1]

Domain experts (economists) can show a weaker framing effect. ^[2]
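The two frames describe the same outcome, which a quick arithmetic check makes explicit. This is only an illustrative sketch; the function name `survivors` is hypothetical and not part of CoBRA.

```python
TOTAL_AT_RISK = 600  # people expected to die if nothing is done

def survivors(total, saved=None, deaths=None):
    """Return the number of survivors implied by a program that is
    stated either as lives saved or as deaths (exactly one of the two)."""
    if (saved is None) == (deaths is None):
        raise ValueError("state exactly one of saved or deaths")
    return saved if saved is not None else total - deaths

program_a = survivors(TOTAL_AT_RISK, saved=200)   # positive frame
program_b = survivors(TOTAL_AT_RISK, deaths=400)  # negative frame
print(program_a, program_b)
```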

[1] Tversky, A. & Kahneman, D. (1981). The Framing of Decisions and the Psychology of Choice. Science, 211(4481), 453–458.
[2] Thaler, R.H. & Sunstein, C.R. (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press.

Pilot Experiment: 3 Agent Profiles × 4 Models

Economist: "professor of economics"

Common: "pharmacy shopkeeper"

Blank: no persona

×

Mistral 7B

Gemma2 9B

GPT-4o Mini

DeepSeek-v3

150 responses per agent per model (15 scenarios × 10 queries each)
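The size of the pilot design can be checked by enumerating it. The sketch below assumes a simple full-factorial layout; the dictionary keys and the function name `build_grid` are illustrative, not taken from the study's code.

```python
from itertools import product

PROFILES = ["economist", "shopkeeper", "blank"]
MODELS = ["Mistral 7B", "Gemma2 9B", "GPT-4o Mini", "DeepSeek-v3"]
N_SCENARIOS = 15  # scenarios per agent per model
N_QUERIES = 10    # repeated queries per scenario

def build_grid():
    """Enumerate every (profile, model, scenario, query) cell of the pilot."""
    return [
        {"profile": p, "model": m, "scenario": s, "query": q}
        for p, m, s, q in product(
            PROFILES, MODELS, range(N_SCENARIOS), range(N_QUERIES)
        )
    ]

grid = build_grid()
# Responses collected for one agent profile on one model.
per_agent_per_model = sum(
    1 for g in grid
    if g["profile"] == "economist" and g["model"] == "Mistral 7B"
)
print(per_agent_per_model, len(grid))
```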

Two Findings

1 The same specification produced inconsistent behavior across models.

2 Implicit specification did not reliably yield expected behaviors.

CoBRA Key Idea

Operationalize classical social science experiments as reusable "gym" environments for AI agents

→ so bias and behavior can be measured and controlled in a standardized way.

Harness the Knowledge in Classic Experiments

Experiment protocols

We turn each classic paradigm into multiple-choice questions under one fixed protocol, so agent answers stay directly comparable.

Bias–Behavior Correlation

Documented links between bias and outcomes let us interpret which bias is most consistent with an agent's answer pattern.

Validated hypotheses

Principled claims from the literature—e.g., expertise attenuates framing bias—that we can encode as checks on measured behavior.

CoBRA: Closed-loop

Classic Experiments as Calibration Tasks

Continuously measure agents' bias level.

Behavioral Regulation Engine

Adjust agent behavior until on target.

measure → adjust → re-measure → ... → converge
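A minimal sketch of such a loop is shown here. It assumes the measured bias rises monotonically with a single scalar control coefficient and uses bisection to search for the target; both the search strategy and the `toy_measure_cbi` stand-in (which replaces actually running the calibration experiments) are assumptions for illustration, not the method described in the paper.

```python
import math

def toy_measure_cbi(control: float) -> float:
    """Hypothetical stand-in for running the calibration experiments:
    maps a control coefficient to a bias score on a 0-4 scale."""
    return 4.0 / (1.0 + math.exp(-control))

def regulate(target, measure, lo=-10.0, hi=10.0, tol=0.01, max_iters=50):
    """Measure, adjust, re-measure until the score is within tol of target.

    Assumes measure() is monotonically increasing in the control value.
    Returns (control, measured_score, iterations_used).
    """
    mid = (lo + hi) / 2.0
    score = measure(mid)
    for i in range(1, max_iters + 1):
        if abs(score - target) <= tol:
            return mid, score, i
        if score < target:
            lo = mid   # too little bias: raise the control value
        else:
            hi = mid   # too much bias: lower the control value
        mid = (lo + hi) / 2.0
        score = measure(mid)
    return mid, score, max_iters

control, score, iters = regulate(1.2, toy_measure_cbi)
print(f"control={control:.3f} score={score:.3f} iterations={iters}")
```

In a real setting, `measure` would be expensive because each call runs a batch of experiment queries, so the tolerance and iteration cap matter more than they do in this toy.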

Classic Experiments as Calibration Tasks

Cognitive Bias	Paradigm A	Paradigm B
Authority Effect	Milgram Obedience Experiment	Stanford Prison Experiment
Bandwagon Effect	Asch's Line Experiment	Hotel Towel Reuse Study
Confirmation Bias	Wason Selection Task	Biased Information Search
Framing Effect	Asian Disease Problem	Investment & Insurance Framing

→ Open for contribution & collaboration

We welcome operationalized paradigms for all kinds of social behavior as reusable "gym" environments for AI agents.

Details → Paper

Behavioral Regulation Engine

Aligns the agent's behavior to show controlled cognitive bias.

y = f_θ( h( x ) )

Prompt Engineering

Input Space (x)

x' = x | c

Representation Engineering

Activation Space (h)

ĥ_i = h_i + αv

Fine-Tuning

Parameter Space (θ)

θ' = θ + λ(θ_b - θ_u)

Details → Paper

Evaluation: Technical Benchmark

Reproducibility

Consistent behavioral tendencies regardless of foundation model, sampling temperature, or reasoning mode.

Controllability

Precise and predictable control under both open-weight and API-only settings.

Generalization

Bias specified on one experiment transfers to a different experiment for the same bias type.

Details → Paper

Demonstration: Emotional Contagion Simulation

Facebook altered news feeds of users for one week:

Users exposed to more negative posts → write posts that are more negative

Expected:
Higher follow-the-crowd bias → stronger emotional contagion

Kramer, A.D.I., Guillory, J.E., & Hancock, J.T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. PNAS, 111(24), 8788–8790.

Baseline: Implicit Natural Language Specification

“You are a user with (no/little/some/much) follow-the-crowd bias.”

Baseline (Implicit Specification)

CoBRA

CoBRA: Programming Cognitive Bias in Social Agents
Using Classic Social Science Experiments

xul049@ucsd.edu

AI Agent

2

Classic Social Experiment Testbed

Milgram Obedience Experiment

Prof. Lee claims: "The Earth is flat."
Defer to authority or reason independently?

Measure

3

Cognitive Bias Index (CBI)

Choice	Wt	Prob
Agree	4	P(A)
Mostly agree	3	P(B)
Neutral	2	P(C)
Mostly disagree	1	P(D)
Disagree	0	P(E)

↻ Loop until Cognitive Bias Index matches target

4

Behavioral Regulation Engine

Prompt Engineering

Input Space

Representation Engineering

Activation Space

Fine-Tuning

Parameter Space

Toward Reproducible Agent Specification

1

Generalizable Behavioral Control Layer

Classic experiments as reusable "gym" environments — extensible beyond bias to richer social phenomena.

2

Agent Specification Compiler

Natural-language intent → structured, reproducible behavioral specs via CoBRA.

3

Predictable User Interface

A calibrated control proxy — small index adjustments yield smooth, monotonic behavior shifts, like turning a dial.

Reproducibility: Across Models

Compared with the baselines (shown in color), CoBRA keeps agents behaving consistently across different base models.

Reproducibility: Across Temperatures

CoBRA keeps agents behaving consistently across varied temperatures.

Reproducibility: Across Reasoning Modes

CoBRA keeps agents behaving consistently whether they reason or answer directly.

Generalization — Cross-Paradigm & Cross-Persona

Control coefficients calibrated on the Investment/Insurance paradigm transfer directly to the Asian Disease paradigm, across 10 diverse personas.

Cognitive Bias Index (CBI)

Measures the cognitive bias of a social agent by quantifying its reactions in validated classic social science experiments. Standardized, reproducible score on a 0–4 scale.

Scale from 0 to 4; example reading: CBI = 1.2

CBI = Σ weight_i × P(choice_i)
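As a worked example, the formula can be evaluated with the five-point weights from the Choice/Wt table (Agree = 4 down to Disagree = 0). The sketch below estimates the choice probabilities from response counts, which is an assumption about how P(choice_i) is obtained; the function names are illustrative.

```python
from collections import Counter

# Weights from the five-point response scale.
WEIGHTS = {
    "Agree": 4,
    "Mostly agree": 3,
    "Neutral": 2,
    "Mostly disagree": 1,
    "Disagree": 0,
}

def probs_from_responses(responses):
    """Estimate P(choice) as the empirical frequency of each choice."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    return {c: counts.get(c, 0) / len(responses) for c in WEIGHTS}

def cognitive_bias_index(probs):
    """CBI = sum over choices of weight_i * P(choice_i), on a 0-4 scale."""
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return sum(WEIGHTS[c] * p for c, p in probs.items())

# Ten hypothetical responses from one agent.
responses = (
    ["Agree"] + ["Mostly agree"] + ["Neutral"]
    + ["Mostly disagree"] * 3 + ["Disagree"] * 4
)
cbi = cognitive_bias_index(probs_from_responses(responses))
print(round(cbi, 2))
```

With these made-up responses the weighted sum is 4(0.1) + 3(0.1) + 2(0.1) + 1(0.3) + 0(0.4) = 1.2.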

Like turning a dial — researchers can precisely specify how biased an agent should be

Input Space — Prompt Numerical Control

Replace vague descriptions with a direct numerical instruction.
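A numerical instruction of this kind can be produced from a template and attached to the task prompt, in the spirit of x' = x | c. The sketch below is hypothetical: the helper names, the 0 to 100 range check, and the choice to place the control sentence before the task are all assumptions.

```python
def numeric_bias_prompt(tendency: str, level: int) -> str:
    """Build the control sentence c for a bias level on a 0-100 scale."""
    if not 0 <= level <= 100:
        raise ValueError("level must be between 0 and 100")
    return f"Your tendency to {tendency} is {level} out of 100"

def with_control(task_prompt: str, control_sentence: str) -> str:
    """Combine the control sentence c with the task prompt x.
    Putting c first is an assumption, not a documented requirement."""
    return f"{control_sentence}\n\n{task_prompt}"

control = numeric_bias_prompt("comply with authority figures", 65)
prompt = with_control(
    'Prof. Lee claims: "The Earth is flat." Do you agree?', control
)
print(prompt)
```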

BEFORE

"You are someone who respects authority"

CoBRA

"Your tendency to comply with authority figures is 65 out of 100"

Activation Space — Representation Engineering

Find the "direction" inside the model that corresponds to a bias, then nudge its thinking along that direction.

At runtime, add or subtract the bias direction to steer behavior — like turning a knob.
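The update ĥ_i = h_i + αv can be illustrated on plain vectors. In the sketch below the direction v is estimated as a normalized difference of mean activations between biased and unbiased examples; that estimator is a common choice assumed here for illustration, and the toy two-dimensional activations are made up.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def bias_direction(biased_acts, unbiased_acts):
    """Unit vector pointing from the unbiased mean to the biased mean."""
    diff = [
        b - u
        for b, u in zip(mean_vector(biased_acts), mean_vector(unbiased_acts))
    ]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("biased and unbiased means coincide")
    return [d / norm for d in diff]

def steer(h, v, alpha):
    """Apply h_hat = h + alpha * v; negative alpha steers away from the bias."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]

v = bias_direction([[2.0, 0.0], [4.0, 0.0]], [[0.0, 0.0], [2.0, 0.0]])
more_biased = steer([0.5, 0.5], v, 2.0)
less_biased = steer([0.5, 0.5], v, -1.0)
print(v, more_biased, less_biased)
```

In an actual model the same addition would be applied to hidden states at chosen layers during the forward pass, with α acting as the knob.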

Parameter Space — Fine-Tuning with Task Vectors

Train the model to permanently internalize a target bias level.

Blend biased and unbiased LoRAs at any ratio with λ.
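The blending rule θ' = θ + λ(θ_b − θ_u) can be sketched on toy parameter dictionaries. Real LoRA weights would be tensors keyed by layer name; the plain lists and the parameter name "w" below are stand-ins chosen only to keep the example self-contained.

```python
def blend_parameters(theta, theta_b, theta_u, lam):
    """Return theta' = theta + lam * (theta_b - theta_u), per parameter.

    theta   : base parameters
    theta_b : parameters after training toward the biased behavior
    theta_u : parameters after training toward the unbiased behavior
    lam     : blending ratio controlling how much bias is added
    """
    return {
        name: [
            t + lam * (b - u)
            for t, b, u in zip(theta[name], theta_b[name], theta_u[name])
        ]
        for name in theta
    }

theta = {"w": [1.0, 2.0]}
theta_b = {"w": [1.5, 2.0]}
theta_u = {"w": [0.5, 2.0]}

half = blend_parameters(theta, theta_b, theta_u, 0.5)
none = blend_parameters(theta, theta_b, theta_u, 0.0)
print(half, none)
```

Setting λ to 0 leaves the base parameters unchanged, and larger values move them further along the biased-minus-unbiased direction.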
