D · 05 · TECHNICAL WRITE-UP

AgentC Arena.

An AI debate arena.

Two researcher agents, two debater agents, a moderator, a judge. Configurable personalities. Four battle formats from formal debate to roast. It's less about the topic than about what it takes to get a multi-agent system to argue coherently.

01 · BATTLE MODES

Four formats, one multi-agent shape.

Each mode is the same graph of agents with a different prompt + scoring rubric. The arena treats "debate" and "roast" as variants of the same structural problem: two sides, evidence, rounds, a moderator, a judge. Change the rubric, change the format — keep the engine.

| Mode | Format | Note |
| --- | --- | --- |
| Debate | Traditional structured debate | Pro + Con make evidence-grounded arguments, the moderator scores each round. |
| Rap Battle | Lyrical battle with rhyme + wordplay | Same multi-agent shape, different prompt + scoring rubric. |
| Roast | Comedy roast | The agents lean into tone; the moderator scores on punchline density rather than rigour. |
| Pitch Off | Startup pitch competition | Pro pitches the idea, Con pitches against it; the judge picks the term sheet. |

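The "same engine, different rubric" idea can be sketched as a small mode registry. This is a minimal illustration, not the project's code: `BattleMode`, `MODES`, `moderator_instruction`, the style prompts, and the rubric criteria are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BattleMode:
    """One battle format: a prompt style plus the rubric the moderator scores on."""
    name: str
    style_prompt: str        # hypothetical wording, not taken from the project
    rubric: tuple[str, ...]  # hypothetical scoring criteria


# Illustrative registry; every prompt and criterion here is an assumption.
MODES = {
    "debate": BattleMode("Debate", "Argue formally and cite evidence.",
                         ("evidence", "logic", "rebuttal")),
    "rap_battle": BattleMode("Rap Battle", "Argue in rhymed verse with wordplay.",
                             ("rhyme", "wordplay", "flow")),
    "roast": BattleMode("Roast", "Argue as a comedy roast.",
                        ("punchline_density", "timing")),
    "pitch_off": BattleMode("Pitch Off", "Argue as a startup pitch.",
                            ("market", "clarity", "ask")),
}


def moderator_instruction(mode_key: str) -> str:
    """Build the moderator's scoring instruction for the active mode."""
    mode = MODES[mode_key]
    criteria = ", ".join(mode.rubric)
    return f"Score this {mode.name} round from 1 to 10 on: {criteria}."
```

Under this sketch, adding a fifth format means adding one registry entry; the agent graph itself does not change.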
02 · THE AGENTS

Six roles, one coordinator.

A debate coordinator runs the workflow. Sub-agents take the roles below; each is an ADK agent with its own instruction set and a tightly scoped toolbox so the model doesn't wander.

| Role | Who | What it does |
| --- | --- | --- |
| Researchers (Pro + Con) | Two grounding agents | Generate evidence the debaters can use. Accept file uploads (PDF / DOCX / TXT) and custom source-data injection so the debate can be grounded in your own corpus, not just public web. |
| Debaters (Pro + Con) | Two arguing agents | Take the research and argue the side. Personality knobs apply here: same evidence, different voice. |
| Moderator | Per-round scorer + commentator | Reads the round, scores it on the active rubric (changes by battle mode), surfaces commentary for the live transcript. |
| Judge | End-of-debate adjudicator | Produces the final summary and calls the winner. Supports early-winner thresholds so a one-sided round can end the debate without burning more turns. |

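The write-up does not say how the early-winner threshold is defined. One plausible reading, sketched below as an assumption, is a cumulative score lead: if one side is ahead by at least `lead_threshold` points after a round, the debate ends. The function name `early_winner` and the default threshold are hypothetical.

```python
from typing import Optional, Sequence


def early_winner(
    pro_scores: Sequence[float],
    con_scores: Sequence[float],
    lead_threshold: float = 8.0,  # assumed default, not from the project
) -> Optional[str]:
    """Return "pro" or "con" if one side leads by the threshold, else None."""
    lead = sum(pro_scores) - sum(con_scores)
    if lead >= lead_threshold:
        return "pro"
    if -lead >= lead_threshold:
        return "con"
    return None  # no decisive lead yet; play the next round
```

A coordinator could call this after each moderator score and skip straight to the judge when it returns a side.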
03 · PERSONALITY KNOBS

Three sliders. Same evidence, different voice.

Personality lives at the debater layer, not the researcher layer. The Pro and Con debaters take the same researched evidence and argue it differently depending on three knobs. This is the clearest reason the rap-battle mode works at all: the underlying evidence is sound, and tone alone makes it sound like a rap.

| Knob | Range | Endpoints |
| --- | --- | --- |
| Expertise | 1–10 | Novice ↔ PhD-level argumentation |
| Tone | 1–10 | Polite / clinical ↔ Aggressive / passionate |
| Verbosity | 1–10 | Concise ↔ Elaborate |

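One way the three sliders could be turned into debater instructions is sketched below. The `Personality` class, the three-band mapping, and the instruction wording are assumptions made for illustration; only the knob names and the 1–10 range come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    expertise: int = 5
    tone: int = 5
    verbosity: int = 5

    def __post_init__(self) -> None:
        # Reject values outside the documented 1-10 slider range.
        for name in ("expertise", "tone", "verbosity"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


def _band(value: int, low: str, mid: str, high: str) -> str:
    """Map a 1-10 slider onto three coarse descriptions (assumed cut-offs)."""
    if value <= 3:
        return low
    if value <= 7:
        return mid
    return high


def style_instruction(p: Personality) -> str:
    """Render the knobs as a prompt fragment appended to the debater's instruction."""
    return " ".join([
        _band(p.expertise, "Argue like a novice.",
              "Argue like an informed generalist.", "Argue at PhD level."),
        _band(p.tone, "Keep the tone polite and clinical.",
              "Keep the tone firm but measured.", "Keep the tone aggressive and passionate."),
        _band(p.verbosity, "Be concise.", "Use moderate detail.", "Be elaborate."),
    ])
```

Because the fragment only touches the debater's instruction, both debaters can still read the same research output unchanged.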
04 · NOTABLE DESIGN CHOICES

The non-obvious decisions.

Workflow is strictly ordered
Initializer → Research Phase → Debate Loop → Final Judge. The debater agents will not argue without research in scope; this is enforced at the coordinator, not just hoped for at the prompt.
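Enforcing the order at the coordinator rather than in the prompt might look like the sketch below. The `Phase`, `PhaseError`, and `Coordinator` names are hypothetical, and the real project presumably does this through ADK workflow constructs; this only illustrates the guard logic in plain Python.

```python
from enum import IntEnum


class Phase(IntEnum):
    INITIALIZER = 0
    RESEARCH = 1
    DEBATE_LOOP = 2
    FINAL_JUDGE = 3


class PhaseError(RuntimeError):
    """Raised when a phase is requested out of order or without its inputs."""


class Coordinator:
    def __init__(self) -> None:
        self.phase = Phase.INITIALIZER
        self.research: dict[str, str] = {}  # side ("pro" / "con") -> evidence text

    def advance(self, target: Phase) -> None:
        # Phases may only move forward one step at a time.
        if target != self.phase + 1:
            raise PhaseError(f"cannot go from {self.phase.name} to {target.name}")
        # The debate loop needs research from both sides in scope.
        if target == Phase.DEBATE_LOOP and not {"pro", "con"} <= set(self.research):
            raise PhaseError("research for both sides is required before debating")
        self.phase = target
```

The point of the guard is that a debater can never be invoked by accident: skipping research raises an error instead of producing an ungrounded argument.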
Minimum-content validation at the boundary
Research must be at least 100 characters and arguments at least 50. Output is whitespace-trimmed and validated after generation, so a model that returns an empty string does not silently drop a round.
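A minimal sketch of that boundary check follows. The two minimums come from the text above; the function name `validate_output`, the `ContentTooShort` exception, and the choice to raise (rather than retry or substitute a fallback) are assumptions.

```python
MIN_CHARS = {"research": 100, "argument": 50}  # minimums stated in the write-up


class ContentTooShort(ValueError):
    """Raised when generated text is below the minimum for its kind."""


def validate_output(text: str, kind: str) -> str:
    """Trim whitespace, then enforce the minimum length for this kind of output."""
    minimum = MIN_CHARS[kind]
    cleaned = (text or "").strip()
    if len(cleaned) < minimum:
        raise ContentTooShort(
            f"{kind} output has {len(cleaned)} characters, needs at least {minimum}"
        )
    return cleaned
```

Trimming before measuring matters: a response made of newlines or spaces counts as empty and fails the check instead of passing as a round.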
Session state in one place
All debate state lives in `DebateSession` objects managed by a `SessionManager` singleton. Round number is fetched via `get_debate_state()` — both debaters check the same source of truth before arguing.
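The single-source-of-truth pattern can be sketched as follows. `DebateSession`, `SessionManager`, and `get_debate_state()` are named in the write-up, but their fields and signatures are not shown; everything below (the `session_id` argument, the `create` and `get` methods, the returned dictionary) is an assumed shape for illustration.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class DebateSession:
    session_id: str
    topic: str
    round_number: int = 1
    arguments: list = field(default_factory=list)


class SessionManager:
    """Process-wide singleton holding every active debate session."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Create the single shared instance on first use only.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._sessions = {}
        return cls._instance

    def create(self, session_id: str, topic: str) -> DebateSession:
        session = DebateSession(session_id, topic)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: str) -> DebateSession:
        return self._sessions[session_id]  # KeyError if the session is unknown


def get_debate_state(session_id: str) -> dict:
    """Tool-style accessor: both debaters read the round number from here."""
    session = SessionManager().get(session_id)
    return {"topic": session.topic, "round_number": session.round_number}
```

Since every caller gets the same `SessionManager` instance, the Pro and Con debaters cannot disagree about which round they are in.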
WebSocket as the live surface
Every event — research drops, argument lands, moderator scores, judge calls — streams over WebSocket so the arena page renders as the debate happens, not after.
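The write-up does not show the wire format. A common approach, assumed here, is one JSON envelope per event with a type tag, so the page can route each message to the right renderer. The `ArenaEvent` class, its field names, and the event type strings are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field

# Assumed event names mirroring the four kinds of event listed in the text.
EVENT_TYPES = {"research", "argument", "moderator_score", "judge_verdict"}


@dataclass
class ArenaEvent:
    type: str
    round_number: int
    payload: dict
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        """Serialize the event for sending as one WebSocket text frame."""
        if self.type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.type}")
        return json.dumps(asdict(self))
```

In a FastAPI handler the resulting string could be passed to the WebSocket's `send_text` method as each event is produced, which is what lets the page render the debate while it is still running.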
30-minute Cloud Run timeout
Long debates run past Cloud Run's default 5-minute request timeout, so the deploy is configured for up to 30 minutes. That longer budget is what makes multi-round formats viable.
Custom source-data injection
Beyond file uploads, the researchers accept structured custom-source payloads so a corporate brief or a private corpus can ground the round without exposing the source to the public web.
05 · STACK

What runs underneath.

  • Google ADK — agent framework. The debate coordinator + sub-agents are all ADK constructs; tool resolution and per-agent instruction scoping are the load-bearing primitives.
  • Gemini 2.5 Flash, 2.5 Pro, 3 Pro Preview — Flash for grounding and per-round responses, Pro and 3 Pro Preview (with thinking modes) for the heavier reasoning steps in adjudication.
  • FastAPI + WebSocket — Python backend. The arena renders live over WebSocket; round transitions, research drops, and moderator commentary all stream as events.
  • Vanilla JS frontend — no framework on the page side. Setup form posts the config, the arena page subscribes to the WebSocket and renders events as they arrive.
  • Google Cloud Run — serverless container, 30-minute timeout, automatic deploys via GitHub push.
WHY THIS EXISTS

Argument is a tool I want models to be better at, not a novelty. The arena exists because the multi-agent shape (researcher / debater / moderator / judge) generalises: once you have it, you can swap the rubric and explore completely different formats — strategic briefing, boardroom debrief, post-mortem — on the same engine. That's the same instinct as Luminary: separate the engine from the prompt that drives it, so the engine can be re-pointed at the next question without re-writing the orchestration.

Read the source on GitHub →