D · 05 · TECHNICAL WRITE-UP

AgentC Arena.

An AI debate arena.

Two researcher agents, two debater agents, a moderator, a judge. Configurable personalities. Four battle formats from formal debate to roast. It's less about the topic than about what it takes to get a multi-agent system to argue coherently.

REPO github.com/rutgertuit/AgentC →
RUNTIME Google Cloud Run
STATUS Open · personal project

01 · BATTLE MODES

Four formats, one multi-agent shape.

Each mode is the same graph of agents with a different prompt + scoring rubric. The arena treats "debate" and "roast" as variants of the same structural problem: two sides, evidence, rounds, a moderator, a judge. Change the rubric, change the format — keep the engine.

Mode	Format	Note
Debate	Traditional structured debate	Pro + Con make evidence-grounded arguments; the moderator scores each round.
Rap Battle	Lyrical battle with rhyme + wordplay	Same multi-agent shape, different prompt + scoring rubric.
Roast	Comedy roast	The agents lean into tone; the moderator scores on punchline density rather than rigour.
Pitch Off	Startup pitch competition	Pro pitches the idea, Con pitches against it; the judge picks the term sheet.
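
The "same engine, different rubric" idea from the table above can be sketched as a small registry. This is a minimal illustration, not the project's code: the `BattleMode` class, the `MODES` dictionary, the `moderator_instruction` helper, the prompt wording, and the rubric criteria are all hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BattleMode:
    """One format: a style prompt plus the criteria the moderator scores on."""
    name: str
    style_prompt: str        # hypothetical wording, not the project's prompts
    rubric: tuple[str, ...]  # hypothetical criteria names


# Four entries, one per battle mode. Only the data differs; the agent graph
# that consumes a BattleMode would stay the same.
MODES = {
    "debate": BattleMode(
        "Debate", "Argue formally and ground every claim in the research.",
        ("evidence", "logic", "rebuttal")),
    "rap_battle": BattleMode(
        "Rap Battle", "Deliver the argument as rhyming verses.",
        ("rhyme", "wordplay", "evidence")),
    "roast": BattleMode(
        "Roast", "Lean into comedy at the opponent's expense.",
        ("punchline density", "timing", "originality")),
    "pitch_off": BattleMode(
        "Pitch Off", "Pitch for or against the idea to an investor.",
        ("market case", "clarity", "credibility")),
}


def moderator_instruction(mode_key: str) -> str:
    """Build the moderator's scoring instruction for the active mode."""
    mode = MODES[mode_key]
    criteria = ", ".join(mode.rubric)
    return f"Format: {mode.name}. Score each round on: {criteria}."
```

Under these assumptions, adding a fifth format would mean adding one more dictionary entry rather than touching the orchestration.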

02 · THE AGENTS

Six roles, one coordinator.

A debate coordinator runs the workflow. Sub-agents take the roles below; each is an ADK agent with its own instruction set and a tightly scoped toolbox so the model doesn't wander.

Role	Who	What it does
Researchers (Pro + Con)	Two grounding agents	Generate evidence the debaters can use. Accept file uploads (PDF / DOCX / TXT) and custom source-data injection so the debate can be grounded in your own corpus, not just the public web.
Debaters (Pro + Con)	Two arguing agents	Take the research and argue the side. Personality knobs apply here — same evidence, different voice.
Moderator	Per-round scorer + commentator	Reads the round, scores it on the active rubric (changes by battle mode), surfaces commentary for the live transcript.
Judge	End-of-debate adjudicator	Produces the final summary and calls the winner. Supports early-winner thresholds so a one-sided round can end the debate without burning more turns.
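
The judge's early-winner threshold could work roughly as follows. This is a sketch under stated assumptions: the write-up does not say how the threshold is defined, so the cumulative-lead rule, the `early_winner` function, and the `lead_threshold` and `min_rounds` parameters are hypothetical.

```python
from typing import Optional, Sequence


def early_winner(
    pro_scores: Sequence[float],
    con_scores: Sequence[float],
    lead_threshold: float = 5.0,  # hypothetical default
    min_rounds: int = 2,          # hypothetical default
) -> Optional[str]:
    """Return "pro" or "con" if one side leads by the threshold, else None.

    Assumes the moderator has produced one numeric score per side per round.
    """
    rounds_played = min(len(pro_scores), len(con_scores))
    if rounds_played < min_rounds:
        return None  # too early to call, whatever the margin

    # Compare cumulative totals over the rounds both sides have completed.
    lead = sum(pro_scores[:rounds_played]) - sum(con_scores[:rounds_played])
    if lead >= lead_threshold:
        return "pro"
    if -lead >= lead_threshold:
        return "con"
    return None
```

A coordinator could call this after each moderator score and skip straight to the final judge when it returns a side.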

03 · PERSONALITY KNOBS

Three sliders. Same evidence, different voice.

Personality lives at the debater layer, not the researcher layer. The Pro and Con debaters take the same researched evidence and argue it differently depending on three knobs. This split is the main reason the rap-battle mode works at all: the underlying evidence stays sound, and the voice layered on top is what turns it into a rap.

Knob	Range	Endpoints
Expertise	1–10	Novice ↔ PhD-level argumentation
Tone	1–10	Polite / clinical ↔ Aggressive / passionate
Verbosity	1–10	Concise ↔ Elaborate
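
One way the three sliders might be turned into a debater instruction is to bucket each 1–10 value into a band and map the band to wording. The sketch below assumes that approach; the `Personality` class, the band cut-offs, the middle-band phrases, and the sentence template are hypothetical, and only the endpoint descriptions come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    """The three knobs, each on the 1-10 range from the table."""
    expertise: int = 5
    tone: int = 5
    verbosity: int = 5

    def __post_init__(self) -> None:
        for name in ("expertise", "tone", "verbosity"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")


def _band(value: int, low: str, mid: str, high: str) -> str:
    """Map a 1-10 value to one of three phrases (cut-offs are assumptions)."""
    if value <= 3:
        return low
    if value <= 7:
        return mid
    return high


def personality_instruction(p: Personality) -> str:
    """Render the knobs as a short instruction appended to the debater prompt."""
    expertise = _band(p.expertise, "a novice", "an informed generalist",
                      "a PhD-level expert")
    tone = _band(p.tone, "polite and clinical", "measured but firm",
                 "aggressive and passionate")
    verbosity = _band(p.verbosity, "concise", "moderately detailed", "elaborate")
    return f"Argue as {expertise}. Keep the tone {tone}. Be {verbosity}."
```

Because the researchers never see this instruction, two debaters with different `Personality` values would still argue from identical evidence.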

04 · NOTABLE DESIGN CHOICES

The non-obvious decisions.

Workflow is strictly ordered: Initializer → Research Phase → Debate Loop → Final Judge. The debater agents will not argue without research in scope; this is enforced at the coordinator, not just hoped for at the prompt.
Minimum-content validation at the boundary: 100 characters minimum for research, 50 for arguments. Output is whitespace-trimmed and length-checked after generation, so a model that returns an empty or near-empty string is caught instead of silently producing a hollow round.
Session state in one place: All debate state lives in `DebateSession` objects managed by a `SessionManager` singleton. Round number is fetched via `get_debate_state()` — both debaters check the same source of truth before arguing.
WebSocket as the live surface: Every event — research drops, argument lands, moderator scores, judge calls — streams over WebSocket so the arena page renders as the debate happens, not after.
30-minute Cloud Run timeout: Long debates blow past the default 5-minute request timeout, so the deploy is configured for up to 30 minutes. The longer-than-default budget is the load-bearing change that makes multi-round formats viable.
Custom source-data injection: Beyond file uploads, the researchers accept structured custom-source payloads so a corporate brief or a private corpus can ground the round without exposing the source to the public web.
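
The first two choices above, strict phase ordering and minimum-content validation, can be sketched together. The 100 and 50 character minimums and the four phase names come from the list; everything else (`Phase`, `ContentTooShort`, `validate_content`, `next_phase`, and the shape of `research_by_side`) is a hypothetical illustration rather than the project's implementation.

```python
from enum import Enum
from typing import Mapping, Optional

MIN_RESEARCH_CHARS = 100  # minimum stated in the write-up
MIN_ARGUMENT_CHARS = 50   # minimum stated in the write-up


class ContentTooShort(ValueError):
    """Raised when generated text is below the minimum after trimming."""


def validate_content(text: Optional[str], minimum: int, label: str) -> str:
    """Trim whitespace, then enforce the minimum length on what remains."""
    cleaned = (text or "").strip()
    if len(cleaned) < minimum:
        raise ContentTooShort(
            f"{label} has {len(cleaned)} characters after trimming; "
            f"need at least {minimum}"
        )
    return cleaned


class Phase(Enum):
    INITIALIZER = 1
    RESEARCH = 2
    DEBATE_LOOP = 3
    FINAL_JUDGE = 4


def next_phase(current: Phase, research_by_side: Mapping[str, str]) -> Phase:
    """Advance exactly one step; the debate loop requires valid research."""
    if current is Phase.FINAL_JUDGE:
        raise ValueError("the workflow has already finished")
    if current is Phase.RESEARCH:
        # Gate enforced in code, so debaters never run without research.
        for side in ("pro", "con"):
            validate_content(research_by_side.get(side),
                             MIN_RESEARCH_CHARS, f"{side} research")
    return Phase(current.value + 1)
```

The same `validate_content` helper would apply to each argument with `MIN_ARGUMENT_CHARS`, letting the coordinator retry or abort instead of recording an empty turn.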

05 · STACK

What runs underneath.

Google ADK — agent framework. The debate coordinator + sub-agents are all ADK constructs; tool resolution and per-agent instruction scoping are the load-bearing primitives.
Gemini 2.5 Flash, 2.5 Pro, 3 Pro Preview — Flash for grounding and per-round responses, Pro and 3 Pro Preview (with thinking modes) for the heavier reasoning steps in adjudication.
FastAPI + WebSocket — Python backend. The arena renders live over WebSocket; round transitions, research drops, and moderator commentary all stream as events.
Vanilla JS frontend — no framework on the page side. Setup form posts the config, the arena page subscribes to the WebSocket and renders events as they arrive.
Google Cloud Run — serverless container, 30-minute timeout, automatic deploys via GitHub push.
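
The event stream described in this stack can be pictured as a broadcaster that fans JSON messages out to every connected page. The sketch below uses only the standard library so it runs on its own; the `EventBroadcaster` class, the event type names, and the message fields are hypothetical. In a FastAPI deployment, each WebSocket handler would presumably drain its own queue and forward every message to the browser.

```python
import asyncio
import json

# Hypothetical event names covering the four kinds of event in the write-up.
EVENT_TYPES = {"research", "argument", "moderator_score", "judge_verdict"}


class EventBroadcaster:
    """Fan each debate event out to every subscribed client queue."""

    def __init__(self) -> None:
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        """Register a client; call this from inside the running event loop."""
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, event_type: str, round_number: int, payload: dict) -> str:
        """Serialise one event and enqueue it for every subscriber."""
        if event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event_type}")
        message = json.dumps(
            {"type": event_type, "round": round_number, "payload": payload}
        )
        for queue in self._subscribers:
            queue.put_nowait(message)  # unbounded queue, so this never blocks
        return message


async def demo() -> dict:
    """Publish one event and read it back as a client would."""
    broadcaster = EventBroadcaster()
    inbox = broadcaster.subscribe()
    broadcaster.publish("argument", 1, {"side": "pro", "text": "Opening claim."})
    return json.loads(await inbox.get())
```

Running `asyncio.run(demo())` returns the decoded event, which is the same shape a vanilla JS page would receive and render.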

WHY THIS EXISTS

Argument is a tool I want models to be better at, not a novelty. The arena exists because the multi-agent shape (researcher / debater / moderator / judge) generalises: once you have it, you can swap the rubric and explore completely different formats — strategic briefing, boardroom debrief, post-mortem — on the same engine. That's the same instinct as Luminary: separate the engine from the prompt that drives it, so the engine can be re-pointed at the next question without re-writing the orchestration.

Read the source on GitHub →