D · 01 · TECHNICAL WRITE-UP

Luminary.

A voice-first deep-research agent.

Talk to one of four voice agents about anything — markets, companies, science, history. Luminary picks a depth, runs a real source-grounded research pipeline, and reads the result back. Optionally turns it into a two-host podcast.

REPO github.com/rutgertuit/Luminary →
RUNTIME Google Cloud Run · europe-west4
STATUS Open · personal project

01 · WHAT IT DOES

Three depths, picked automatically from voice.

Trigger it by saying something to one of the voice agents — Maya, Barnaby, Consultant, or Rutger. Luminary detects depth from the phrasing and routes accordingly. Each run is a real, source-grounded investigation — not a single LLM call. Cross-study claim validation catches contradictions across sources before they reach you. QA anticipation pre-answers the follow-up questions you're likely to ask.

Depth	Voice trigger	Pipeline	Budget
Quick	`"quick look at X" · "brief on X"`	Single researcher, no follow-ups.	~3 min
Standard	`(default)`	Sub-question fan-out → parallel research → follow-ups → synthesis.	~10 min
Deep	`"deep dive on X" · "comprehensive analysis of X"`	Multi-study iterative pipeline: query analysis → study planning → iterative research → claim validation → QA anticipation → strategic analysis → master synthesis.	up to 60 min
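
The routing in the table can be sketched as a phrase match over the transcribed utterance. This is a minimal illustration, not the repo's detector: the function name `detect_depth`, the `DEPTH_TRIGGERS` mapping, and the regex approach are assumptions, and only the trigger phrases themselves come from the table.

```python
import re

# Hypothetical trigger table; the phrases are the ones listed above.
# The real detector may use an LLM or a richer phrase list.
DEPTH_TRIGGERS = {
    "quick": (r"\bquick look at\b", r"\bbrief on\b"),
    "deep": (r"\bdeep dive on\b", r"\bcomprehensive analysis of\b"),
}


def detect_depth(utterance: str) -> str:
    """Return 'quick', 'standard', or 'deep' for a transcribed utterance."""
    text = utterance.lower()
    for depth, patterns in DEPTH_TRIGGERS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return depth
    # No trigger phrase matched: fall back to the default depth.
    return "standard"
```

Anything without a trigger phrase falls through to Standard, which matches the `(default)` row.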

02 · ARCHITECTURE

One orchestrator, one deep pipeline, 24 ADK-built agents.

An ElevenLabs voice agent posts to an HMAC-verified webhook. The research orchestrator detects depth, holds deep runs at a plan/confirm gate, injects memory and the knowledge graph, and hands off to the deep pipeline. Studies fan out in parallel. A synthesis evaluator loops with a gap analyzer until findings stop changing. Claim validation catches cross-study contradictions, QA anticipation pre-answers the obvious follow-ups, and a master synthesis lands the answer. Results are persisted to GCS, attached to the agent's knowledge base for the next voice turn, and optionally produced as a two-host podcast.

ElevenLabs voice agent  ──▶  webhook /webhook/elevenlabs (HMAC-verified)
                                        │
                                        ▼
                          ┌──────────────────────────┐
                          │  research_orchestrator   │
                          │  ─ depth detection       │
                          │  ─ plan/confirm gate     │
                          │  ─ memory + KG injection │
                          └──────────┬───────────────┘
                                     ▼
              ┌───────────────────────────────────────────────┐
              │                deep_pipeline                  │
              │                                               │
              │   query_analyzer ─▶ study_planner ─▶ iterative│
              │           ▼                                   │
              │   parallel(researcher × N studies)            │
              │           ▼                                   │
              │   synthesis_evaluator ─▶ gap_analyzer (loop)  │
              │           ▼                                   │
              │   claim_validator ─▶ qa_anticipator           │
              │           ▼                                   │
              │   strategic_analyst ─▶ master synthesis       │
              └───────────────────────┬───────────────────────┘
                                      ▼
                       GCS results · memory · knowledge graph
                                      ▼
                       agent KB attach  →  next voice turn
                       podcast_generator (optional, 2-host)
                       Observable dashboard (`/explore`)
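
The HMAC check at the top of the diagram can be sketched with the standard library alone. This is a generic sketch under stated assumptions: the signed message layout (`<timestamp>.<body>`), the hex SHA-256 digest, the 300-second tolerance, and the names `sign` and `verify` are all illustrative, and the actual header format should be taken from the ElevenLabs webhook documentation.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, timestamp: str, body: bytes) -> str:
    """Hex HMAC-SHA256 over '<timestamp>.<body>' (assumed message layout)."""
    message = timestamp.encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: str, body: bytes, signature: str,
           now: Optional[float] = None, tolerance_s: int = 300) -> bool:
    """Accept the request only if it is fresh and the signature matches."""
    current = time.time() if now is None else now
    try:
        age = abs(current - int(timestamp))
    except ValueError:
        return False  # malformed timestamp
    if age > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

In a Flask handler, `body` would be the raw request bytes, read before any JSON parsing, so the digest covers exactly what was sent.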

03 · MODEL ROUTING

Each phase, the model that fits.

Multi-provider routing is the point. No single model is the right tool for every phase, so each phase gets its own default and a fallback chain. Everything is overridable at runtime via env vars (GEMINI_MODEL, GEMINI_PRO_MODEL, OPENAI_REASONING_MODEL).

Pipeline phase	Default model
Query analysis	`Gemini 2.5 Flash`
Study planning	`Gemini 2.5 Flash`
Study research	`Gemini 2.5 Flash (with google_search grounding)`
Complex study research	`Gemini Deep Research (autonomous agent)`
Study synthesis	`OpenAI o4-mini → Gemini Pro → Flash (fallback chain)`
Master synthesis	`OpenAI o4-mini → Gemini Pro → Flash`
Claim validation	`OpenAI o4-mini (contradiction detection)`
Strategic analysis	`Gemini 2.5 Pro`
Verification	`Gemini 2.5 Flash (with web_search tool)`
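
A fallback chain like the synthesis rows above can be sketched as an ordered list of model names tried in turn. The env var names come from this write-up, but which variable maps to which slot, the default model ID strings, and the helpers `synthesis_chain` and `run_with_fallback` are assumptions made for illustration.

```python
import os
from typing import Callable, List, Mapping, Tuple


def synthesis_chain(env: Mapping[str, str] = os.environ) -> List[str]:
    """Ordered models for synthesis; each slot is overridable via env var."""
    return [
        env.get("OPENAI_REASONING_MODEL", "o4-mini"),    # assumed default ID
        env.get("GEMINI_PRO_MODEL", "gemini-2.5-pro"),   # assumed default ID
        env.get("GEMINI_MODEL", "gemini-2.5-flash"),     # assumed default ID
    ]


def run_with_fallback(chain: List[str],
                      call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, result) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except Exception as exc:  # provider outage, quota, timeout, ...
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Here `call` stands in for whatever provider client the phase uses; the chain only decides the order and records why earlier models were skipped.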

04 · NOTABLE DESIGN CHOICES

The non-obvious decisions that shape what it can do.

Plan/confirm gate before deep runs: Deep mode has a 60-minute budget. Luminary builds the study plan first, reads it back over voice for confirmation, and only then executes. AUTO_PROCEED_* env vars tune this per depth.
Cancellation is async-safe: A user-initiated cancel raises a module-level ResearchCancelled that every task handler re-raises. No orphaned threads, no zombie LLM calls.
Checkpoints to GCS: Long deep runs persist intermediate state to Google Cloud Storage. A crashed run resumes instead of restarting.
Memory + knowledge graph: Past research findings get re-injected into related queries. The knowledge graph tracks entities across studies so cross-study claims can be validated and contradictions surfaced.
Per-agent KB cap: Each ElevenLabs voice agent has at most MAX_AGENT_KB_DOCS research docs attached at a time (default 3). The oldest doc is evicted on each new attach so the agent's working set stays sharp.
Two-host podcast generation: Any synthesis can be re-rendered as a 2-host podcast (Maya + Barnaby) via ElevenLabs TTS. This is the same lineage the rutgertuit.nl podcasts ship on.
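
Two of the choices above lend themselves to a short sketch: cancel propagation and the per-agent KB cap. Only the names `ResearchCancelled` and `MAX_AGENT_KB_DOCS` come from this write-up. The helpers `run_study`, `run_all`, and `attach_doc`, the use of an `asyncio.Event` as the cancel signal, and the list-based bookkeeping are assumptions, not the repo's implementation.

```python
import asyncio
from typing import List

MAX_AGENT_KB_DOCS = 3  # default stated above


class ResearchCancelled(Exception):
    """Raised when the user cancels a running research job."""


async def run_study(name: str, cancel: asyncio.Event) -> str:
    """One task handler: ordinary errors degrade, a cancel always propagates."""
    try:
        for _ in range(3):              # stand-in for research steps
            if cancel.is_set():
                raise ResearchCancelled(name)
            await asyncio.sleep(0)      # yield to other tasks
        return f"{name}: done"
    except ResearchCancelled:
        raise                           # never swallow a cancel
    except Exception as exc:
        return f"{name}: failed ({exc})"


async def run_all(names: List[str], cancel: asyncio.Event) -> List[str]:
    """Run studies concurrently; a ResearchCancelled surfaces to the caller."""
    return await asyncio.gather(*(run_study(n, cancel) for n in names))


def attach_doc(attached: List[str], doc_id: str,
               cap: int = MAX_AGENT_KB_DOCS) -> List[str]:
    """Append doc_id (list is oldest-first) and return the ids evicted."""
    attached.append(doc_id)
    evicted = []
    while len(attached) > cap:
        evicted.append(attached.pop(0))  # oldest goes first
    return evicted
```

The explicit `except ResearchCancelled: raise` ahead of the broad handler is the point: without it, a catch-all written for resilience would turn a cancel into just another failed study.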

05 · STACK

What runs underneath.

Google ADK: the agent framework. 24 purpose-built agents (query_analyzer, study_planner, iterative_researcher, claim_validator, qa_anticipator, strategic_analyst, synthesis_evaluator, podcast_generator, watch_checker, memory_extractor, …).
Gemini: 2.5 Flash for analysis + grounded study research, 2.5 Pro for strategic analysis, Deep Research as an autonomous agent for complex studies.
OpenAI o4-mini: synthesis + claim validation. During evaluation it detected contradictions better than the Gemini alternatives at this price point.
Grok: optional secondary provider for specific phases.
ElevenLabs Conversational + TTS: four inbound voice agents and the outbound podcast generator.
Flask + Gunicorn: Python backend. Blueprints for health, webhook, ui_api, explore.
Observable Framework: the /explore dashboard. Visualises research jobs, costs, pipeline traces, knowledge graph. Built into the Docker image.
Google Cloud Run + Secret Manager + GCS: serverless runtime, secrets, results persistence. Multi-stage Dockerfile (Node 20 builds the dashboard, Python 3.11 runs the app).

WHY THIS EXISTS

For me, voice turned out to be a better way into deep research than typing. Typing a query into a chatbox is a bottleneck — most of the research I actually want to do happens when I'm walking, driving, between meetings. Luminary is the version that lets me hand off a question in motion and get a source-grounded answer back as audio. The fact that it can also export to a two-host podcast is the seam where this project and the rutgertuit.nl podcasts meet: same ElevenLabs lineage, same prompted-then-chosen production rule.

Read the source on GitHub →