D · 01 · TECHNICAL WRITE-UP

Luminary.

A voice-first deep-research agent.

Talk to one of four voice agents about anything: markets, companies, science, history. Luminary picks a depth, runs a real source-grounded research pipeline, and reads the result back. It can optionally turn the result into a two-host podcast.

01 · WHAT IT DOES

Three depths, picked automatically from voice.

Trigger it by saying something to one of the voice agents — Maya, Barnaby, Consultant, or Rutger. Luminary detects depth from the phrasing and routes accordingly. Each run is a real, source-grounded investigation — not a single LLM call. Cross-study claim validation catches contradictions across sources before they reach you. QA anticipation pre-answers the follow-up questions you're likely to ask.

| Depth | Voice trigger | Pipeline | Budget |
| --- | --- | --- | --- |
| Quick | "quick look at X" · "brief on X" | Single researcher, no follow-ups. | ~3 min |
| Standard | (default) | Sub-question fan-out → parallel research → follow-ups → synthesis. | ~10 min |
| Deep | "deep dive on X" · "comprehensive analysis of X" | Multi-study iterative pipeline: query analysis → study planning → iterative research → claim validation → QA anticipation → strategic analysis → master synthesis. | up to 60 min |

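Depth detection from phrasing can be sketched as a small phrase matcher. This is a minimal illustration, not Luminary's actual detector: the trigger phrases and budgets come from the table above, while `detect_depth`, `DEPTH_TRIGGERS`, `BUDGET_MINUTES`, and the regex matching rules are assumptions.

```python
import re

# Trigger phrases taken from the depth table; the matching rules themselves
# are an assumption, not Luminary's actual detector.
DEPTH_TRIGGERS = {
    "deep": [r"\bdeep dive on\b", r"\bcomprehensive analysis of\b"],
    "quick": [r"\bquick look at\b", r"\bbrief on\b"],
}

# Time budgets in minutes, from the depth table.
BUDGET_MINUTES = {"quick": 3, "standard": 10, "deep": 60}


def detect_depth(utterance: str) -> str:
    """Return 'quick', 'standard', or 'deep' for a transcribed utterance."""
    text = utterance.lower()
    for depth, patterns in DEPTH_TRIGGERS.items():
        if any(re.search(p, text) for p in patterns):
            return depth
    return "standard"  # no trigger phrase: fall through to the default
```

Anything without a trigger phrase lands on Standard, which matches the "(default)" row. A production detector would likely need fuzzier matching to survive speech-to-text noise.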
02 · ARCHITECTURE

One orchestrator, one deep pipeline, 24 ADK-built agents.

An ElevenLabs voice agent posts to an HMAC-verified webhook. The research orchestrator detects depth, holds deep runs at a plan/confirm gate, injects memory and the knowledge graph, and hands off to the deep pipeline. Studies fan out in parallel. A synthesis evaluator loops with a gap analyzer until findings stop changing. Claim validation catches cross-study contradictions, QA anticipation pre-answers the obvious follow-ups, and a master synthesis lands the answer. Results are persisted to GCS, attached to the agent's knowledge base for the next voice turn, and optionally produced as a two-host podcast.
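The HMAC check on the webhook follows a standard pattern: recompute the signature over the raw request body with a shared secret and compare in constant time. The sketch below shows that general technique only. It assumes a plain hex HMAC-SHA256 over the body; the real ElevenLabs header name and signature layout are not reproduced here, and `sign` and `is_valid_signature` are hypothetical helpers.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signature scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

In a Flask handler this would run against the raw bytes of the request before any JSON parsing, rejecting the request if the check fails.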

ElevenLabs voice agent  ──▶  webhook /webhook/elevenlabs (HMAC-verified)
                                        │
                                        ▼
                          ┌──────────────────────────┐
                          │  research_orchestrator   │
                          │  ─ depth detection       │
                          │  ─ plan/confirm gate     │
                          │  ─ memory + KG injection │
                          └──────────┬───────────────┘
                                     ▼
              ┌───────────────────────────────────────────────┐
              │                deep_pipeline                  │
              │                                               │
              │   query_analyzer ─▶ study_planner ─▶ iterative│
              │           ▼                                   │
              │   parallel(researcher × N studies)            │
              │           ▼                                   │
              │   synthesis_evaluator ─▶ gap_analyzer (loop)  │
              │           ▼                                   │
              │   claim_validator ─▶ qa_anticipator           │
              │           ▼                                   │
              │   strategic_analyst ─▶ master synthesis       │
              └───────────────────────┬───────────────────────┘
                                      ▼
                       GCS results · memory · knowledge graph
                                      ▼
                       agent KB attach  →  next voice turn
                       podcast_generator (optional, 2-host)
                       Observable dashboard (`/explore`)
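The middle of the diagram, parallel studies feeding an evaluator/gap-analyzer loop, can be sketched with `asyncio`. This is a toy version under stated assumptions: `research_study` and `find_gaps` are stand-ins for the real agents, "findings stop changing" is modelled as "a round adds no new findings", and `max_rounds` is a hypothetical safety cap.

```python
import asyncio


async def research_study(study: str) -> str:
    """Stand-in for one researcher agent; a real one would call an LLM."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"finding for {study}"


async def run_studies(studies: list[str]) -> list[str]:
    """Fan the studies out concurrently and collect findings in order."""
    return list(await asyncio.gather(*(research_study(s) for s in studies)))


def find_gaps(findings: set[str], required: set[str]) -> set[str]:
    """Stand-in gap analyzer: which required topics have no finding yet?"""
    covered = {f.removeprefix("finding for ") for f in findings}
    return required - covered


async def research_until_stable(initial: list[str], required: set[str],
                                max_rounds: int = 5) -> set[str]:
    """Loop research -> gap analysis until a round adds nothing new."""
    findings: set[str] = set()
    todo = list(initial)
    for _ in range(max_rounds):
        new = set(await run_studies(todo)) - findings
        if not new:  # findings stopped changing: converged
            break
        findings |= new
        todo = sorted(find_gaps(findings, required))
    return findings
```

Calling `asyncio.run(research_until_stable(["a"], {"a", "b", "c"}))` researches `a`, discovers that `b` and `c` are uncovered, researches those in parallel, and stops on the next round when nothing new appears.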
03 · MODEL ROUTING

Each phase, the model that fits.

Multi-provider routing is the point. No single model is the right tool for every phase, so each phase gets its own default and a fallback chain. Everything is overridable at runtime via env vars (GEMINI_MODEL, GEMINI_PRO_MODEL, OPENAI_REASONING_MODEL).

| Pipeline phase | Default model |
| --- | --- |
| Query analysis | Gemini 2.5 Flash |
| Study planning | Gemini 2.5 Flash |
| Study research | Gemini 2.5 Flash (with google_search grounding) |
| Complex study research | Gemini Deep Research (autonomous agent) |
| Study synthesis | OpenAI o4-mini → Gemini Pro → Flash (fallback chain) |
| Master synthesis | OpenAI o4-mini → Gemini Pro → Flash |
| Claim validation | OpenAI o4-mini (contradiction detection) |
| Strategic analysis | Gemini 2.5 Pro |
| Verification | Gemini 2.5 Flash (with web_search tool) |

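A fallback chain with env var overrides might look roughly like this. Only the env var names (`OPENAI_REASONING_MODEL`, `GEMINI_PRO_MODEL`, `GEMINI_MODEL`) come from the text above; the default model identifier strings, `resolve_chain`, and `call_with_fallback` are assumptions for illustration, and the `call` argument stands in for whatever provider client is actually used.

```python
import os
from typing import Callable

# Default identifiers are placeholders; only the env var names are from the write-up.
SYNTHESIS_CHAIN = [
    ("OPENAI_REASONING_MODEL", "o4-mini"),
    ("GEMINI_PRO_MODEL", "gemini-2.5-pro"),
    ("GEMINI_MODEL", "gemini-2.5-flash"),
]


def resolve_chain(chain, env=os.environ) -> list[str]:
    """Apply env var overrides to each default, preserving fallback order."""
    return [env.get(var, default) for var, default in chain]


def call_with_fallback(models: list[str], call: Callable[[str], str]) -> str:
    """Try each model in order; return the first successful result."""
    errors = []
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # a real version would catch provider errors only
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Keeping resolution separate from invocation means the chain can be inspected or logged before any provider is called.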
04 · NOTABLE DESIGN CHOICES

The non-obvious decisions that shape what it can do.

Plan/confirm gate before deep runs
Deep mode has a 60-minute budget. Luminary builds the study plan first, reads it back over voice for confirmation, and only then executes. `AUTO_PROCEED_*` env vars tune this per depth.
Cancellation is async-safe
A user-initiated cancel raises a module-level `ResearchCancelled` that every task handler re-raises. No orphaned threads, no zombie LLM calls.
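The re-raise pattern can be sketched as follows. Only the `ResearchCancelled` name comes from the text; the `asyncio.Event` cancel flag, `run_step`, `handle_task`, `run_pipeline`, and the soft-failure branch are assumptions about how such a handler might be shaped.

```python
import asyncio


class ResearchCancelled(Exception):
    """Raised when the user cancels a running research job."""


async def run_step(name: str, cancel: asyncio.Event) -> str:
    """One pipeline step that checks the cancel flag before doing work."""
    if cancel.is_set():
        raise ResearchCancelled(name)
    await asyncio.sleep(0)  # stand-in for an LLM call
    return f"{name} done"


async def handle_task(name: str, cancel: asyncio.Event) -> str:
    """Task handler: degrade on ordinary errors, but never swallow a cancel."""
    try:
        return await run_step(name, cancel)
    except ResearchCancelled:
        raise  # re-raise so the cancel propagates to the orchestrator
    except Exception:
        return f"{name} failed"  # hypothetical soft-failure path


async def run_pipeline(steps: list[str], cancel: asyncio.Event) -> list[str]:
    """Run steps in order; a cancel aborts the whole pipeline."""
    results = []
    for step in steps:
        results.append(await handle_task(step, cancel))
    return results
```

The explicit `except ResearchCancelled: raise` ahead of the broad `except Exception` is the important part: without it, a generic error handler would quietly absorb the cancel and the run would keep going.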
Checkpoints to GCS
Long deep runs persist intermediate state to Google Cloud Storage. A crashed run resumes instead of restarting.
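Resume-from-checkpoint can be sketched without any cloud dependency. In this illustration a plain dict stands in for the GCS bucket, and the phase names, key layout, and `run_with_checkpoints` helper are all assumptions rather than Luminary's actual format.

```python
import json
from typing import Callable

# Phase names are illustrative; the store is a dict standing in for a GCS bucket.
PHASES = ["plan", "research", "synthesis"]


def run_with_checkpoints(job_id: str, store: dict[str, str],
                         run_phase: Callable[[str, dict], str]) -> dict:
    """Run phases in order, saving state after each and skipping completed ones."""
    key = f"checkpoints/{job_id}.json"
    state = json.loads(store[key]) if key in store else {"done": [], "data": {}}
    for phase in PHASES:
        if phase in state["done"]:
            continue  # finished before the crash: do not redo
        state["data"][phase] = run_phase(phase, state["data"])
        state["done"].append(phase)
        store[key] = json.dumps(state)  # persist after every phase
    return state["data"]
```

If a run dies mid-research, calling the function again with the same `job_id` and store picks up at the research phase with the plan output already loaded.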
Memory + knowledge graph
Past research findings get re-injected into related queries. The knowledge graph tracks entities across studies so cross-study claims can be validated and contradictions surfaced.
Per-agent KB cap
Each ElevenLabs voice agent has at most `MAX_AGENT_KB_DOCS` research docs attached at a time (default 3). The oldest doc is evicted on each new attach, so the agent's working set stays sharp.
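The eviction rule is a first-in, first-out cap. A minimal sketch, assuming the cap is read from the `MAX_AGENT_KB_DOCS` env var; the `AgentKnowledgeBase` class and its `attach` method are hypothetical, and a real version would also call the ElevenLabs API to detach the evicted docs.

```python
import os
from collections import deque

# Default of 3 is from the write-up; reading it via os.environ is an assumption.
MAX_AGENT_KB_DOCS = int(os.environ.get("MAX_AGENT_KB_DOCS", "3"))


class AgentKnowledgeBase:
    """Tracks which research docs are attached to one voice agent."""

    def __init__(self, max_docs: int = MAX_AGENT_KB_DOCS) -> None:
        self.max_docs = max_docs
        self.docs: deque[str] = deque()

    def attach(self, doc_id: str) -> list[str]:
        """Attach a doc; return the ids evicted to stay within the cap."""
        evicted = []
        self.docs.append(doc_id)
        while len(self.docs) > self.max_docs:
            evicted.append(self.docs.popleft())  # oldest goes first
        return evicted
```

Returning the evicted ids lets the caller decide what cleanup to perform, instead of hiding a network call inside the data structure.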
Two-host podcast generation
Any synthesis can be re-rendered as a 2-host podcast (Maya + Barnaby) via ElevenLabs TTS. This is the same lineage the rutgertuit.nl podcasts ship on.
05 · STACK

What runs underneath.

  • Google ADK — the agent framework. 24 purpose-built agents (query_analyzer, study_planner, iterative_researcher, claim_validator, qa_anticipator, strategic_analyst, synthesis_evaluator, podcast_generator, watch_checker, memory_extractor, …).
  • Gemini — 2.5 Flash for analysis + grounded study research, 2.5 Pro for strategic analysis, Deep Research as an autonomous agent for complex studies.
  • OpenAI o4-mini — synthesis + claim validation. Better contradiction detection at this price point than the Gemini alternatives during evaluation.
  • Grok — optional secondary provider for specific phases.
  • ElevenLabs Conversational + TTS — four inbound voice agents and the outbound podcast generator.
  • Flask + Gunicorn — Python backend. Blueprints for health, webhook, ui_api, explore.
  • Observable Framework — the /explore dashboard. Visualises research jobs, costs, pipeline traces, knowledge graph. Built into the Docker image.
  • Google Cloud Run + Secret Manager + GCS — serverless runtime, secrets, results persistence. Multi-stage Dockerfile (Node 20 builds the dashboard, Python 3.11 runs the app).
WHY THIS EXISTS

For me, voice turned out to be a better way into deep research than typing. Typing a query into a chatbox is a bottleneck — most of the research I actually want to do happens when I'm walking, driving, between meetings. Luminary is the version that lets me hand off a question in motion and get a source-grounded answer back as audio. The fact that it can also export to a two-host podcast is the seam where this project and the rutgertuit.nl podcasts meet: same ElevenLabs lineage, same prompted-then-chosen production rule.

Read the source on GitHub →