Luminary.
A voice-first deep-research agent.
Talk to one of four voice agents about anything — markets, companies, science, history. Luminary picks a depth, runs a real source-grounded research pipeline, and reads the result back. Optionally turns it into a two-host podcast.
Three depths, picked automatically from voice.
Trigger it by saying something to one of the voice agents — Maya, Barnaby, Consultant, or Rutger. Luminary detects depth from the phrasing and routes accordingly. Each run is a real, source-grounded investigation — not a single LLM call. Cross-study claim validation catches contradictions across sources before they reach you. QA anticipation pre-answers the follow-up questions you're likely to ask.
| Depth | Voice trigger | Pipeline | Budget |
|---|---|---|---|
| Quick | "quick look at X" · "brief on X" | Single researcher, no follow-ups. | ~3 min |
| Standard | (default) | Sub-question fan-out → parallel research → follow-ups → synthesis. | ~10 min |
| Deep | "deep dive on X" · "comprehensive analysis of X" | Multi-study iterative pipeline: query analysis → study planning → iterative research → claim validation → QA anticipation → strategic analysis → master synthesis. | up to 60 min |
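As a rough illustration of the routing in the table above, depth detection can be approximated with plain phrase matching. This is a sketch, not Luminary's actual detector: `detect_depth`, the trigger tuples, and `BUDGET_MINUTES` are hypothetical names, and the real system may well use a model rather than keywords.

```python
# Trigger phrases and budgets are copied from the depth table; all names are hypothetical.
DEEP_TRIGGERS = ("deep dive on", "comprehensive analysis of")
QUICK_TRIGGERS = ("quick look at", "brief on")
BUDGET_MINUTES = {"quick": 3, "standard": 10, "deep": 60}


def detect_depth(utterance: str) -> str:
    """Return 'quick', 'standard', or 'deep' for a transcribed voice request."""
    # Lowercase and collapse whitespace so the phrase match is forgiving.
    text = " ".join(utterance.lower().split())
    if any(trigger in text for trigger in DEEP_TRIGGERS):
        return "deep"
    if any(trigger in text for trigger in QUICK_TRIGGERS):
        return "quick"
    # No trigger phrase means the default depth.
    return "standard"
```

Deep triggers are checked first so that a request mixing both kinds of phrasing errs toward the plan/confirm gate rather than a shallow answer.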
One orchestrator, one deep pipeline, 24 ADK-built agents.
An ElevenLabs voice agent posts to a webhook (HMAC-verified). The research orchestrator detects depth, gates plan-confirm if deep, injects memory and the knowledge graph, and hands off to the deep pipeline. Studies fan out in parallel. A synthesis evaluator loops with a gap analyzer until findings stop changing. Claim validation catches cross-study contradictions, QA anticipation pre-answers the obvious follow-ups, and a master synthesis lands the answer. Results are persisted to GCS, attached to the agent's knowledge base for the next voice turn, and optionally produced as a two-host podcast.
ElevenLabs voice agent ──▶ webhook /webhook/elevenlabs (HMAC-verified)
                              │
                              ▼
                   ┌──────────────────────────┐
                   │  research_orchestrator   │
                   │  ─ depth detection       │
                   │  ─ plan/confirm gate     │
                   │  ─ memory + KG injection │
                   └──────────┬───────────────┘
                              ▼
      ┌───────────────────────────────────────────────┐
      │  deep_pipeline                                │
      │                                               │
      │  query_analyzer ─▶ study_planner ─▶ iterative │
      │                       ▼                       │
      │  parallel(researcher × N studies)             │
      │                       ▼                       │
      │  synthesis_evaluator ─▶ gap_analyzer (loop)   │
      │                       ▼                       │
      │  claim_validator ─▶ qa_anticipator            │
      │                       ▼                       │
      │  strategic_analyst ─▶ master synthesis        │
      └───────────────────────┬───────────────────────┘
                              ▼
           GCS results · memory · knowledge graph
                              ▼
              agent KB attach → next voice turn
            podcast_generator (optional, 2-host)
Observable dashboard (`/explore`)
Each phase, the model that fits.
Multi-provider routing is the point. No single model is the right tool for every phase, so each phase gets its own default and a fallback chain. Everything is overridable at runtime via env vars (`GEMINI_MODEL`, `GEMINI_PRO_MODEL`, `OPENAI_REASONING_MODEL`).
| Pipeline phase | Default model |
|---|---|
| Query analysis | Gemini 2.5 Flash |
| Study planning | Gemini 2.5 Flash |
| Study research | Gemini 2.5 Flash (with google_search grounding) |
| Complex study research | Gemini Deep Research (autonomous agent) |
| Study synthesis | OpenAI o4-mini → Gemini Pro → Flash (fallback chain) |
| Master synthesis | OpenAI o4-mini → Gemini Pro → Flash |
| Claim validation | OpenAI o4-mini (contradiction detection) |
| Strategic analysis | Gemini 2.5 Pro |
| Verification | Gemini 2.5 Flash (with web_search tool) |
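A fallback chain like the synthesis rows above can be sketched as an ordered list of callables, with the model label resolved from an env var at call time. `resolve_model` and `call_with_fallback` are hypothetical helpers and the provider callables are stand-ins; only the env var names come from the text.

```python
import os
from typing import Callable, Sequence


def resolve_model(env_var: str, default: str) -> str:
    """Pick the model label for a phase, letting an env var override the default."""
    return os.environ.get(env_var, default)


def call_with_fallback(prompt: str,
                       chain: Sequence[tuple[str, Callable[[str], str]]]) -> tuple[str, str]:
    """Try each (label, callable) in order; return the first label that answered and its output."""
    errors = []
    for label, call in chain:
        try:
            return label, call(prompt)
        except Exception as exc:  # a provider outage, timeout, quota error, ...
            errors.append(f"{label}: {exc}")
    # Every provider failed, so surface all the reasons at once.
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A chain for master synthesis would then list the reasoning model first (label from `OPENAI_REASONING_MODEL`), followed by the Pro and Flash entries (`GEMINI_PRO_MODEL`, `GEMINI_MODEL`), so a single provider outage degrades quality instead of failing the run.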
The non-obvious decisions that shape what it can do.
- Plan/confirm gate before deep runs
  - Deep mode has a 60-minute budget. Luminary builds the study plan first, reads it back over voice for confirmation, and only then executes. `AUTO_PROCEED_*` env vars tune this per depth.
- Cancellation is async-safe
  - A user-initiated cancel raises a module-level `ResearchCancelled` that every task handler re-raises. No orphaned threads, no zombie LLM calls.
- Checkpoints to GCS
  - Long deep runs persist intermediate state to Google Cloud Storage. A crashed run resumes instead of restarting.
- Memory + knowledge graph
  - Past research findings get re-injected into related queries. The knowledge graph tracks entities across studies so cross-study claims can be validated and contradictions surfaced.
- Per-agent KB cap
  - Each ElevenLabs voice agent has at most `MAX_AGENT_KB_DOCS` research docs attached at a time (default 3). Oldest is evicted on new attach so the agent's working set stays sharp.
- Two-host podcast generation
  - Any synthesis can be re-rendered as a 2-host podcast (Maya + Barnaby) via ElevenLabs TTS. This is the same lineage the rutgertuit.nl podcasts ship on.
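Two of the decisions above, async-safe cancellation and the per-agent KB cap, are small enough to sketch. This assumes the studies run as asyncio tasks, which the text does not state outright; `run_study`, `run_all`, and `new_agent_kb` are hypothetical names, and only `ResearchCancelled` and `MAX_AGENT_KB_DOCS` come from the text.

```python
import asyncio
from collections import deque

MAX_AGENT_KB_DOCS = 3  # default from the text; the real value comes from an env var


class ResearchCancelled(Exception):
    """Module-level signal for a user-initiated cancel."""


async def run_study(name: str, cancel: asyncio.Event, steps: int = 3) -> str:
    """Hypothetical task handler that checks the cancel flag between steps."""
    try:
        for _ in range(steps):
            if cancel.is_set():
                raise ResearchCancelled(name)
            await asyncio.sleep(0)  # stand-in for an LLM or search call
        return f"{name}: done"
    except ResearchCancelled:
        # Per-task cleanup would go here; re-raise so the caller sees the cancel.
        raise


async def run_all(names: list[str], cancel: asyncio.Event) -> list[str]:
    """Fan studies out in parallel and stop everything if one reports a cancel."""
    tasks = [asyncio.create_task(run_study(n, cancel)) for n in names]
    try:
        return await asyncio.gather(*tasks)
    except ResearchCancelled:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        # Drain the tasks so none is left running or with an unread exception.
        await asyncio.gather(*tasks, return_exceptions=True)
        raise


def new_agent_kb(cap: int = MAX_AGENT_KB_DOCS) -> deque:
    """A bounded deque drops its oldest entry when a new doc is appended at capacity."""
    return deque(maxlen=cap)
```

Appending a fourth doc ID to `new_agent_kb()` silently evicts the first, which matches the oldest-out behaviour described for the KB cap.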
What runs underneath.
- Google ADK — the agent framework. 24 purpose-built agents (query_analyzer, study_planner, iterative_researcher, claim_validator, qa_anticipator, strategic_analyst, synthesis_evaluator, podcast_generator, watch_checker, memory_extractor, …).
- Gemini — 2.5 Flash for analysis + grounded study research, 2.5 Pro for strategic analysis, Deep Research as an autonomous agent for complex studies.
- OpenAI o4-mini — synthesis + claim validation. In evaluation it detected contradictions better than the Gemini alternatives at this price point.
- Grok — optional secondary provider for specific phases.
- ElevenLabs Conversational + TTS — four inbound voice agents and the outbound podcast generator.
- Flask + Gunicorn — Python backend. Blueprints for health, webhook, ui_api, explore.
- Observable Framework — the `/explore` dashboard. Visualises research jobs, costs, pipeline traces, knowledge graph. Built into the Docker image.
- Google Cloud Run + Secret Manager + GCS — serverless runtime, secrets, results persistence. Multi-stage Dockerfile (Node 20 builds the dashboard, Python 3.11 runs the app).
For me, voice turned out to be a better way into deep research than typing. Typing a query into a chatbox is a bottleneck — most of the research I actually want to do happens when I'm walking, driving, between meetings. Luminary is the version that lets me hand off a question in motion and get a source-grounded answer back as audio. The fact that it can also export to a two-host podcast is the seam where this project and the rutgertuit.nl podcasts meet: same ElevenLabs lineage, same prompted-then-chosen production rule.