Multi-provider BYOK in practice: a real technical brief that used 4 different models
Everything we've written about orchestration has been theoretical. This post is a case study.
Last week I used NEXUS PRIME to produce a 2,000-word technical brief on the state of quantum-resistant encryption for a client. It is the kind of deliverable that normally takes a skilled writer 4-6 hours and costs the client a few hundred dollars in freelance fees. With orchestration, it took 4 minutes of wall-clock time and under one dollar in API fees.
Here is exactly what happened, step by step, with the provider and cost for each slice.
The directive
"Write a 2,000-word technical brief for a non-technical executive audience on the state of quantum-resistant cryptography. Cover: why RSA-2048 is under threat, what lattice-based cryptography is, timeline for NIST standardization, and practical action items for CTOs. Tone: authoritative but accessible. Include at least 3 sources."
That's it. One paragraph. No prompt engineering. No step-by-step instructions. The orchestrator takes it from there.
Stage 1: Research (Gemini Pro, free tier)
The orchestrator identified this as research-heavy and dispatched three research agents in a fan-out pattern, each querying a different angle:
- Agent 1: threat timeline for RSA-2048 under quantum attack (Shor's algorithm, current qubit counts, projection curves)
- Agent 2: NIST post-quantum standardization history (2016 call, 2022 finalists, 2024 standards)
- Agent 3: lattice cryptography primer (LWE, CRYSTALS-Kyber, CRYSTALS-Dilithium)
All three ran against Gemini Pro's research mode. Why Gemini? Three reasons: strong grounding in recent technical literature, competitive pricing, and for research tasks the model's verbosity is actually useful (more raw material to work with). Each research call pulled in roughly 4-6 sources with quotations and dates.
Cost: $0.12 total across three parallel calls. Wall-clock time: 22 seconds.
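The fan-out itself is just parallel dispatch. Here's a minimal sketch in Python, with `gemini_research` stubbed in place of the real Gemini Pro call (the function names are illustrative, not NEXUS's actual API):

```python
import asyncio

# Illustrative sketch of the Stage 1 fan-out, not NEXUS's internals.
# `gemini_research` stands in for the real network call to Gemini Pro's
# research mode; here it is stubbed so the pattern itself is runnable.
async def gemini_research(angle: str) -> dict:
    await asyncio.sleep(0)  # stand-in for the real API round trip
    return {"angle": angle, "sources": []}

async def fan_out(angles: list[str]) -> list[dict]:
    # Dispatch one research agent per angle and await them all together,
    # so wall-clock time is that of the slowest call, not the sum.
    return await asyncio.gather(*(gemini_research(a) for a in angles))

angles = [
    "threat timeline for RSA-2048 under quantum attack",
    "NIST post-quantum standardization history",
    "lattice cryptography primer (LWE, Kyber, Dilithium)",
]
results = asyncio.run(fan_out(angles))
```

That's why three research calls finish in 22 seconds instead of roughly a minute: the slowest single call sets the pace.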
Stage 2: Structural outline (Claude Opus)
With raw research in hand, the orchestrator dispatched a structural specialist running on Claude Opus. Its job: take the three research blobs, design a 2,000-word article structure that flows logically, and produce an outline with word budgets per section.
Why Claude Opus for this? Opus's long-context reasoning is the best available for "look at these 3,000 tokens of research and design a coherent structure on top of them." GPT-4 does this adequately; Opus does it consistently better in our internal evaluations. Structure is a high-leverage step — a bad outline makes every downstream agent's work worse — so we spend Opus tokens here rather than on downstream drafting.
The outline that came back had 6 sections with clear transitions, sidebar suggestions, and a strong hook. Word budgets per section summed to 1,950 — tight enough that the final draft wouldn't need aggressive trimming.
Cost: $0.28. Wall-clock time: 11 seconds.
Stage 3: Section drafting (GPT-4, six specialists, five in parallel)
The orchestrator split the outline into six sections and dispatched six writing specialists. Five ran in parallel (sections that didn't depend on each other); the sixth ran only after the preceding section's conclusion was finalized (transition-dependent).
All six ran on GPT-4. Why? For prose with a specific tone ("authoritative but accessible"), GPT-4's voice consistency across calls is the best we've benchmarked. Each specialist received its section of the outline, a 250-word style guide in its system prompt, and the adjacent sections' key points for continuity.
Each specialist also had a narrow job — one wrote only the hook, another only the threat-timeline section, another only the action-items closer. Specialization beats generalization here, because each writer could focus entirely on nailing their slice without juggling the others.
Cost: $0.41 across six GPT-4 calls. Wall-clock time: 48 seconds (parallelism).
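The dispatch logic in this stage (five independent drafts in parallel, one gated on its predecessor) can be sketched like this; the `draft` stub and section names are illustrative, not the real API:

```python
import asyncio

# Sketch of Stage 3's dependency-aware dispatch. `draft` stands in for a
# GPT-4 call with the section outline, style guide, and adjacent context.
async def draft(section: str, context: str = "") -> str:
    await asyncio.sleep(0)  # stand-in for the real GPT-4 call
    suffix = f" after {context}" if context else ""
    return f"[{section} draft{suffix}]"

async def run_stage(independent: list[str], dependent: str,
                    depends_on: str) -> dict[str, str]:
    # Independent sections draft concurrently...
    texts = await asyncio.gather(*(draft(s) for s in independent))
    drafts = dict(zip(independent, texts))
    # ...while the transition-dependent section starts only once its
    # predecessor's text is final.
    drafts[dependent] = await draft(dependent, context=drafts[depends_on])
    return drafts

sections = ["hook", "threat timeline", "lattice primer",
            "NIST timeline", "sidebar"]
out = asyncio.run(run_stage(sections, dependent="action items",
                            depends_on="NIST timeline"))
```

The 48-second wall-clock figure reflects this shape: one parallel batch plus one serial tail.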
Stage 4: Fact-check pass (Gemini Pro, free tier)
Draft in hand, the orchestrator dispatched a fact-checker agent against Gemini. Its job: verify every factual claim in the draft, flag anything unsupported, and cross-reference claims against the research pulled in Stage 1.
Why Gemini again? For verification, recency of training data matters. Gemini has fresher knowledge on NIST standardization updates than most alternatives. And for fact-checking, verbose output with citations is a feature.
Two claims got flagged. One was a qubit count drawn from a source that had since been superseded. The orchestrator triggered a correction agent (Claude Opus) to revise the affected paragraph. The other flag was a year that turned out to be correct; the fact-checker marked it "verify," and the orchestrator kept it.
Cost: $0.08. Wall-clock time: 18 seconds.
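A minimal sketch of how flags like these two might be routed; the flag schema and status values are invented for illustration, not taken from NEXUS:

```python
# Hypothetical routing of fact-check flags: outdated claims go to a
# correction agent, "verify" flags that checked out are kept as-is.
def route_flags(flags: list[dict]) -> tuple[list[dict], list[dict]]:
    corrections, kept = [], []
    for flag in flags:
        if flag["status"] == "outdated":
            corrections.append(flag)  # hand off to a correction agent
        else:
            kept.append(flag)         # verified, no revision needed
    return corrections, kept

flags = [
    {"claim": "current qubit count", "status": "outdated"},
    {"claim": "standardization year", "status": "verify"},
]
to_fix, to_keep = route_flags(flags)
```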
Stage 5: Grammar and style polish (local Llama, free)
Final pass: grammar, awkward phrasing, passive-voice reduction, sentence-rhythm smoothing. This is mechanical work. A 70B-parameter Llama running on servers we own handles it at zero marginal cost.
Why not GPT-4 for this? Because it's wasteful. Grammar polish doesn't need frontier-model reasoning. A fine-tuned local model does it as well or better, with deterministic latency, no rate limits, and no token cost.
Cost: $0.00. Wall-clock time: 14 seconds.
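The routing rationale across all five stages boils down to a small table: best model per slice, cheapest model that wins. A toy sketch (provider names are from this post; the table and function are illustrative):

```python
# Toy model-routing table mirroring the choices made in this brief.
ROUTES = {
    "research":   "gemini-pro",       # free tier, fresh technical grounding
    "outline":    "claude-opus",      # high-leverage structure work
    "drafting":   "gpt-4",            # best voice consistency we've benchmarked
    "fact_check": "gemini-pro",       # recency plus verbose citations
    "polish":     "local-llama-70b",  # mechanical work, zero marginal cost
}

def pick_model(task_kind: str) -> str:
    # Route each slice to the model that wins on that slice.
    return ROUTES[task_kind]
```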
Final: Assembly and delivery (orchestrator)
The orchestrator assembled the six sections in order, inserted the transitions, applied the fact-check corrections, and emitted the final 1,987-word brief with three cited sources.
Total cost: $0.89
Total wall-clock time: ~4 minutes
Total providers touched: 4 (Gemini, Claude, OpenAI, local Llama)
Why multi-provider matters
On a single-provider stack, the same brief would have cost somewhere between $2 and $6, depending on which provider. GPT-4 for everything: ~$3.50. Claude Opus for everything: ~$4.80. Gemini for everything: ~$1.20 (but lower quality on the prose sections).
Multi-provider BYOK let us pick the best model for each slice, and the savings compound because we use free-tier providers (Gemini free tier, local Llama) for the slices where they win.
A flat-rate SaaS alternative would have charged $200 for the month, and this single brief would have been one of maybe twenty artifacts produced in it. Amortized cost per brief on that plan: $10. On NEXUS: $0.89 in API fees plus the $20/month subscription, amortized over however many briefs you run. At 10 briefs a month, your per-brief cost is $2.89; at 30, it's $1.56.
Linear scaling on usage. No flat-rate markup. That's the BYOK math in practice.
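The amortization arithmetic is one line; here's a quick check of the numbers above:

```python
# Per-brief cost under BYOK: API fees per brief plus the subscription
# spread across however many briefs you run that month.
def per_brief_cost(briefs_per_month: int,
                   api_cost_per_brief: float = 0.89,
                   subscription: float = 20.0) -> float:
    return api_cost_per_brief + subscription / briefs_per_month

# Flat-rate comparison: $200/month over ~20 artifacts is $10 each,
# regardless of how cheap any individual artifact was to produce.
flat_rate_per_brief = 200.0 / 20
```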
What this isn't
This isn't a claim that NEXUS-produced work beats a skilled human writer. It doesn't, on maximum-care deliverables. A senior journalist spending a full day on this brief would produce something sharper on voice, with original interviews, with angle choices that no LLM currently makes well.
It IS a claim that NEXUS-produced work beats a rushed human, a junior, or a single-model AI workflow — at 1/100th the time cost and 1/200th the dollar cost. For the 80% of work that is competent-and-fast rather than elite-and-slow, orchestration is a different category of tool.
Use it for what it's for. That is the whole pitch.
Next post: "Directive to deliverable: watch one sentence turn into a finished investor memo" — full end-to-end walkthrough of a single directive ("write an investor memo for our seed round") moving through 8 specialists in real time.