Multi-provider BYOK in practice: a real technical brief that used 4 different models
Everything we've written about orchestration so far has been theory. This one's a case study.
Last week I used NEXUS to produce a 2,000-word technical brief on the state of quantum-resistant encryption for a client. It's the kind of deliverable that normally takes a skilled writer four to six hours and costs the client a few hundred dollars in freelance fees. With orchestration, it took four minutes of wall-clock time and under a dollar in API fees.
Here's exactly what happened, stage by stage, with the provider and the cost for each slice.
The directive
"Write a 2,000-word technical brief for a non-technical executive audience on the state of quantum-resistant cryptography. Cover: why RSA-2048 is under threat, what lattice-based cryptography is, the timeline for NIST standardization, and practical action items for CTOs. Tone: authoritative but accessible. Include at least 3 sources."
That's the whole input. One paragraph. No prompt engineering, no step-by-step instructions. The orchestrator takes it from there.
Stage 1: Research (Gemini Pro, free tier)
The orchestrator read this as research-heavy and dispatched three research agents in a fan-out, each on a different angle:
- Agent 1: the threat timeline for RSA-2048 under quantum attack (Shor's algorithm, current qubit counts, projection curves)
- Agent 2: the NIST post-quantum standardization history (2016 call, 2022 finalists, 2024 standards)
- Agent 3: a lattice-cryptography primer (LWE, CRYSTALS-Kyber, CRYSTALS-Dilithium)
All three ran on Gemini Pro. Why Gemini? Strong grounding in recent technical literature, competitive pricing, and, for research specifically, its verbosity is an asset, because it leaves more raw material to work with. Each call came back with four to six sources, quoted and dated.
Cost: $0.12 across three parallel calls. Wall-clock: 22 seconds.
Stage 2: Structural outline (Claude Opus)
With the research in hand, the orchestrator sent it to a structural specialist on Claude Opus. Its job: take the three research blobs, design a 2,000-word structure that flows, and return an outline with word budgets per section.
Why Opus here? Its long-context reasoning is the best available for "read these 3,000 tokens of research and build a coherent structure on top of them." GPT-4 does it adequately; Opus does it more consistently in our internal evals. Structure is a high-leverage step: a bad outline poisons every downstream agent's work, so this is where we spend the Opus tokens. The outline came back with six sections, clear transitions, sidebar suggestions, a strong hook, and per-section budgets summing to 1,950 words, tight enough that the final draft wouldn't need heavy trimming.
Cost: $0.28. Wall-clock: 11 seconds.
Stage 3: Section drafting (GPT-4, six specialists)
The orchestrator split the outline into six sections and dispatched six writing specialists, five in parallel (sections with no dependencies), one after the preceding section's conclusion was set (transition-dependent).
All six ran on GPT-4. Why? For prose with a specific tone, "authoritative but accessible", GPT-4's voice consistency across calls is the best we've benchmarked. Each specialist got its slice of the outline, a 250-word style guide in its system prompt, and the adjacent sections' key points for continuity. Each also had a narrow job: one wrote only the hook, another only the threat-timeline section, another only the action-items close. Specialization wins here: each writer could fully focus on nailing its slice without juggling the others.
Cost: $0.41 across six GPT-4 calls. Wall-clock: 48 seconds (parallelism).
Stage 4: Fact-check (Gemini Pro, free tier)
Draft in hand, the orchestrator ran a fact-checker on Gemini to verify every factual claim, flag anything unsupported, and cross-reference against the Stage 1 research. Why Gemini again? For verification, recency matters, and Gemini has fresher knowledge on NIST updates than most alternatives. Plus, for fact-checking, verbose output with citations is a feature.
Two claims got flagged. One was a qubit count that had been updated since an older source, so the orchestrator triggered a correction agent (Claude Opus) to revise that paragraph. The other was a year that turned out to be correct, flagged as "verify," kept after review.
Cost: $0.08. Wall-clock: 18 seconds.
Stage 5: Grammar and style polish (local Llama, free)
Final pass: grammar, awkward phrasing, passive-voice trimming, sentence-rhythm smoothing. This is mechanical work, so a 70B Llama on our own servers handled it at zero marginal cost. Why not GPT-4 for this? Because it's wasteful. Grammar polish doesn't need frontier reasoning, and a local model does it as well or better, with steady latency, no rate limits, and no token cost.
Cost: $0.00. Wall-clock: 14 seconds.
Final: Assembly and delivery (orchestrator)
The orchestrator assembled the six sections in order, inserted the transitions, applied the fact-check corrections, and emitted the final 1,987-word brief with three cited sources.
Total cost: $0.89. Total wall-clock: about 4 minutes. Providers touched: 4 (Gemini, Claude, OpenAI, local Llama).
Why multi-provider matters
On a single-provider stack, the same brief would have run somewhere between $2 and $6, depending on the provider. GPT-4 for everything: about $3.50. Claude Opus for everything: about $4.80. Gemini for everything: about $1.20 (but weaker on the prose sections). Multi-provider BYOK let us pick the best model for each slice, and the savings compound, because the slices where free-tier providers win (Gemini free tier, local Llama) cost nothing.
The flat-rate SaaS alternative would have charged $200 for the month, in which this brief is one of maybe twenty artifacts, an amortized $10 each. On NEXUS, it's $0.89 plus the $20 subscription, spread across however many briefs you run. Ten briefs a month puts you at $2.89 each; thirty puts you at $1.56. Linear scaling on usage, no flat-rate markup. That's the BYOK math in practice.
What this isn't
This isn't a claim that NEXUS-produced work beats a skilled human writer. On a maximum-care deliverable, it doesn't: a senior journalist with a full day, original interviews, and angle choices no model makes well will produce something sharper.
It is a claim that NEXUS beats a rushed human, a junior, or a single-model workflow, at a hundredth of the time and a two-hundredth of the dollar cost. For the 80% of work that's competent-and-fast rather than elite-and-slow, orchestration is a different category of tool. Use it for what it's for. That's the whole pitch.
Next: "Directive to deliverable: watch one sentence turn into a finished investor memo", a full end-to-end walkthrough of a single directive moving through eight specialists in real time.