
Optimizing the cost of Claude agents

By Eric Bowen · April 14, 2026 · 2 min read

When we started routing every agent task to Opus, our API bill got scary fast.

The fix was smaller than I expected: stop treating "run Claude" as one thing.

We have a multi-agent pipeline: PM triages issues, Architect designs, Web-Dev writes code, QA reviews, and Marketing drafts release notes. Every one of them called the same model, Opus. That was the mistake.

Now:

PM, QA, Marketing, Architect → Sonnet
Web-Dev (writing code) → Opus
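
In code, that split can be as small as a lookup table. Here's a minimal sketch, not our actual implementation: the role names come from the pipeline above, while `MODEL_BY_ROLE`, `DEFAULT_TIER`, and `model_for` are hypothetical names, and "sonnet" / "opus" are tier labels rather than real model identifiers. Falling back to the cheaper tier for unknown roles is an assumption too.

```python
# Hypothetical role-to-tier routing table. The roles are the ones from the
# pipeline above; the tier labels are placeholders, not real model IDs.
MODEL_BY_ROLE = {
    "pm": "sonnet",
    "architect": "sonnet",
    "web-dev": "opus",
    "qa": "sonnet",
    "marketing": "sonnet",
}

# Assumption: roles missing from the table default to the cheaper tier.
DEFAULT_TIER = "sonnet"


def model_for(role: str) -> str:
    """Return the tier configured for a role, or the default if unlisted."""
    return MODEL_BY_ROLE.get(role.lower(), DEFAULT_TIER)
```

Each agent would call something like `model_for("qa")` right before it makes its request, so changing a role's model is a one-line edit to the table.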

Result: the bulk of our calls run on a cheaper model, quality on non-coding work hasn't dropped, and Opus is still there for the task where reasoning density actually matters.

The lesson isn't "Sonnet is cheap, use Sonnet." The lesson is: different tasks have different reasoning demands, and you should route accordingly. "Review this diff for obvious regressions" doesn't need the same horsepower as "write this new module from scratch."

If you're running an agent pipeline and sending every step to the same model, try splitting it. Start with the task that needs the least reasoning (triage, routing, summaries) and move it down a tier. See what breaks. Usually nothing does.
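
One way to run that experiment is to change a single role at a time and keep the old routing around so you can roll back. A rough sketch, assuming routing is a plain dict of role to tier; `TIERS` and `move_down` are hypothetical names, and the tier list only covers the two tiers mentioned in this post.

```python
# Tier labels ordered cheapest first. Placeholders, not real model IDs.
TIERS = ["sonnet", "opus"]


def move_down(routing: dict[str, str], role: str) -> dict[str, str]:
    """Return a copy of `routing` with `role` moved one tier cheaper.

    The input dict is left untouched, so the previous routing stays
    available if the cheaper tier turns out not to be good enough.
    """
    updated = dict(routing)
    idx = TIERS.index(routing[role])
    if idx > 0:  # already on the cheapest tier: nothing to change
        updated[role] = TIERS[idx - 1]
    return updated
```

Run the pipeline with the new routing for a while, compare the output for that one role, and only then move on to the next candidate.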

Model selection is a config decision, not an architectural one. Treat it that way.
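
To make the "config, not architecture" point concrete, the routing can live entirely outside the code. This is a sketch under assumptions: the JSON shape (`default` plus `overrides`) and the names `CONFIG_TEXT`, `load_routing`, and `resolve` are all made up for illustration, and the tier labels are placeholders.

```python
import json

# Hypothetical config payload. In practice this could be a file or an
# environment variable; it is inlined here to keep the sketch self-contained.
CONFIG_TEXT = '{"default": "sonnet", "overrides": {"web-dev": "opus"}}'


def load_routing(text: str) -> dict:
    """Parse the routing config into a default tier plus per-role overrides."""
    cfg = json.loads(text)
    return {
        "default": cfg["default"],
        "overrides": dict(cfg.get("overrides", {})),
    }


def resolve(cfg: dict, role: str) -> str:
    """Pick the tier for a role: its override if present, else the default."""
    return cfg["overrides"].get(role, cfg["default"])
```

With this shape, moving a role between tiers means editing one JSON value, and no agent code changes.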
