Build a Multi-Model Agent with Claude Code
What you'll build
An agent that routes tasks to different AI models automatically — Opus for deep reasoning, Haiku for quick questions, Gemini for web research.
The Problem
You're running Claude Opus on every message. "What's 15% tip on $80?" — Opus. "Remind me to buy milk" — Opus. "Analyze this 40-page contract" — also Opus. One of these needs the most powerful AI on the planet. The other two need a calculator and a notepad. You're burning premium tokens on grocery lists.
What You're Building
An agent with four models from two providers, automatic routing based on conversation context, and a dashboard to track cost savings. Same agent, four brains, roughly 70% less spend.
Milestone 1: Set Up Your Model Roster
Right now your agent has one brain. Let's give it three.
Update my OpenClaw config to set up three models: Claude Opus as the heavy-duty model for complex tasks, Claude Haiku as the fast model for simple questions, and keep Sonnet as the default. Set Haiku as the fallback if the primary model is unavailable. Add clean aliases so I can reference them easily.
What Claude Code does: It restructures your model config with a primary model and fallbacks. OpenClaw supports model fallback chains — if your primary model is down or rate-limited, it automatically tries the next one. Aliases like opus, sonnet, and haiku give you shorthand names you can reference anywhere in the config.
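The result looks roughly like the sketch below. This is an illustrative JSON5-style fragment, not the exact OpenClaw schema — the field names (aliases, primary, fallbacks) and model identifiers are assumptions, and Claude Code will generate the real structure for your installed version.

```jsonc
{
  "models": {
    // Shorthand names you can use anywhere else in the config
    // (field names and model ids are illustrative assumptions)
    "aliases": {
      "opus": "anthropic/claude-opus-4-6",
      "sonnet": "anthropic/claude-sonnet-4-6",
      "haiku": "anthropic/claude-haiku-4-5"
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "sonnet",      // default brain for everyday messages
        "fallbacks": ["haiku"]    // tried automatically if Sonnet is down or rate-limited
      }
    }
  }
}
```

The key design point is the fallback chain: routing decides which model you *want*, while fallbacks decide what happens when that model is unavailable.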
Try it: Run openclaw gateway status — you should see all three models listed.
Milestone 2: Add Gemini for Research
Claude is great at reasoning but can't browse the web natively. Gemini can. Let's add it as your research model.
Add Google Gemini as a custom model provider in my OpenClaw config. Use my GEMINI_API_KEY environment variable. Set it up as a Google Generative AI provider with Gemini 2.0 Flash. I want to be able to route research questions to it.
What Claude Code does: It adds a new provider block to your config using OpenClaw's custom provider system. OpenClaw isn't locked to one AI company — you can mix Anthropic, OpenAI, Google, DeepSeek, and even local models via Ollama. Each provider gets its own API key and model list. Now your agent has four models from two providers.
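A provider block might be sketched like this — again a hedged illustration, not the literal OpenClaw schema. The apiKey environment-variable syntax and field names are assumptions; only GEMINI_API_KEY and the gemini-2.0-flash model id come from the prompt above.

```jsonc
{
  "models": {
    "providers": {
      "google": {
        // Key is read from your environment, never hardcoded in the config
        "apiKey": "${GEMINI_API_KEY}",
        // Google Generative AI provider with one model enabled
        "models": ["gemini-2.0-flash"]
      }
    }
  }
}
```

Each provider block is self-contained, which is what lets you mix Anthropic, Google, OpenAI, DeepSeek, or local models without the providers knowing about each other.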
Try it: Check your config — you should see both anthropic and google providers listed.
Milestone 3: Create Smart Routing
You have the models. Now tell the agent when to use each one.
Set up agent bindings in my OpenClaw config so different conversations use different models. Create three agent entries: "daily" using Sonnet (my default), "researcher" using Gemini, and "analyst" using Opus. Then add bindings that route my main Telegram DM to "daily", and set up two Telegram groups I can create for testing — one for research, one for deep analysis. Walk me through creating those Telegram groups if needed.
What Claude Code does: It creates multiple agents and bindings in your config — rules that route conversations to specific agents based on channel, group, or sender. The binding system uses specificity: a direct peer match beats a group match, which beats a channel-level default. This means your main chat gets Sonnet, but research questions in the research group automatically hit Gemini.
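Conceptually, the agents and bindings might look like the sketch below. The field names and the placeholder ids in angle brackets are assumptions for illustration — your generated config will use your actual Telegram ids.

```jsonc
{
  "agents": {
    "list": [
      { "id": "daily",      "model": "sonnet" },
      { "id": "researcher", "model": "google/gemini-2.0-flash" },
      { "id": "analyst",    "model": "opus" }
    ]
  },
  "bindings": [
    // Most specific match wins: direct peer > group > channel-level default
    { "agent": "daily",      "channel": "telegram", "peer": "<your-dm-id>" },
    { "agent": "researcher", "channel": "telegram", "group": "<research-group-id>" },
    { "agent": "analyst",    "channel": "telegram", "group": "<deep-work-group-id>" }
  ]
}
```

Because specificity decides the winner, you can keep a safe channel-wide default and carve out exceptions per group without the rules conflicting.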
Try it: Check your config — the agents.list should have three entries, and the bindings array should map each to a different context.
Milestone 4: Test the Routing
Everything is configured. Let's verify each model gets the right traffic.
Restart the OpenClaw gateway to pick up the new config, then show me how to verify which model is handling each conversation. I want to test by sending messages in different contexts and checking the logs to confirm the correct model is being used.
What Claude Code does: It restarts the gateway and shows you how to read the routing logs. Every message in the log shows which model processed it — you'll see anthropic/claude-sonnet-4-6 for your main chat, google/gemini-2.0-flash for research, and anthropic/claude-opus-4-6 for deep work. The routing happens silently — you just message, and the right brain answers.
Try it: Send a simple question in your main chat, a web search question in the research group, and a complex analysis question in the deep-work group. Check the logs — each should show a different model.
Milestone 5: Check Your Cost Savings
The real payoff: how much are you saving?
Show me the OpenClaw dashboard at localhost:18789 and walk me through the token usage stats. I want to see how many tokens each model consumed and estimate my cost savings compared to running everything on Opus.
What Claude Code does: It opens the local dashboard where OpenClaw tracks usage. The dashboard shows token consumption per model, per agent, per time period. You can calculate savings by comparing: if all tokens had gone to Opus ($15/M input) vs. the actual split across Haiku ($0.25/M), Sonnet ($3/M), and Gemini ($0.10/M for Flash).
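The savings math is simple enough to sketch. The Python below uses the per-million-token input prices quoted above with a hypothetical monthly token split (the usage numbers are invented for illustration — plug in your own from the dashboard):

```python
# $/M input tokens, as quoted above; usage is a hypothetical monthly split (in millions of tokens)
PRICES = {"opus": 15.00, "sonnet": 3.00, "haiku": 0.25, "gemini-flash": 0.10}
usage = {"opus": 4, "sonnet": 8, "haiku": 5, "gemini-flash": 3}  # hypothetical

# Actual cost with routing vs. the counterfactual of sending every token to Opus
actual = sum(usage[m] * PRICES[m] for m in usage)
all_opus = sum(usage.values()) * PRICES["opus"]
savings = 1 - actual / all_opus

print(f"Routed: ${actual:.2f}  All-Opus: ${all_opus:.2f}  Savings: {savings:.0%}")
# → Routed: $85.55  All-Opus: $300.00  Savings: 71%
```

Note that savings depend almost entirely on what fraction of your traffic genuinely needs Opus — the cheap models are so much cheaper that routing even half your messages away from Opus cuts the bill dramatically.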
Try it: Open http://localhost:18789 in your browser. Navigate to usage stats. After a few days of use, you'll see the real savings — typically 60-80% for mixed workloads.
What You Built
Remember paying Opus prices for grocery list reminders? Your agent now has four models from two providers, automatic routing based on conversation context, and a dashboard to track it all. You're running the same setup that companies use for production AI — one orchestration layer, multiple specialized models.
Take It Further
- Add a local model — set up Ollama with Llama 3 for fully offline, zero-cost simple tasks
- Per-message routing — configure model switching based on message content, not just channel
- Set up cost alerts — use cron jobs to notify you when daily spend exceeds a threshold
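For the local-model idea, the provider pattern from Milestone 2 extends naturally. This is a hedged sketch assuming Ollama's standard local endpoint on port 11434 — the field names mirror the illustrative schema used earlier and may differ from your generated config:

```jsonc
{
  "models": {
    "providers": {
      "ollama": {
        // Local Ollama server: no API key, zero per-token cost
        "baseUrl": "http://localhost:11434/v1",
        "models": ["llama3"]
      }
    }
  }
}
```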
Ready to build your first AI agent?
Live Zoom workshop + 1 month WhatsApp follow-up with Yuval Keshtcher (Hebrew)
Learn about the Workshop