Subscription burning out after 3 conversations? Learn how to cut Claude Code token usage by 70% with model switching, context management, and smarter prompts.
Three days into your Claude Pro subscription, and you've already hit the limit.
You paid $20, opened Claude Code, ran three conversations, and got the dreaded message: "You've reached your usage limit." You're not alone. Anthropic's own GitHub has hundreds of reports from developers saying their subscription burns 4x faster than expected.
But if you're a non-technical builder who just started using Claude Code to ship real projects, this hits differently. You're not debugging a compiler. You're trying to build something useful, and the tool ran out before you finished your first feature.
The problem isn't your plan. The problem is you're sending the most expensive model to do an intern's job. This guide shows you 7 concrete tricks to cut your token usage by 70%, without downgrading your subscription or losing capability. If you're new to Claude Code, start with the free tutorials on this site first.
Key Takeaways
- Model selection is the #1 lever: At the prices listed below, Opus costs about 1.7x as much per token as Sonnet and 5x as much as Haiku. Most tasks don't need Opus.
- Context accumulates silently: Every message resends your entire conversation history. A 30-minute session can consume your daily quota.
- Configuration files eat credits: Your CLAUDE.md, identity files, and soul files are loaded on every single interaction. 500 lines of instructions = thousands of tokens burned before you even ask a question.
- Batch beats back-and-forth: One detailed prompt uses 75% fewer tokens than five small follow-ups.
- Skills save tokens long-term: Custom slash commands replace repetitive explanations with a single trigger.
| Old Way | New Way | Token Savings |
|---|---|---|
| Always use Opus | Switch models per task | ~40-80% |
| Let context grow forever | Run /compact every 15 min | ~60% |
| Five small follow-up prompts | One batched prompt | ~75% |
| 200-line CLAUDE.md | Lean 60-line CLAUDE.md | ~70% per interaction |
The pattern is clear: most token waste comes from 3-4 default behaviors that are easy to fix.
Is Claude Code "Expensive"? Defining the Real Cost
Claude Code isn't expensive. Unmanaged Claude Code is expensive.
The tool runs on three AI models, each with different pricing per million tokens:
| Model | Input Cost | Output Cost | Best For |
|---|---|---|---|
| Haiku 4.5 | $1/M tokens | $5/M tokens | Simple edits, quick questions |
| Sonnet 4.6 | $3/M tokens | $15/M tokens | Writing, building, refactoring |
| Opus 4.6 | $5/M tokens | $25/M tokens | Complex analysis, architecture |
The average developer spends about $6 per day on Claude Code, and 90% of users stay under $12 per day. What sets the top 10% apart? They run Opus on everything, load massive context files, and send one-line follow-ups instead of batched prompts.
Here's the pipeline that matters: 1. You type a prompt → 2. Claude loads your CLAUDE.md + conversation history → 3. The selected model processes everything → 4. You get a response → 5. The entire history grows for the next message.
This is a pipeline, not a magic wand. Every step has a cost multiplier.
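To make step 5 concrete, here is a minimal Python sketch of the pipeline's arithmetic. The function name and the example sizes (`config_tokens`, `tokens_per_turn`) are illustrative assumptions, not measurements, and the model ignores caching and tool calls.

```python
def cumulative_input_tokens(turns: int, config_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a session when every message
    resends the config file plus the full history so far."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn        # this turn's text joins the history
        total += config_tokens + history  # config + whole history is sent again
    return total


# Hypothetical session: 20 turns, 2,000-token config, 1,500 tokens added per turn.
resent = cumulative_input_tokens(20, 2_000, 1_500)
flat = 20 * (2_000 + 1_500)  # what it would cost if history were not resent
print(resent, flat)
```

Under these assumptions the resent-history total grows roughly with the square of the number of turns, which is why long sessions get expensive so quickly.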
Opus vs. Sonnet vs. Haiku: Which Model Do You Actually Need?
The confusion is real. Claude Code defaults to its most capable (and most expensive) model. Most users never change it.
Opus 4.6 is the deep thinker. It's best for complex architectural decisions, multi-file refactoring, and problems that require reasoning across your entire codebase. At the prices above, it costs about 1.7x as much as Sonnet per token ($5 vs. $3 per million input tokens) and 5x as much as Haiku.
Sonnet 4.6 is the workhorse. It handles 80% of coding tasks — writing functions, editing files, generating components, explaining code — at a fraction of Opus's cost.
Haiku 4.5 is the speedster. It matches Sonnet's performance on straightforward tasks (renaming variables, adding lines, fixing syntax) at one-third the cost.
| Task | Recommended Model | Why |
|---|---|---|
| Rename a variable | Haiku | Simple find-and-replace logic |
| Write a new component | Sonnet | Needs understanding, not genius |
| Refactor 15 files | Opus | Cross-file reasoning required |
| Add a console.log | Haiku | Trivial insertion |
| Debug a complex race condition | Opus | Deep analysis needed |
| Generate boilerplate | Sonnet | Pattern matching, not invention |
The Verdict: Start every session on Sonnet. Switch to Opus only when Sonnet gets stuck. Use Haiku for anything you could describe in one sentence.
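As a rough sketch of what the model choice means in dollars, the snippet below prices the same request on each model using the per-million-token rates from the pricing table earlier in this guide. The request size (50,000 input tokens, 2,000 output tokens) and the helper name `request_cost` are assumptions chosen for illustration.

```python
# USD per million tokens, copied from the pricing table in this guide.
PRICES = {
    "haiku": {"input": 1.0, "output": 5.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "opus": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical request: 50,000 input tokens, 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 50_000, 2_000):.2f}")
```

The same request is cheapest on Haiku and most expensive on Opus, so dropping down a tier for simple tasks is where the savings come from.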
Top 7 Token-Saving Tricks That Actually Work
The first mistake people make is trying to "use less" of Claude Code. That's the wrong frame. The goal is to use it smarter — same output, fewer tokens.
1. Switch Models Mid-Session
You don't need to close and reopen Claude Code. Type this:
/model haiku
Done. Everything from now on goes to Haiku. Need Opus back?
/model opus
There's also a keyboard shortcut: Option + P opens a model picker without typing anything. If you're coming from the beginner's guide to Claude Code, this is the single most impactful habit to build.
2. Compress Your Context with /compact
Every message you send carries the entire conversation history. The context window is 200,000 tokens. One debugging session can burn 50,000 of them.
/compact
This compresses your conversation into a summary. It keeps what matters, drops the rest. The savings? Up to 60% fewer tokens on every subsequent message.
/clear
This resets everything. Clean slate. A fresh session with a good prompt almost always outperforms a long session with 50 corrections.
Auto-compaction triggers at 95% of the context window. Don't wait. Run /compact every 15 minutes during long sessions.
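Here is a small simulation of why compacting early helps, under stated assumptions: each turn adds a fixed number of tokens to the history, and a compaction keeps 40% of it (matching the "up to 60%" figure above). The function and its parameters are hypothetical, and real summaries will vary in size.

```python
from typing import Optional


def session_input_tokens(turns: int, tokens_per_turn: int,
                         compact_every: Optional[int] = None,
                         keep_percent: int = 40) -> int:
    """Input tokens sent across a session where each message resends the history.

    If compact_every is set, the history shrinks to keep_percent of its
    size after every compact_every-th turn."""
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn
        total += history
        if compact_every and turn % compact_every == 0:
            history = history * keep_percent // 100  # summary replaces the history
    return total


without = session_input_tokens(20, 2_000)
with_compact = session_input_tokens(20, 2_000, compact_every=5)
print(without, with_compact)
```

In this toy model, compacting every five turns cuts total input tokens by more than half over a 20-turn session.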
3. Create a .claudeignore File
Claude Code reads and indexes files you never asked it to touch. node_modules, build artifacts, lock files. Each one burns tokens on reading.
Create a .claudeignore file in your project root:
node_modules/
dist/
build/
*.lock
.next/
coverage/
This works exactly like .gitignore. On medium-sized codebases, 80-90% of the token budget goes to exploration rather than actual coding. A well-configured .claudeignore can cut that waste dramatically.
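To check what patterns like these would exclude, here is a simplified Python sketch of gitignore-style matching. It is an illustration only: it handles just the two pattern shapes used above (directory names ending in `/` and file-name globs) and is not the matcher Claude Code itself uses. The function name `is_ignored` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# The patterns from the example file above.
IGNORE_PATTERNS = ["node_modules/", "dist/", "build/", "*.lock", ".next/", "coverage/"]


def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if a relative path would be skipped by the patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: matches when any parent directory has that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: compared against the file name only.
            return True
    return False


for candidate in ["node_modules/react/index.js", "src/app.ts", "yarn.lock"]:
    print(candidate, is_ignored(candidate))
```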
4. Keep Your CLAUDE.md Lean
Your CLAUDE.md file is loaded on every single turn. Every word in it costs tokens — not once, but on every interaction.
If your CLAUDE.md is 200 lines, that's on the order of a few thousand tokens burned before Claude Code even starts thinking about your request. Keep it under 60 lines. Move detailed documentation, history, and logs elsewhere. Check out the guide to the .claude folder for the right structure.
5. Watch Out for Large Identity and Soul Files
This is the one most people miss.
If you use identity files, soul files, or large context documents that load automatically, all of that context is reloaded on every single message. Every interaction. Every tool call.
Do the math: 500 lines of instructions = roughly 8,000 tokens per interaction. At 50 messages per day, that's 400,000 tokens spent just on loading instructions. That's nearly 10x the roughly 44,000 tokens a Pro plan gets per 5-hour window, gone before you write a single line of code.
The rule: If the agent doesn't need information in 80% of conversations, it doesn't belong in the main context file. Move it to memory files or separate documents that load on demand.
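The arithmetic above can be reproduced with a few lines of Python. The 16 tokens-per-line figure is an assumption derived from this article's own estimate (500 lines ≈ 8,000 tokens); your files may tokenize differently.

```python
TOKENS_PER_LINE = 16  # assumption: implied by 500 lines ≈ 8,000 tokens


def per_message_overhead(lines: int, tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Tokens spent reloading an instruction file on one message."""
    return lines * tokens_per_line


def daily_overhead(lines: int, messages_per_day: int) -> int:
    """Tokens spent on instruction files alone over a day of messages."""
    return per_message_overhead(lines) * messages_per_day


print(daily_overhead(500, 50))  # the 500-line example from the text
print(daily_overhead(60, 50))   # a lean 60-line file at the same message count
```

Trimming the file changes the result linearly, so every line you move out of the always-loaded context is saved on every message.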
6. Batch Your Prompts
One detailed request burns fewer tokens than five small follow-ups.
Instead of:
- "Fix the bug"
- "Oh wait, also add a log"
- "And rename that function"
- "Actually, update the tests too"
Write:
- "Fix the bug in function X, add console.log before the API call, rename the function to handleUserInput, and update the tests accordingly"
One prompt. Four actions. Claude reloads the full context on every round trip, so one round trip instead of four cuts that repeated overhead by about 75%.
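A back-of-the-envelope sketch of that saving, in Python. The numbers are assumptions for illustration (a 10,000-token context and four 20-token prompts), and the model ignores Claude's replies, which would make the sequential case even more expensive.

```python
def input_tokens_sequential(context_tokens: int, prompt_tokens: list) -> int:
    """Each prompt is its own round trip; the context plus all earlier
    prompts is resent every time."""
    total = 0
    history = context_tokens
    for prompt in prompt_tokens:
        history += prompt
        total += history
    return total


def input_tokens_batched(context_tokens: int, prompt_tokens: list) -> int:
    """All prompts are combined into one message, so context is sent once."""
    return context_tokens + sum(prompt_tokens)


prompts = [20, 20, 20, 20]  # four short follow-ups, hypothetical sizes
sequential = input_tokens_sequential(10_000, prompts)
batched = input_tokens_batched(10_000, prompts)
savings = 1 - batched / sequential
print(sequential, batched, f"{savings:.0%}")
```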
7. Build Custom Skills Instead of Repeating Yourself
You can create custom tools inside Claude Code:
- Skills are markdown files with instructions, triggered by a slash command like /my-skill. Instead of explaining what you want from scratch every time, you write the instructions once and trigger them with a single command.
- Pipelines chain multiple steps: research → write → edit → publish. Define once, run with one command.
- Agents are parallel sessions running on separate worktrees. Each works on its own task without burning the other's context. One writes code, one writes tests, one writes docs. Three sessions, each carrying only its own task's context.
Learn how to build your first one with the custom slash command tutorial.
How to Build Safe Token Habits (Preventing "Bill Shock")
The golden rule: track before you optimize.
Run /cost inside Claude Code to see your current session's token consumption in real-time. The /statusline command adds a persistent status bar at the bottom of your terminal showing context window utilization percentage.
A safe daily protocol:
- Start session → check /cost baseline
- Set model to Sonnet (not Opus)
- Run /compact every 15 minutes
- Before complex tasks → switch to Opus
- After complex tasks → switch back to Sonnet
- End of session → check /cost total
This same pattern is used in the tutorials on this site — every project starts with the right model for the job.
Beyond Token Saving: When to Use Different Models for Different Agents
There's a limit to how much you can save within a single session. The real unlock is architectural: run multiple agents in parallel, each on the cheapest model that can handle its task.
| Agent Role | Model | Why |
|---|---|---|
| Code writer | Sonnet | Needs creativity and understanding |
| Test writer | Haiku | Tests follow predictable patterns |
| Documentation | Haiku | Mostly templated output |
| Architecture review | Opus | Needs deep cross-file reasoning |
| File renaming / cleanup | Haiku | Trivial operations |
Claude Code supports this natively with worktrees. Each agent gets its own context window, its own model, and its own token budget. No cross-contamination. Learn how in the parallel workflows tutorial.
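As a rough illustration of the role-to-model split in the table above, the Python sketch below compares the input cost of that mix against running every agent on Opus. The input prices come from this guide's pricing table; the workload (100,000 input tokens per agent) and the function names are hypothetical.

```python
# Role-to-model assignments from the table above.
AGENT_MODELS = {
    "code writer": "sonnet",
    "test writer": "haiku",
    "documentation": "haiku",
    "architecture review": "opus",
    "cleanup": "haiku",
}

# USD per million input tokens, from this guide's pricing table.
INPUT_PRICE = {"haiku": 1.0, "sonnet": 3.0, "opus": 5.0}


def mixed_cost(workload: dict) -> float:
    """Input cost when each role runs on its assigned model."""
    return sum(tokens * INPUT_PRICE[AGENT_MODELS[role]]
               for role, tokens in workload.items()) / 1_000_000


def all_opus_cost(workload: dict) -> float:
    """Input cost when every role runs on Opus."""
    return sum(workload.values()) * INPUT_PRICE["opus"] / 1_000_000


# Hypothetical workload: 100,000 input tokens per agent.
workload = {role: 100_000 for role in AGENT_MODELS}
print(f"mixed: ${mixed_cost(workload):.2f}, all Opus: ${all_opus_cost(workload):.2f}")
```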
Conclusion: Is Claude Code Worth the Token Cost?
For the pragmatic builder, the answer is a clear yes — if you manage it.
Unmanaged, Claude Code can burn through a Pro subscription in hours. Managed, the same subscription lasts a full workday. The difference is 7 habits: model switching, context compression, file filtering, lean configuration, batched prompts, custom skills, and cost tracking.
The 3-rule version: Start on Sonnet. Batch your asks. Run /compact early and often.
Ready to build something? Pick your first project from the tutorials here and put these token-saving habits to work from day one.
FAQs
Why does Claude Code run out so fast?
Claude Code defaults to Opus, the most expensive model. Each message resends the full conversation history plus your CLAUDE.md and any loaded context files. Without context management (/compact or /clear), a 30-minute session can exhaust your entire daily quota. Switching to Sonnet and running /compact regularly can extend your session by 3-5x.
What's the difference between /compact and /clear?
/compact compresses your conversation into a summary that keeps the key context. /clear wipes everything and starts fresh. Use /compact when you're mid-task and want to keep working. Use /clear when you're switching to a completely different task. Both save tokens, but /compact preserves continuity.
Is Haiku good enough for writing code?
For simple changes — renaming variables, adding lines, fixing syntax errors — absolutely. Haiku matches Sonnet's performance on straightforward tasks at one-third the cost. For anything requiring multi-file reasoning or creative problem-solving, use Sonnet or Opus.
How many tokens does a Claude Pro subscription include?
Claude Pro provides approximately 44,000 tokens per 5-hour rolling window. Max5 gets 88,000, and Max20 gets 220,000. With the optimization tricks in this guide, 44,000 tokens is enough for a full workday of productive coding.
Do large CLAUDE.md and identity files burn tokens?
Yes, and this surprises most people. Every file that loads automatically (CLAUDE.md, memory files, soul files) is resent with every single message. A 500-line instruction file can cost 5,000-10,000 tokens per interaction. Keep your main config files lean and move detailed content to separate files that load on demand.