Claude Code Token Guide: How to Save 70% (Without Downgrading)

Subscription burning out after 3 conversations? Learn how to cut Claude Code token usage by 70% with model switching, context management, and smarter prompts.

Three days into your Claude Pro subscription, and you've already hit the limit.

You paid $20, opened Claude Code, ran three conversations, and got the dreaded message: "You've reached your usage limit." You're not alone. Anthropic's own GitHub has hundreds of reports from developers saying their subscription burns 4x faster than expected.

But if you're a non-technical builder who just started using Claude Code to ship real projects, this hits differently. You're not debugging a compiler. You're trying to build something useful, and the tool ran out before you finished your first feature.

The problem isn't your plan. The problem is you're sending the most expensive model to do an intern's job. This guide shows you 7 concrete tricks to cut your token usage by 70%, without downgrading your subscription or losing capability. If you're new to Claude Code, start with the free tutorials on this site first.

Key Takeaways

Model selection is the #1 lever: At the list prices below, Opus costs about 1.7x as much per token as Sonnet and 5x as much as Haiku. Most tasks don't need Opus.
Context accumulates silently: Every message resends your entire conversation history. A 30-minute session can consume your daily quota.
Configuration files eat credits: Your CLAUDE.md, identity files, and soul files are loaded on every single interaction. 500 lines of instructions = thousands of tokens burned before you even ask a question.
Batch beats back-and-forth: One detailed prompt can use about 75% fewer tokens than a string of small follow-ups.
Skills save tokens long-term: Custom slash commands replace repetitive explanations with a single trigger.

Old Way	New Way	Token Savings
Always use Opus	Switch models per task	~60-80%
Let context grow forever	Run `/compact` every 15 min	~60%
Five small follow-up prompts	One batched prompt	~75%
200-line CLAUDE.md	Lean 60-line CLAUDE.md	~70% per interaction

The pattern is clear: most token waste comes from 3-4 default behaviors that are easy to fix.

Is Claude Code "Expensive"? Defining the Real Cost

Claude Code isn't expensive. Unmanaged Claude Code is expensive.

The tool runs on three AI models, each with different pricing per million tokens:

Model	Input Cost	Output Cost	Best For
Haiku 4.5	$1/M tokens	$5/M tokens	Simple edits, quick questions
Sonnet 4.6	$3/M tokens	$15/M tokens	Writing, building, refactoring
Opus 4.6	$5/M tokens	$25/M tokens	Complex analysis, architecture
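
To make the table concrete, here is a minimal sketch that prices a single request on each model. The prices are copied from the table above; the request size (20,000 tokens in, 1,000 out) is a made-up example, and subscription plans meter usage differently from per-token billing, so treat the dollar figures as a relative comparison only.

```python
# USD per million tokens, copied from the pricing table above.
PRICES = {
    "haiku": {"input": 1.0, "output": 5.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "opus": {"input": 5.0, "output": 25.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the table's list prices."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 20,000 tokens of context in, 1,000 tokens out.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 20_000, 1_000):.4f}")
```

The same request costs five times as much on Opus as on Haiku, which is why the model choice matters more than any other setting.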

The average developer spends about $6 per day on Claude Code, and 90% of users stay under $12 per day. What separates the rest? The top 10% are running Opus on everything, loading massive context files, and sending one-line follow-ups instead of batched prompts.

Here's the pipeline that matters: 1. You type a prompt → 2. Claude loads your CLAUDE.md + conversation history → 3. The selected model processes everything → 4. You get a response → 5. The entire history grows for the next message.

This is a pipeline, not a magic wand. Every step has a cost multiplier.
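
The effect of step 5 is easy to underestimate, so here is a rough sketch of it. It assumes every turn resends a fixed base context (your CLAUDE.md and similar files) plus the whole history, and that each turn adds the same number of tokens to that history. Both numbers are illustrative, and the model ignores prompt caching, which can lower the real cost.

```python
def cumulative_input_tokens(turns: int, base_context: int, tokens_per_turn: int) -> int:
    """Total input tokens sent over a session when each turn resends
    the base context plus everything said so far."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn       # the new exchange joins the history
        total += base_context + history  # this turn sends all of it again
    return total


if __name__ == "__main__":
    # Hypothetical session: 2,000-token base context, 1,500 tokens per turn.
    for turns in (10, 20):
        print(turns, "turns:", cumulative_input_tokens(turns, 2_000, 1_500))
```

With these example numbers, 10 turns send 102,500 input tokens and 20 turns send 355,000. Doubling the length of the session more than triples the total, because the history term grows with the square of the turn count.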

Opus vs. Sonnet vs. Haiku: Which Model Do You Actually Need?

The confusion is real. Claude Code defaults to its most capable (and most expensive) model. Most users never change it.

Opus 4.6 is the deep thinker. It's best for complex architectural decisions, multi-file refactoring, and problems that require reasoning across your entire codebase. At the list prices above, it costs about 1.7x as much as Sonnet and 5x as much as Haiku per token.

Sonnet 4.6 is the workhorse. It handles 80% of coding tasks — writing functions, editing files, generating components, explaining code — at a fraction of Opus's cost.

Haiku 4.5 is the speedster. It matches Sonnet's performance on straightforward tasks (renaming variables, adding lines, fixing syntax) at one-third the cost.

Task	Recommended Model	Why
Rename a variable	Haiku	Simple find-and-replace logic
Write a new component	Sonnet	Needs understanding, not genius
Refactor 15 files	Opus	Cross-file reasoning required
Add a console.log	Haiku	Trivial insertion
Debug a complex race condition	Opus	Deep analysis needed
Generate boilerplate	Sonnet	Pattern matching, not invention

The Verdict: Start every session on Sonnet. Switch to Opus only when Sonnet gets stuck. Use Haiku for anything you could describe in one sentence.

Top 7 Token-Saving Tricks That Actually Work

The first mistake people make is trying to "use less" of Claude Code. That's the wrong frame. The goal is to use it smarter — same output, fewer tokens.

1. Switch Models Mid-Session

You don't need to close and reopen Claude Code. Type this:

/model haiku

Done. Everything from now on goes to Haiku. Need Opus back?

/model opus

There's also a keyboard shortcut: Option + P on macOS (Alt + P on Windows and Linux) opens a model picker without typing anything. If you're coming from the beginner's guide to Claude Code, this is the single most impactful habit to build.

2. Compress Your Context with /compact

Every message you send carries the entire conversation history. The context window is 200,000 tokens. One debugging session can burn 50,000 of them.

/compact

This compresses your conversation into a summary. It keeps what matters, drops the rest. The savings? Up to 60% fewer tokens on every subsequent message.

/clear

This resets everything. Clean slate. A fresh session with a good prompt almost always outperforms a long session with 50 corrections.

Auto-compaction triggers at 95% of the context window. Don't wait. Run /compact every 15 minutes during long sessions.
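
To see why compacting early pays off, here is a toy simulation. It assumes each turn adds a fixed number of tokens to the history and that a compaction keeps 40% of it (matching the "up to 60%" figure above). The function name, the parameters, and the ratio are all illustrative assumptions, not measurements of how /compact actually behaves.

```python
def session_input_tokens(turns, tokens_per_turn, compact_every=None, summary_ratio=0.4):
    """Sum the history resent over a session, optionally compacting.

    compact_every: compact after every N turns (None means never).
    summary_ratio: assumed fraction of the history a compaction keeps.
    """
    total = 0.0
    history = 0.0
    for turn in range(1, turns + 1):
        history += tokens_per_turn
        total += history  # the whole history is resent on this turn
        if compact_every and turn % compact_every == 0:
            history *= summary_ratio  # replace the history with a shorter summary
    return round(total)


if __name__ == "__main__":
    print("no compaction:", session_input_tokens(30, 1_500))
    print("every 10 turns:", session_input_tokens(30, 1_500, compact_every=10))
```

The earlier and more regularly you compact, the smaller the history that every later message has to carry.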

3. Create a .claudeignore File

While exploring your project, Claude Code can read files you never asked it to touch: node_modules, build artifacts, lock files. Every one of those reads burns tokens.

Create a .claudeignore file in your project root:

node_modules/
dist/
build/
*.lock
.next/
coverage/

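The pattern syntax mirrors .gitignore. On medium-sized codebases, a large share of the token budget (80-90% by some estimates) can go to exploration rather than actual coding, and a well-configured ignore list can cut that waste considerably. Support for .claudeignore may vary by Claude Code version, so confirm that the excluded paths actually stop being read; Read deny rules under permissions in .claude/settings.json are the documented way to block paths.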
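
If you want a feel for how much of your project those patterns cover, a small script can measure it. The sketch below uses a deliberately simplified matcher (directory patterns match any folder in the path, glob patterns match the file name), so it is an approximation and not full .gitignore semantics. The function names are hypothetical.

```python
import fnmatch
from pathlib import Path

# The patterns from the example ignore file above.
IGNORE_PATTERNS = ["node_modules/", "dist/", "build/", "*.lock", ".next/", "coverage/"]


def is_ignored(relative_path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Simplified matcher: 'name/' matches any directory segment,
    anything else is treated as a glob on the file name."""
    parts = relative_path.split("/")
    for pattern in patterns:
        if pattern.endswith("/"):
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch.fnmatch(parts[-1], pattern):
            return True
    return False


def ignored_bytes(root: str, patterns=IGNORE_PATTERNS) -> tuple[int, int]:
    """Return (ignored_bytes, total_bytes) for all files under root."""
    ignored = total = 0
    root_path = Path(root)
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        total += size
        if is_ignored(path.relative_to(root_path).as_posix(), patterns):
            ignored += size
    return ignored, total
```

Calling ignored_bytes(".") from your project root gives the two byte counts. If the first is most of the second, the ignore list is doing real work.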

4. Keep Your CLAUDE.md Lean

Your CLAUDE.md file is loaded on every single turn. Every word in it costs tokens — not once, but on every interaction.

If your CLAUDE.md is 200 lines, that's thousands of tokens burned before Claude Code even starts thinking about your request. Keep it under 60 lines. Move detailed documentation, history, and logs elsewhere. Check out the guide to the .claude folder for the right structure.
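
A quick way to check where you stand is to measure the file. This sketch uses the common rule of thumb of about 4 characters per token, which is an assumption (real tokenizers vary), and the 60-line budget suggested above. The function name is hypothetical.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumed rough average for English text; real tokenizers vary
LINE_BUDGET = 60     # the line limit suggested above


def summarize_instructions(text: str) -> dict:
    """Return the line count, a rough token estimate, and a budget flag."""
    lines = len(text.splitlines())
    return {
        "lines": lines,
        "estimated_tokens": len(text) // CHARS_PER_TOKEN,
        "over_budget": lines > LINE_BUDGET,
    }


if __name__ == "__main__":
    path = Path("CLAUDE.md")
    if path.exists():
        print(summarize_instructions(path.read_text(encoding="utf-8")))
```

Run it from your project root. If over_budget comes back True, that is your cue to start moving material out of the file.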

5. Watch Out for Large Identity and Soul Files

This is the one most people miss.

If you use identity files, soul files, or large context documents that load automatically, all of that context is reloaded on every single message. Every interaction. Every tool call.

Do the math: 500 lines of instructions = roughly 8,000 tokens per interaction. At 50 messages per day, that's 400,000 tokens spent just on loading instructions. That's roughly 9x the ~44,000 tokens a Pro plan gets per 5-hour window (see the FAQ), gone before you write a single line of code.
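
Here is the same arithmetic as a small sketch you can rerun with your own numbers. The 16 tokens per line is simply what the 500-line, 8,000-token figure above implies, and the 44,000-token window is the Pro figure quoted in the FAQ. Both are estimates.

```python
TOKENS_PER_LINE = 16       # implied by 500 lines being roughly 8,000 tokens
PRO_WINDOW_TOKENS = 44_000  # approximate Pro allowance per window, from the FAQ


def daily_instruction_overhead(lines: int, messages_per_day: int,
                               tokens_per_line: int = TOKENS_PER_LINE) -> int:
    """Tokens spent per day just reloading an instruction file."""
    return lines * tokens_per_line * messages_per_day


if __name__ == "__main__":
    for lines in (500, 60):
        overhead = daily_instruction_overhead(lines, 50)
        print(f"{lines} lines: {overhead:,} tokens "
              f"({overhead / PRO_WINDOW_TOKENS:.1f}x one Pro window)")
```

At 50 messages per day, trimming the file from 500 lines to 60 drops the overhead from 400,000 tokens to 48,000.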

The rule: If the agent doesn't need information in 80% of conversations, it doesn't belong in the main context file. Move it to memory files or separate documents that load on demand.

6. Batch Your Prompts

One detailed request burns fewer tokens than a string of small follow-ups.

Instead of:

"Fix the bug"
"Oh wait, also add a log"
"And rename that function"
"Actually, update the tests too"

Write:

"Fix the bug in function X, add console.log before the API call, rename the function to handleUserInput, and update the tests accordingly"

One prompt. Four actions. Roughly 75% fewer input tokens when the resent context dominates, because every round trip reloads the full context.
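
Where does a figure like 75% come from? A back-of-the-envelope sketch: if every round trip resends the same context and the prompts themselves are tiny, four round trips cost about four times one. The model below ignores output tokens and history growth, and the token counts are made-up examples.

```python
def separate_tokens(context_tokens: int, prompt_tokens: list[int]) -> int:
    """Input tokens when each prompt is its own round trip
    and the full context is resent every time."""
    return sum(context_tokens + p for p in prompt_tokens)


def batched_tokens(context_tokens: int, prompt_tokens: list[int]) -> int:
    """Input tokens when all prompts are merged into one request."""
    return context_tokens + sum(prompt_tokens)


def batching_savings(context_tokens: int, prompt_tokens: list[int]) -> float:
    """Fraction of input tokens saved by batching."""
    separate = separate_tokens(context_tokens, prompt_tokens)
    return 1 - batched_tokens(context_tokens, prompt_tokens) / separate


if __name__ == "__main__":
    # Hypothetical: 10,000 tokens of context, four 10-token prompts.
    print(f"{batching_savings(10_000, [10, 10, 10, 10]):.1%}")
```

With a 10,000-token context and four short prompts, batching saves about 74.9% of the input tokens. The bigger your context, the closer you get to the full 75%.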

7. Build Custom Skills Instead of Repeating Yourself

You can create custom tools inside Claude Code:

Skills are markdown files with instructions, triggered by a slash command like /my-skill. Instead of explaining what you want from scratch every time, you write the instructions once and trigger them with a single command.

Pipelines chain multiple steps: research → write → edit → publish. Define once, run with one command.

Agents are parallel sessions running on separate worktrees. Each works on its own task without burning the other's context. One writes code, one writes tests, one writes docs. Three sessions, each carrying only its own task's context. Learn how to build your first one with the custom slash command tutorial.

How to Build Safe Token Habits (Preventing "Bill Shock")

The golden rule: track before you optimize.

Run /cost inside Claude Code to see your current session's token consumption in real-time. The /statusline command adds a persistent status bar at the bottom of your terminal showing context window utilization percentage.

A safe daily protocol:

Start session → check /cost baseline
Set model to Sonnet (not Opus)
Run /compact every 15 minutes
Before complex tasks → switch to Opus
After complex tasks → switch back to Sonnet
End of session → check /cost total

This same pattern is used in the tutorials on this site — every project starts with the right model for the job.

Beyond Token Saving: When to Use Different Models for Different Agents

There's a limit to how much you can save within a single session. The real unlock is architectural: run multiple agents in parallel, each on the cheapest model that can handle its task.

Agent Role	Model	Why
Code writer	Sonnet	Needs creativity and understanding
Test writer	Haiku	Tests follow predictable patterns
Documentation	Haiku	Mostly templated output
Architecture review	Opus	Needs deep cross-file reasoning
File renaming / cleanup	Haiku	Trivial operations

Claude Code supports this natively with worktrees. Each agent gets its own context window, its own model, and its own token budget. No cross-contamination. Learn how in the parallel workflows tutorial.
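
To put a rough number on the table above, this sketch compares the role-to-model assignment with running every agent on Opus. The prices come from the pricing table earlier in this guide. The workload (1M input and 100K output tokens per role) is a made-up example, so only the ratio means anything.

```python
# USD per million tokens, from the pricing table earlier in this guide.
PRICES = {
    "haiku": (1.0, 5.0),
    "sonnet": (3.0, 15.0),
    "opus": (5.0, 25.0),
}

# Role-to-model assignment from the table above.
AGENT_MODELS = {
    "code writer": "sonnet",
    "test writer": "haiku",
    "documentation": "haiku",
    "architecture review": "opus",
    "cleanup": "haiku",
}


def workload_cost(workload: dict, assignment: dict) -> float:
    """Total USD cost for {role: (input_tokens, output_tokens)}."""
    total = 0.0
    for role, (input_tokens, output_tokens) in workload.items():
        input_price, output_price = PRICES[assignment[role]]
        total += (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return total


if __name__ == "__main__":
    # Hypothetical workload: every role uses 1M input and 100K output tokens.
    workload = {role: (1_000_000, 100_000) for role in AGENT_MODELS}
    all_opus = {role: "opus" for role in AGENT_MODELS}
    print("mixed models:", workload_cost(workload, AGENT_MODELS))
    print("all Opus:", workload_cost(workload, all_opus))
```

For this example workload the mixed setup comes to $16.50 against $37.50 for all Opus, a 56% reduction with Opus still handling the architecture review.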

Conclusion: Is Claude Code Worth the Token Cost?

For the pragmatic builder, the answer is a clear yes — if you manage it.

Unmanaged, Claude Code can burn through a Pro subscription in hours. Managed, the same subscription lasts a full workday. The difference is 7 habits: model switching, context compression, file filtering, lean configuration, batched prompts, custom skills, and cost tracking.

The 3-rule version: Start on Sonnet. Batch your asks. Run /compact early and often.

Ready to build something? Pick your first project from the tutorials here and put these token-saving habits to work from day one.

FAQs

Why does Claude Code run out so fast?

Claude Code defaults to Opus, the most expensive model. Each message resends the full conversation history plus your CLAUDE.md and any loaded context files. Without context management (/compact or /clear), a 30-minute session can exhaust your entire daily quota. Switching to Sonnet and running /compact regularly can extend your session by 3-5x.

What's the difference between /compact and /clear?

/compact compresses your conversation into a summary without losing context. /clear wipes everything and starts fresh. Use /compact when you're mid-task and want to keep working. Use /clear when you're switching to a completely different task. Both save tokens, but /compact preserves continuity.

Is Haiku good enough for writing code?

For simple changes — renaming variables, adding lines, fixing syntax errors — absolutely. Haiku matches Sonnet's performance on straightforward tasks at one-third the cost. For anything requiring multi-file reasoning or creative problem-solving, use Sonnet or Opus.

How many tokens does a Claude Pro subscription include?

Claude Pro provides approximately 44,000 tokens per 5-hour rolling window. Max 5x gets about 88,000, and Max 20x about 220,000. With the optimization tricks in this guide, 44,000 tokens per window is enough for a full workday of productive coding.

Do large CLAUDE.md and identity files burn tokens?

Yes, and this surprises most people. Every file that loads automatically (CLAUDE.md, memory files, soul files) is resent with every single message. A 500-line instruction file can cost 5,000-10,000 tokens per interaction. Keep your main config files lean and move detailed content to separate files that load on demand.
