# Introduction
Claude Code is genuinely useful, but it can also get expensive much faster than people expect. The reason is simple: you are not only paying for the prompt you just typed. In many cases, Claude is also carrying the rest of the session with it: earlier messages, files it already read, tool outputs, memory files like CLAUDE.md, and other background instructions. So when token use starts climbing, the real problem is usually not bad prompting. It's messy context.
A lot of generic advice on this topic is not that helpful. "Keep conversations short" is true, but it doesn't tell you what actually moves the needle. What actually helps is understanding how Claude Code builds context, what keeps getting resent, and which parts of your workflow quietly add waste over time. In this article, we will look at 7 practical strategies to help you use Claude Code efficiently without constantly worrying about cost. So, let's get started.
# 1. Switching Models by Task Complexity
This one is simple but massively underused. Not every task needs your most expensive setup. On API billing, Opus costs 5x more than Sonnet per token. On subscription plans, heavier models drain your quota window faster.
/model sonnet # Day-to-day: writing tests, simple edits,
# explaining code, refactoring
/model opus # Complex: multi-file architecture decisions,
# debugging gnarly cross-system issues
/model haiku # Quick: lookups, formatting, renaming,
# anything repetitive
Start every session on Sonnet. Only switch to Opus when you genuinely need deep analysis or complex refactoring. Drop to Haiku for the mechanical stuff. You can also control effort level directly with /effort. For simple tasks, lowering the effort level reduces the thinking budget the model allocates, which directly saves output tokens.
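To see why this matters, here is a rough back-of-the-envelope sketch in Python. The per-million-token prices below are illustrative assumptions for the sake of the arithmetic, not Anthropic's current list prices; check the official pricing page before relying on any of these numbers.

```python
# Rough cost comparison for the same workload on different models.
# Prices are ILLUSTRATIVE ASSUMPTIONS (USD per million tokens),
# not current list prices.
PRICES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "haiku":  {"input": 0.80,  "output": 4.00},
}

def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one session on a given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A plausible session: 200k input tokens (context resent across turns)
# and 20k output tokens.
for model in PRICES:
    print(f"{model:>6}: ${session_cost(model, 200_000, 20_000):.2f}")
```

Under these assumed prices, the same session costs five times more on Opus than on Sonnet, which is why defaulting to the heaviest model for routine edits adds up quickly.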
# 2. Keeping CLAUDE.md Small and Useful
One of the best ways to save tokens is to stop retyping the same project rules in every chat. That's exactly what CLAUDE.md is for. It loads before Claude reads your code, before it reads your task, before anything. It persists in the context window for the entire session and is not lazy-loaded or evicted. This means a 5,000-token CLAUDE.md costs 5,000 tokens on every single turn, whether you send 2 messages or 200. So, put your stable instructions there: how to run tests, which package manager to use, your formatting rules, important architectural constraints, and the directories Claude should avoid touching. This cuts repeated prompt overhead across sessions.
Another important part is to keep it lean. Don't paste meeting notes, design history, or long implementation guides into it. You will get the best results when CLAUDE.md works more like a lookup table than a giant brain dump.
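As a rough illustration, a lean CLAUDE.md might look like the sketch below. The commands, tools, and directory names are hypothetical placeholders; substitute your own project's conventions.

```markdown
# Project conventions
- Package manager: pnpm (never npm or yarn)
- Run tests: pnpm test
- Lint and format: pnpm lint --fix
- TypeScript strict mode; avoid `any`
- Do not touch: dist/, migrations/, vendor/
```

A file like this stays well under a few hundred tokens per turn, while still answering the questions Claude would otherwise burn tokens rediscovering.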
# 3. Delegating Verbose Work to Subagents
This is one of the most genuinely useful tips because it changes how context grows. Subagents are isolated Claude instances that run in their own context window. When a subagent runs, all its verbose output (file searches, log dumps, multi-step reasoning) stays isolated. Only the summary returns to your main conversation. This keeps your main thread much cleaner. But this is also where a lot of generic advice goes wrong. Subagents are not automatically cheaper. Community testing shows that for small tasks, especially simple shell actions or quick git operations, a subagent can be wasteful because the architecture itself adds overhead through prompts, tool definitions, and extra tool-call round trips. So the practical rule is not "use subagents for everything." It's "use subagents when the saved main-context clutter is worth more than the startup overhead."
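For illustration, Claude Code lets you define a custom subagent as a Markdown file with YAML frontmatter under `.claude/agents/`. The subagent name and instructions below are hypothetical; adapt them to the verbose work you actually want to offload.

```markdown
---
name: log-digger
description: Investigates logs and large files, returns only a short summary
tools: Read, Grep, Glob
---
You investigate verbose sources (logs, traces, large files) in your own
context window. Never dump raw output back. Reply with a concise summary
of findings plus exact file:line references.
```

With a definition like this, the grep results and log dumps stay in the subagent's context, and only the closing summary lands in your main session.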
# 4. Pointing Claude to Exact Files and Line Ranges
One of the fastest ways to waste tokens is to ask Claude to "look around the repo" when the issue really lives in one or two files. The vaguer the task, the more likely Claude is to spend tokens opening multiple files, exploring dead ends, and reconstructing context you could have handed it directly. Here is an example.
Original:
"Look through the auth code and tell me what's wrong."
Better:
"Compare src/auth/session.ts lines 30 to 90 with src/api/login.ts lines 10 to 60 and explain the mismatch."
The first one sounds natural, but it often triggers expensive exploration.
Another tip is to use plan mode before expensive operations. Toggle it with Shift+Tab. In plan mode, Claude outputs a step-by-step plan without making any changes. You review the plan, cut anything unnecessary, then switch back to normal mode. This eliminates the biggest source of token waste: trial-and-error execution, where Claude tries things, hits errors, and iterates, with each iteration costing tokens.
# 5. Using /compact Proactively (Not Reactively)
Claude can compact your session automatically, and you can also run /compact yourself. But timing matters more than people think.
By the time Claude has inspected several files, run commands, and explored a few false leads, your session usually contains a lot of material that no longer matters. That's the perfect moment to compact. Instead of carrying all that extra context into the next step, you shrink the conversation once the important parts are clear, and then continue with a much lighter session.
A common mistake is using /compact too late. Many developers wait until Claude starts forgetting things or shows a context warning. At that point, the session is already overloaded, and the summary is not as clean or useful. If you compact earlier, while the session is still "healthy," the summary is much better. You keep the key facts, drop the noise, and avoid dragging unnecessary tokens into every future step.
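The /compact command also accepts optional instructions that steer what the summary keeps. A hypothetical example, worded for a session where the design has settled but the exploration is now dead weight:

```
/compact Keep the agreed API design, the list of files changed, and the
remaining TODOs. Drop the exploration steps and failed attempts.
```

Telling the summarizer what matters tends to produce a tighter summary than compacting with no guidance at all.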
# 6. Checking /context Earlier than Optimizing
Some of the underrated concepts is just taking a look at what’s consuming context. Loads of token waste feels mysterious till you keep in mind that the costly half is probably not the seen immediate. It might be an enormous file Claude learn earlier, accrued device output, a heavy reminiscence file, or the overhead of additional tooling.
The /context command is your diagnostic device. Earlier than altering your entire workflow, take a look at what is definitely being loaded or repeatedly re-sent. In lots of circumstances, the largest enchancment doesn’t come from higher prompting. It comes from recognizing one “quiet offender” that has been using alongside in each flip. For this reason it’s higher to not optimize blindly. First, examine what’s in your context. Then take away or scale back the elements which might be really inflicting the bloat.
# 7. Keeping Your Tooling Setup Simple
Claude Code can connect to many external tools and data sources, which is powerful, but more connected tooling can also mean more context overhead once those tools come into play. If too many tools or helpers are involved, the model can end up dragging around more overhead than the task really needs. Keep your setup lean. Use integrations that solve a real, repeated problem. Don't load up Claude Code with every available skill just because you can.
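As one concrete way to keep things lean, Claude Code reads project-scoped MCP servers from a `.mcp.json` file, and a minimal config lists only the servers you actually use. The single GitHub server below is just an example of a small setup, not a recommendation to add it:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

Every server in this file contributes tool definitions to your context, so pruning this list is one of the cheapest overhead wins available.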
# Closing Thoughts
The best way to reduce Claude Code token usage is not to babysit every prompt. It's to design your workflow so Claude only sees what it genuinely needs. The biggest wins come from controlling automatic context, narrowing search scope, and stopping noisy side work from contaminating the main session.
Stop thinking only about prompts and start thinking about context architecture.
Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the book "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
