
MCP is having a moment.
You wire up a few MCP servers, point your agent at them, and suddenly it can do everything. Query data. Create tickets. Read docs. Hit internal APIs. It feels like you finally gave the model hands.
But there’s a very unsexy failure mode showing up in real systems, and a fresh Hacker News thread plus a good writeup made it impossible to ignore.
The failure mode is simple:
By the time your agent finishes loading tool schemas and “getting ready to work”, it has already burned a painful amount of its context window. Sometimes a majority of it. Before it has done anything useful.
If you build agents for real work, not demos, you should care. Context is your budget. Spend it on the problem, not on tool brochures.
This post breaks down what’s actually happening (context tax), why MCP toolchains are uniquely vulnerable (schema bloat and tool fanout), what to do about it (dynamic tool loading, “thin” servers, and yes, CLI alternatives), and when MCP is still absolutely the right abstraction.
Along the way I’ll reference the discussion that kicked this off, and an external post that lays out the CLI alternative clearly: MCP server eating context window: CLI alternative.
The context window is not “memory”. It’s working capital.
People still talk about context like it’s a long term memory slot. It’s not. It’s your scratchpad, your instructions, your tool definitions, your conversation, your retrieved docs, your intermediate reasoning. Everything competes for the same space.
So when you add tools, you are not just adding capability. You are adding overhead that has to be present in the prompt for the model to use those tools reliably.
That overhead is the “context tax”.
And with MCP, that tax can get huge because MCP encourages a certain shape of system:
- lots of discrete servers
- lots of functions per server
- big JSON schemas so the model can call them correctly
- descriptive text for safety and guardrails
- and then, in many implementations, all of it gets shoved into the context at once
You can feel the issue immediately when an agent starts a task with a long preamble like “Available tools: …” and the tool list scrolls for pages. That is literal budget evaporation.
What actually gets injected? The hidden prompt payload
An MCP integration usually needs to provide the model with:
- Tool names (and grouping, sometimes)
- Natural language descriptions (often verbose)
- Parameter schemas (JSON schema style, with nested objects, enums, optional fields)
- Return type descriptions
- Examples (sometimes)
- Policies or constraints (sometimes copied for each tool)
- Server metadata (auth, capabilities, versioning, etc., depending on the framework)
Now multiply this by:
- 10 servers
- each server exporting 20 to 80 tools
- each tool schema being 1 KB to 8 KB of text once serialized into a prompt friendly format
That adds up fast. It is not weird to end up with tens of thousands of tokens of “tool definitions”.
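To make the arithmetic concrete, here is a back-of-envelope sketch using the conservative end of the figures above. The numbers (and the 4-characters-per-token heuristic) are assumptions, not measurements:

```python
# Back-of-envelope estimate of the tool-definition payload, using the
# low end of the ranges above (assumed figures, not measurements).
SERVERS = 10
TOOLS_PER_SERVER = 20        # low end of the 20 to 80 range
BYTES_PER_SCHEMA = 1024      # low end of the 1 KB to 8 KB range
CHARS_PER_TOKEN = 4          # rough rule of thumb for English/JSON text

payload_bytes = SERVERS * TOOLS_PER_SERVER * BYTES_PER_SCHEMA
payload_tokens = payload_bytes // CHARS_PER_TOKEN

print(f"{payload_bytes / 1024:.0f} KB of schemas ≈ {payload_tokens:,} tokens")
```

Even at the low end, that is roughly 50,000 tokens before the agent has done anything.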
And the truly annoying part is that this payload is often repeated every turn. Even if your runtime is “smart” about caching, many systems still end up reintroducing a lot of it due to how tool availability is represented to the model each step.
Schema bloat: the silent killer
Schema bloat is not just “too many tools”.
It’s the combination of:
- overly expressive schemas (deep nesting, large enums, many optional branches)
- verbose descriptions for every field
- trying to encode business logic in tool docs
- “one tool per endpoint” design, instead of task level tools
Tool calling works best when tools are few, composable, and easy to select between. But MCP server design often mirrors an API surface area, which is the opposite. APIs are built for computers. Toolsets are built for language models.
When you mirror an API, you get:
- 40 functions for CRUD and edge cases
- overlapping capabilities
- similar names
- and tiny differences in parameters the model has to keep straight
So you pay context to explain all that. Then you pay again in mistakes when the model picks the wrong one anyway.
Tool fanout: MCP makes it easy to over integrate
MCP’s biggest selling point is also what gets teams into trouble.
Once you have the pattern, it’s addictive:
- “Let’s add Jira.”
- “Now Slack.”
- “Now GitHub.”
- “Now the data warehouse.”
- “Now internal admin APIs.”
- “Now the feature flag system.”
Each integration seems small on its own, but the agent sees the sum total. And the agent has to carry enough of that tool vocabulary in context to make good choices.
If your agent is a generalist, this might be fine. But if the agent is designed for a narrow workflow, tool fanout is self inflicted pain.
The practical symptoms you’ll see in production
When MCP context overhead gets out of control, teams usually don’t notice it as “token cost” first. They notice it as weird behavior:
- The model gets dumber mid task. It starts strong, then loses the plot. This is often because earlier important instructions get pushed out of the window by tool definitions and retrieved docs.
- It over calls tools. Not because tools are needed, but because tool descriptions are highly salient and crowd out “think first” instructions.
- It under calls tools. The tool list is too long, so selection quality drops and the model defaults to text answers.
- More retries, more loops. Because one wrong call forces a correction, which forces more context, which makes the next call worse.
- Higher latency. More tokens in, more tokens processed. Also more tool consideration time if your agent does routing.
- Costs that scale with integrations, not with work. You pay for the tool payload even when the task is simple.
None of this is theoretical. If you’ve watched traces from an agent framework, you’ve probably seen it.
“But we have a big context model now.” Still not free.
Yes, context windows are bigger than they used to be. But the logic remains:
- larger windows cost more
- larger windows increase the temptation to stuff more in
- larger windows can hide sloppy prompt architecture until you hit a new ceiling
- and larger windows do not fix attention dilution. Having 200 pages in front of you does not mean you read the right paragraph.
You can absolutely brute force some of this with a bigger model. A lot of teams do. The bill shows up later.
Dynamic tool loading: the most important mitigation
If you do one thing, do this: do not load all tools all the time.
Instead, load tools dynamically based on the task stage. There are a few patterns:
1) Two step routing (cheap planner, then tool loading)
- Step A: run a small “router” prompt that decides which tool groups are needed (Jira vs GitHub vs DB vs none)
- Step B: only inject the schemas for those groups into the main agent
This can cut prompt size drastically. And it improves tool choice because the model is selecting from a smaller menu.
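A minimal sketch of the two-step pattern, with assumed group and tool names (in practice step A is a call to a small router model; keyword matching stands in for it here):

```python
# Two-step routing: a cheap router picks tool groups, and only those
# groups' schemas are injected into the main agent prompt.
TOOL_GROUPS = {
    "jira":   ["jira_handle_ticket"],
    "github": ["gh_open_pr", "gh_review_pr"],
    "db":     ["run_sql_readonly"],
}

def route(task: str) -> list[str]:
    # Stand-in for a call to a small router model.
    return [g for g in TOOL_GROUPS if g in task.lower()]

def build_toolset(task: str) -> list[str]:
    # Step B: expand only the selected groups into concrete tools.
    return [tool for g in route(task) for tool in TOOL_GROUPS[g]]

print(build_toolset("Open a GitHub PR for the fix"))
```

Only the GitHub group's schemas would be serialized into the prompt for that task; the Jira and database schemas never leave the server.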
2) Progressive disclosure
Start with zero tools, then:
- attempt a solution
- if blocked, ask “Do I need external action?”
- then load only the tool family that unblocks you
This looks slower in steps, but it often wins overall because each step is cheaper and more reliable.
3) Tool indexing (describe tools outside the context)
Instead of injecting full schemas, inject an index:
- tool name
- one line summary
- cost/risk hints
Then have a meta tool like load_tool_schema(tool_name) that fetches the full schema only when needed.
This works especially well if your agent platform supports “tool retrieval” like document retrieval.
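Here is one way the index plus meta tool could look. The tool names and schema contents are illustrative; only the compact index lives in the prompt, while full schemas stay server-side:

```python
# The agent prompt carries only this compact index: one line per tool.
TOOL_INDEX = {
    "jira_handle_ticket": "Create/update/comment on a Jira ticket. Low risk.",
    "run_sql_readonly":   "Run a read-only SQL query. Medium cost, can be slow.",
}

# Full JSON schemas live outside the prompt, keyed by tool name.
FULL_SCHEMAS = {
    "jira_handle_ticket": {
        "type": "object",
        "properties": {"action": {"enum": ["create", "update", "comment"]}},
    },
    "run_sql_readonly": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
    },
}

def load_tool_schema(tool_name: str) -> dict:
    """Meta tool: fetch the full schema only when the model commits to a tool."""
    return FULL_SCHEMAS[tool_name]

schema = load_tool_schema("run_sql_readonly")
print(schema["properties"]["query"]["type"])
```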
Design “thin” MCP servers, not “API mirrors”
If your MCP server exports 80 endpoints, you probably built an API mirror.
Try building task level tools instead. Some examples:
- Bad: jira_create_issue, jira_update_issue, jira_add_comment, jira_transition_issue, jira_search_jql…
- Better: jira_handle_ticket(action, summary, context), where action is one of 5 things and the server does the internal orchestration
This feels like “hiding power”, but you’re not hiding it. You’re packaging it for the model. The server can still do the complicated part deterministically, with validation and guardrails, without forcing the model to juggle 30 parameters.
This also reduces schema surface area, which reduces context tax.
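A sketch of the task-level shape, with validation on the server side (the function and action names are illustrative, not a real Jira client):

```python
# One task-level tool; the server does the deterministic orchestration.
ACTIONS = {"create", "update", "comment", "transition", "search"}

def jira_handle_ticket(action: str, summary: str, context: dict) -> dict:
    # Validation and guardrails live here, not in the model's prompt.
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "create" and not summary:
        raise ValueError("create requires a summary")
    # The server maps one action to whatever internal API calls are needed;
    # the model never sees those endpoints or their parameters.
    return {"ok": True, "action": action}

print(jira_handle_ticket("create", "Fix login timeout", {}))
```

The model learns one schema with one enum instead of five overlapping function signatures.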
The CLI alternative: why it can be cheaper and more robust
The blog post I linked earlier argues something that makes a lot of sense in practice: for many developer workflows, a CLI tool can be a better integration layer than MCP.
Here’s the core idea:
Instead of injecting huge JSON schemas, you provide one tool like:
run_command(command: string)
or slightly safer:
run_command(command: string, cwd: string, timeout: number)
Then the model uses the CLI’s own help text, error messages, and composable commands to do real work.
Why this helps:
- One tool schema instead of hundreds
- CLI UX is already optimized for succinctness and composability
- Output is often compact and structured enough
- Commands can be cached, repeated, diffed
- you can sandbox execution tightly (containers, read only FS, allowlisted binaries)
There are real tradeoffs of course (security, injection risk, nondeterminism if you allow arbitrary commands), but the shape is compelling: the model doesn’t need a dictionary of every operation. It needs one way to execute deterministic programs.
A good middle ground is a constrained CLI runner: allowlisted commands only, with structured arguments, and a fixed working directory.
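A minimal sketch of that constrained runner, assuming a sandboxed working directory and an allowlist you maintain yourself:

```python
# Constrained CLI runner: allowlisted binaries, no shell interpretation,
# a fixed working directory, and a timeout. The allowlist and WORKDIR
# are illustrative; in practice WORKDIR is a sandbox or container mount.
import shlex
import subprocess

ALLOWED = {"git", "rg", "jq", "ls"}
WORKDIR = "/sandbox/repo"

def run_command(command: str, timeout: int = 30) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"binary not allowlisted: {argv[0] if argv else ''}")
    result = subprocess.run(
        argv, cwd=WORKDIR, capture_output=True, text=True,
        timeout=timeout, shell=False,  # never shell=True with model-supplied input
    )
    return result.stdout
```

The model still composes commands freely, but the blast radius is capped by the allowlist and the sandbox, not by prompt instructions.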
What about code execution as a tool?
Code execution is another alternative that often beats MCP for certain categories of work:
- data transforms
- parsing
- summarization of long outputs
- quick calculations
- generating structured artifacts (JSON, CSV, diffs)
Again the reason is prompt economics. One “execute Python” tool can replace a hundred “data utilities” tools and avoids huge schema payloads. Plus you get correctness gains because the machine runs the code.
But you still need to manage context. The code and outputs can get large too. So you want patterns like:
- store artifacts out of band (files, object storage)
- return only summaries or hashes to the model
- reference artifacts by ID
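The pattern above can be sketched in a few lines. An in-memory dict stands in for real file or object storage, and the reference shape is an assumption:

```python
# Out-of-band artifacts: store the full output, hand the model only an
# ID, a size, and a short preview.
import hashlib
import json

ARTIFACT_STORE: dict[str, str] = {}  # stands in for files/object storage

def store_artifact(content: str) -> dict:
    artifact_id = hashlib.sha256(content.encode()).hexdigest()[:12]
    ARTIFACT_STORE[artifact_id] = content
    # The model sees this compact reference, never the full payload.
    return {
        "artifact_id": artifact_id,
        "bytes": len(content),
        "preview": content[:80],
    }

big_output = json.dumps([{"row": i} for i in range(10_000)])
ref = store_artifact(big_output)
print(ref["artifact_id"], ref["bytes"])
```

Later steps pass the artifact ID to tools that need the data, so the full payload never re-enters the context window.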
Which teams should care the most?
Some teams can ignore this for a while. Others will feel it immediately.
You should take the MCP context window problem seriously if you are:
- Building agents with long multi step workflows, where earlier constraints must remain in memory (compliance, policy, user preferences).
- Operating at scale, where per run overhead multiplies into real spend.
- Using models with smaller context windows for cost reasons, or using smaller models for routing.
- Integrating lots of enterprise systems, where each integration comes with lots of endpoints and permissions.
- Shipping productized agents, where latency and reliability are features, not engineering curiosities.
If you’re prototyping an internal assistant that runs twice a day, you can absolutely overstuff the prompt and move on. Just know what you’re paying for.
Where MCP still shines (and it’s not just hype)
This post is not “MCP bad”. MCP is legitimately useful. It shines when:
1) You need a standard contract across many tools
MCP gives you a uniform way to expose capabilities, auth, and metadata. For organizations with many internal systems, that standardization is gold.
2) You want better safety than raw execution
Compared to “run arbitrary command”, a well designed MCP server can:
- validate inputs
- enforce permissions
- constrain actions
- log every operation
- implement rate limits and auditing
This is especially valuable for customer facing agents.
3) You have non developer tool providers
If other teams can publish MCP servers without coordinating on bespoke integrations, you unlock velocity. That’s the platform story.
4) Your tasks truly require many heterogeneous actions
Some workflows legitimately bounce across systems. If the agent is a generalist operator, having a broad toolbelt can be correct.
The trick is you still need prompt discipline. MCP makes integration easier. It does not make context free.
A practical decision framework: MCP vs CLI vs code exec
Here’s a simple way to choose.
Choose MCP when…
- You need strong guardrails and explicit schemas
- Actions are high risk (billing, deletion, production changes)
- You want a stable API like contract for tools
- You can keep tool surfaces small and task oriented
- You can implement dynamic tool loading
Choose CLI when…
- The domain already has great CLIs (git, kubectl, terraform, ripgrep, jq)
- You want maximum leverage with minimal schema
- You can sandbox it properly
- Your users are developers and can tolerate “terminal shaped” workflows
Choose code execution when…
- The work is mostly computation, transformation, parsing
- Determinism and correctness matter more than natural language reasoning
- You can externalize artifacts and keep outputs compact
Most mature systems end up hybrid. MCP for high risk actions. CLI for developer ergonomics. Code exec for data work. And a retrieval layer for docs.
How to reduce MCP overhead without abandoning it
If you already committed to MCP, here are concrete tactics that help quickly:
- Audit your tool list: count tools, average schema length, and how often each tool is used. You’ll find dead weight.
- Collapse similar tools into fewer task level tools.
- Move verbose documentation out of the prompt and into retrievable docs.
- Add tool grouping and only load groups when needed.
- Add a router model step to select tool families.
- Prefer short enums and avoid giant parameter objects unless necessary.
- Return concise outputs from tools. Don’t dump entire records unless asked. Add pagination and summarization server side.
- Cache tool schemas in your agent runtime so you’re not re serializing huge text blobs every turn.
- Measure context tax explicitly: log prompt token counts by category (instructions vs tools vs retrieved docs vs conversation).
That last one matters more than it sounds. If you don’t measure it, people will keep “just adding one more integration” until the agent quietly degrades.
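Measuring the tax can start as something this simple. The category names and the chars-divided-by-4 heuristic are assumptions; swap in your runtime's real tokenizer:

```python
# Log prompt tokens by category each turn, so "one more integration"
# shows up as a number instead of silent degradation.
from collections import Counter

def approx_tokens(text: str) -> int:
    # Crude heuristic; replace with your model's tokenizer for real numbers.
    return max(1, len(text) // 4)

def log_context_tax(sections: dict[str, str]) -> Counter:
    tax = Counter({name: approx_tokens(body) for name, body in sections.items()})
    total = sum(tax.values())
    for name, tokens in tax.most_common():
        print(f"{name:12s} {tokens:6d} tokens ({tokens / total:.0%})")
    return tax

tax = log_context_tax({
    "instructions": "You are a helpful agent..." * 20,
    "tools": "{...tool schemas...}" * 400,
    "conversation": "user: fix the bug",
})
```

Once this runs per turn, a tools category that dwarfs instructions and conversation is an immediate, visible signal.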
One more angle: context tax is also an ops and documentation problem
A lot of MCP bloat comes from good intentions.
Someone adds careful field descriptions. Another person adds examples. Another adds policy reminders. Soon every tool reads like a mini README.
But that content belongs in a place where humans can read it and models can retrieve it when relevant. Not shoved into every single step.
This is where good technical writing habits matter. Tool docs should be modular. Queryable. Versioned. Easy to update.
And honestly, publishable.
If you’re already writing down patterns for “how we design tools”, “how we keep schemas small”, “when to use MCP vs CLI”, you should turn those notes into real internal docs or even public guidance.
Junia AI is useful here in a pretty practical way: you can take messy engineering notes and turn them into structured, searchable posts, then keep them updated as your platform evolves. If you’re doing this kind of work regularly, their AI text editor helps clean up and maintain technical docs without rewriting everything from scratch: AI text editor. And if you’re publishing a lot of internal knowledge, automated AI internal linking makes those docs easier to navigate over time: AI internal linking.
Wrap up
MCP is a powerful abstraction. It also has a predictable failure mode: tool definitions expand until they crowd out the actual work.
Call it the MCP context window problem. Or schema bloat. Or prompt obesity. Whatever name you use, the fix is the same mindset: treat context like a scarce resource and design your tool layer around prompt economics, not around API completeness.
Use MCP when you need schemas and safety. Use CLIs or code execution when you need leverage and compactness. And above all, load tools dynamically.
If you’re documenting these decisions for your team, or publishing workflow guidance for other builders, consider using Junia.ai to turn your internal playbooks into clean technical articles you can actually maintain and ship.
