
MCP is having a moment.
You wire up a few MCP servers, point your agent at them, and suddenly it can do everything. Query data. Create tickets. Read docs. Hit internal APIs. It feels like you finally gave the model hands.
But there’s a very unsexy failure mode showing up in real systems, and a fresh Hacker News thread plus a good writeup made it impossible to ignore.
The failure mode is simple:
By the time your agent finishes loading tool schemas and “getting ready to work”, it has already burned a painful amount of its context window. Sometimes a majority of it. Before it has done anything useful.
If you build agents for real work, not demos, you should care. Context is your budget. Spend it on the problem, not on tool brochures.
This post breaks down what’s actually happening (context tax), why MCP toolchains are uniquely vulnerable (schema bloat and tool fanout), what to do about it (dynamic tool loading, “thin” servers, and yes, CLI alternatives), and when MCP is still absolutely the right abstraction.
Along the way I’ll reference the discussion that kicked this off, and an external post that lays out the CLI alternative clearly: MCP server eating context window: CLI alternative.
The context window is not “memory”. It’s working capital.
People still talk about context like it’s a long term memory slot. It’s not. It’s your scratchpad, your instructions, your tool definitions, your conversation, your retrieved docs, your intermediate reasoning. Everything competes for the same space.
So when you add tools, you are not just adding capability. You are adding overhead that has to be present in the prompt for the model to use those tools reliably.
That overhead is the “context tax”.
And with MCP, that tax can get huge because MCP encourages a certain shape of system:
- lots of discrete servers
- lots of functions per server
- big JSON schemas so the model can call them correctly
- descriptive text for safety and guardrails
- and then, in many implementations, all of it gets shoved into the context at once
You can feel the issue immediately when an agent starts a task with a long preamble like “Available tools: …” and the tool list scrolls for pages. That is literal budget evaporation.
What actually gets injected? The hidden prompt payload
An MCP integration usually needs to provide the model with:
- Tool names (and grouping, sometimes)
- Natural language descriptions (often verbose)
- Parameter schemas (JSON schema style, with nested objects, enums, optional fields)
- Return type descriptions
- Examples (sometimes)
- Policies or constraints (sometimes copied for each tool)
- Server metadata (auth, capabilities, versioning, etc., depending on the framework)
Now multiply this by:
- 10 servers
- each server exporting 20 to 80 tools
- each tool schema being 1 KB to 8 KB of text once serialized into a prompt friendly format
That adds up fast. It is not weird to end up with tens of thousands of tokens of “tool definitions”.
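To make the arithmetic concrete, here is a back-of-envelope sketch using the conservative end of the figures above. The numbers (and the 4-characters-per-token heuristic) are assumptions, not measurements:

```python
# Back-of-envelope estimate of the tool-definition payload, using the
# low end of the ranges above (assumed figures, not measurements).
SERVERS = 10
TOOLS_PER_SERVER = 20        # low end of the 20 to 80 range
BYTES_PER_SCHEMA = 1024      # low end of the 1 KB to 8 KB range
CHARS_PER_TOKEN = 4          # rough rule of thumb for English/JSON text

payload_bytes = SERVERS * TOOLS_PER_SERVER * BYTES_PER_SCHEMA
payload_tokens = payload_bytes // CHARS_PER_TOKEN

print(f"{payload_bytes / 1024:.0f} KB of schemas ≈ {payload_tokens:,} tokens")
```

Even at the low end, that is roughly 50,000 tokens before the agent has done anything.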
And the truly annoying part is that this payload is often repeated every turn. Even if your runtime is “smart” about caching, many systems still end up reintroducing a lot of it due to how tool availability is represented to the model each step.
Schema bloat: the silent killer
Schema bloat is not just “too many tools”.
It’s the combination of:
- overly expressive schemas (deep nesting, large enums, many optional branches)
- verbose descriptions for every field
- trying to encode business logic in tool docs
- “one tool per endpoint” design, instead of task level tools
Tool calling works best when tools are few, composable, and easy to select between. But MCP server design often mirrors an API surface area, which is the opposite. APIs are built for computers. Toolsets are built for language models.
When you mirror an API, you get:
- 40 functions for CRUD and edge cases
- overlapping capabilities
- similar names
- and tiny differences in parameters the model has to keep straight
So you pay context to explain all that. Then you pay again in mistakes when the model picks the wrong one anyway.
Tool fanout: MCP makes it easy to over integrate
MCP’s biggest selling point is also what gets teams into trouble.
Once you have the pattern, it’s addictive:
- “Let’s add Jira.”
- “Now Slack.”
- “Now GitHub.”
- “Now the data warehouse.”
- “Now internal admin APIs.”
- “Now the feature flag system.”
Each integration seems small on its own, but the agent sees the sum total. And the agent has to carry enough of that tool vocabulary in context to make good choices.
If your agent is a generalist, this might be fine. But if the agent is designed for a narrow workflow, tool fanout is self inflicted pain.
The practical symptoms you’ll see in production
When MCP context overhead gets out of control, teams usually don’t notice it as “token cost” first. They notice it as weird behavior:
- The model gets dumber mid task. It starts strong, then loses the plot. This is often because earlier important instructions get pushed out of the window by tool definitions and retrieved docs.
- It over calls tools. Not because tools are needed, but because tool descriptions are highly salient and crowd out “think first” instructions.
- It under calls tools. The tool list is too long, so selection quality drops and the model defaults to text answers.
- More retries, more loops. Because one wrong call forces a correction, which forces more context, which makes the next call worse.
- Higher latency. More tokens in, more tokens processed. Also more tool consideration time if your agent does routing.
- Costs that scale with integrations, not with work. You pay for the tool payload even when the task is simple.
None of this is theoretical. If you’ve watched traces from an agent framework, you’ve probably seen it.
“But we have a big context model now.” Still not free.
Yes, context windows are bigger than they used to be. But the logic remains:
- larger windows cost more
- larger windows increase the temptation to stuff more in
- larger windows can hide sloppy prompt architecture until you hit a new ceiling
- and larger windows do not fix attention dilution. Having 200 pages in front of you does not mean you read the right paragraph.
You can absolutely brute force some of this with a bigger model. A lot of teams do. The bill shows up later.
Dynamic tool loading: the most important mitigation
If you do one thing, do this: do not load all tools all the time.
Instead, load tools dynamically based on the task stage. There are a few patterns:
1) Two step routing (cheap planner, then tool loading)
- Step A: run a small “router” prompt that decides which tool groups are needed (Jira vs GitHub vs DB vs none)
- Step B: only inject the schemas for those groups into the main agent
This can cut prompt size drastically. And it improves tool choice because the model is selecting from a smaller menu.
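A minimal sketch of the two-step pattern, with assumed group and tool names (in practice step A is a call to a small router model; keyword matching stands in for it here):

```python
# Two-step routing: a cheap router picks tool groups, and only those
# groups' schemas are injected into the main agent prompt.
TOOL_GROUPS = {
    "jira":   ["jira_handle_ticket"],
    "github": ["gh_open_pr", "gh_review_pr"],
    "db":     ["run_sql_readonly"],
}

def route(task: str) -> list[str]:
    # Stand-in for a call to a small router model.
    return [g for g in TOOL_GROUPS if g in task.lower()]

def build_toolset(task: str) -> list[str]:
    # Step B: expand only the selected groups into concrete tools.
    return [tool for g in route(task) for tool in TOOL_GROUPS[g]]

print(build_toolset("Open a GitHub PR for the fix"))
```

Only the GitHub group's schemas would be serialized into the prompt for that task; the Jira and database schemas never leave the server.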
2) Progressive disclosure
Start with zero tools, then:
- attempt a solution
- if blocked, ask “Do I need external action?”
- then load only the tool family that unblocks you
This looks slower in steps, but it often wins overall because each step is cheaper and more reliable.
3) Tool indexing (describe tools outside the context)
Instead of injecting full schemas, inject an index:
- tool name
- one line summary
- cost/risk hints
Then have a meta tool like load_tool_schema(tool_name) that fetches the full schema only when needed.
This works especially well if your agent platform supports “tool retrieval” like document retrieval.
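Here is one way the index plus meta tool could look. The tool names and schema contents are illustrative; only the compact index lives in the prompt, while full schemas stay server-side:

```python
# The agent prompt carries only this compact index: one line per tool.
TOOL_INDEX = {
    "jira_handle_ticket": "Create/update/comment on a Jira ticket. Low risk.",
    "run_sql_readonly":   "Run a read-only SQL query. Medium cost, can be slow.",
}

# Full JSON schemas live outside the prompt, keyed by tool name.
FULL_SCHEMAS = {
    "jira_handle_ticket": {
        "type": "object",
        "properties": {"action": {"enum": ["create", "update", "comment"]}},
    },
    "run_sql_readonly": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
    },
}

def load_tool_schema(tool_name: str) -> dict:
    """Meta tool: fetch the full schema only when the model commits to a tool."""
    return FULL_SCHEMAS[tool_name]

schema = load_tool_schema("run_sql_readonly")
print(schema["properties"]["query"]["type"])
```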
Design “thin” MCP servers, not “API mirrors”
If your MCP server exports 80 endpoints, you probably built an API mirror.
Try building task level tools instead. Some examples:
- Bad: jira_create_issue, jira_update_issue, jira_add_comment, jira_transition_issue, jira_search_jql…
- Better: jira_handle_ticket(action, summary, context), where action is one of 5 things and the server does the internal orchestration
This feels like “hiding power”, but you’re not hiding it. You’re packaging it for the model. The server can still do the complicated part deterministically, with validation and guardrails, without forcing the model to juggle 30 parameters.
This also reduces schema surface area, which reduces context tax.
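A sketch of the task-level shape, with validation on the server side (the function and action names are illustrative, not a real Jira client):

```python
# One task-level tool; the server does the deterministic orchestration.
ACTIONS = {"create", "update", "comment", "transition", "search"}

def jira_handle_ticket(action: str, summary: str, context: dict) -> dict:
    # Validation and guardrails live here, not in the model's prompt.
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "create" and not summary:
        raise ValueError("create requires a summary")
    # The server maps one action to whatever internal API calls are needed;
    # the model never sees those endpoints or their parameters.
    return {"ok": True, "action": action}

print(jira_handle_ticket("create", "Fix login timeout", {}))
```

The model learns one schema with one enum instead of five overlapping function signatures.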
The CLI alternative: why it can be cheaper and more robust
The blog post I linked earlier argues something that makes a lot of sense in practice: for many developer workflows, a CLI tool can be a better integration layer than MCP.
Here’s the core idea:
Instead of injecting huge JSON schemas, you provide one tool like:
run_command(command: string)
or slightly safer:
run_command(command: string, cwd: string, timeout: number)
Then the model uses the CLI’s own help text, error messages, and composable commands to do real work.
Why this helps:
- One tool schema instead of hundreds
- CLI UX is already optimized for succinctness and composability
- Output is often compact and structured enough
- Commands can be cached, repeated, diffed
- you can sandbox execution tightly (containers, read only FS, allowlisted binaries)
There are real tradeoffs of course (security, injection risk, nondeterminism if you allow arbitrary commands), but the shape is compelling: the model doesn’t need a dictionary of every operation. It needs one way to execute deterministic programs.
A good middle ground is a constrained CLI runner: allowlisted commands only, with structured arguments, and a fixed working directory.
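A minimal sketch of that constrained runner, assuming a sandboxed working directory and an allowlist you maintain yourself:

```python
# Constrained CLI runner: allowlisted binaries, no shell interpretation,
# a fixed working directory, and a timeout. The allowlist and WORKDIR
# are illustrative; in practice WORKDIR is a sandbox or container mount.
import shlex
import subprocess

ALLOWED = {"git", "rg", "jq", "ls"}
WORKDIR = "/sandbox/repo"

def run_command(command: str, timeout: int = 30) -> str:
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError(f"binary not allowlisted: {argv[0] if argv else ''}")
    result = subprocess.run(
        argv, cwd=WORKDIR, capture_output=True, text=True,
        timeout=timeout, shell=False,  # never shell=True with model-supplied input
    )
    return result.stdout
```

The model still composes commands freely, but the blast radius is capped by the allowlist and the sandbox, not by prompt instructions.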
What about code execution as a tool?
Code execution is another alternative that often beats MCP for certain categories of work:
- data transforms
- parsing
- summarization of long outputs
- quick calculations
- generating structured artifacts (JSON, CSV, diffs)
Again the reason is prompt economics. One “execute Python” tool can replace a hundred “data utilities” tools and avoids huge schema payloads. Plus you get correctness gains because the machine runs the code.
But you still need to manage context. The code and outputs can get large too. So you want patterns like:
- store artifacts out of band (files, object storage)
- return only summaries or hashes to the model
- reference artifacts by ID
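The pattern above can be sketched in a few lines. An in-memory dict stands in for real file or object storage, and the reference shape is an assumption:

```python
# Out-of-band artifacts: store the full output, hand the model only an
# ID, a size, and a short preview.
import hashlib
import json

ARTIFACT_STORE: dict[str, str] = {}  # stands in for files/object storage

def store_artifact(content: str) -> dict:
    artifact_id = hashlib.sha256(content.encode()).hexdigest()[:12]
    ARTIFACT_STORE[artifact_id] = content
    # The model sees this compact reference, never the full payload.
    return {
        "artifact_id": artifact_id,
        "bytes": len(content),
        "preview": content[:80],
    }

big_output = json.dumps([{"row": i} for i in range(10_000)])
ref = store_artifact(big_output)
print(ref["artifact_id"], ref["bytes"])
```

Later steps pass the artifact ID to tools that need the data, so the full payload never re-enters the context window.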
Which teams should care the most?
Some teams can ignore this for a while. Others will feel it immediately.
You should take the MCP context window problem seriously if you are:
- Building agents with long multi step workflows, where earlier constraints must remain in memory (compliance, policy, user preferences).
- Operating at scale, where per run overhead multiplies into real spend.
- Using models with smaller context windows for cost reasons, or using smaller models for routing.
- Integrating lots of enterprise systems, where each integration comes with lots of endpoints and permissions.
- Shipping productized agents, where latency and reliability are features, not engineering curiosities.
If you’re prototyping an internal assistant that runs twice a day, you can absolutely overstuff the prompt and move on. Just know what you’re paying for.
Where MCP still shines (and it’s not just hype)
This post is not “MCP bad”. MCP is legitimately useful. It shines when:
1) You need a standard contract across many tools
MCP gives you a uniform way to expose capabilities, auth, and metadata. For organizations with many internal systems, that standardization is gold.
2) You want better safety than raw execution
Compared to “run arbitrary command”, a well designed MCP server can:
- validate inputs
- enforce permissions
- constrain actions
- log every operation
- implement rate limits and auditing
This is especially valuable for customer facing agents.
3) You have non developer tool providers
If other teams can publish MCP servers without coordinating on bespoke integrations, you unlock velocity. That’s the platform story.
4) Your tasks truly require many heterogeneous actions
Some workflows legitimately bounce across systems. If the agent is a generalist operator, having a broad toolbelt can be correct.
The trick is you still need prompt discipline. MCP makes integration easier. It does not make context free.
A practical decision framework: MCP vs CLI vs code exec
Here’s a simple way to choose.
Choose MCP when…
- You need strong guardrails and explicit schemas
- Actions are high risk (billing, deletion, production changes)
- You want a stable API like contract for tools
- You can keep tool surfaces small and task oriented
- You can implement dynamic tool loading
Choose CLI when…
- The domain already has great CLIs (git, kubectl, terraform, ripgrep, jq)
- You want maximum leverage with minimal schema
- You can sandbox it properly
- Your users are developers and can tolerate “terminal shaped” workflows
Choose code execution when…
- The work is mostly computation, transformation, parsing
- Determinism and correctness matter more than natural language reasoning
- You can externalize artifacts and keep outputs compact
Most mature systems end up hybrid. MCP for high risk actions. CLI for developer ergonomics. Code exec for data work. And a retrieval layer for docs.
How to reduce MCP overhead without abandoning it
If you already committed to MCP, here are concrete tactics that help quickly:
- Audit your tool list: count tools, average schema length, and how often each tool is used. You’ll find dead weight.
- Collapse similar tools into fewer task level tools.
- Move verbose documentation out of the prompt and into retrievable docs.
- Add tool grouping and only load groups when needed.
- Add a router model step to select tool families.
- Prefer short enums and avoid giant parameter objects unless necessary.
- Return concise outputs from tools. Don’t dump entire records unless asked. Add pagination and summarization server side.
- Cache tool schemas in your agent runtime so you’re not re serializing huge text blobs every turn.
- Measure context tax explicitly: log prompt token counts by category (instructions vs tools vs retrieved docs vs conversation).
That last one matters more than it sounds. If you don’t measure it, people will keep “just adding one more integration” until the agent quietly degrades.
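Measuring the tax can start as something this simple. The category names and the chars-divided-by-4 heuristic are assumptions; swap in your runtime's real tokenizer:

```python
# Log prompt tokens by category each turn, so "one more integration"
# shows up as a number instead of silent degradation.
from collections import Counter

def approx_tokens(text: str) -> int:
    # Crude heuristic; replace with your model's tokenizer for real numbers.
    return max(1, len(text) // 4)

def log_context_tax(sections: dict[str, str]) -> Counter:
    tax = Counter({name: approx_tokens(body) for name, body in sections.items()})
    total = sum(tax.values())
    for name, tokens in tax.most_common():
        print(f"{name:12s} {tokens:6d} tokens ({tokens / total:.0%})")
    return tax

tax = log_context_tax({
    "instructions": "You are a helpful agent..." * 20,
    "tools": "{...tool schemas...}" * 400,
    "conversation": "user: fix the bug",
})
```

Once this runs per turn, a tools category that dwarfs instructions and conversation is an immediate, visible signal.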
One more angle: context tax is also an ops and documentation problem
A lot of MCP bloat comes from good intentions.
Someone adds careful field descriptions. Another person adds examples. Another adds policy reminders. Soon every tool reads like a mini README.
But that content belongs in a place where humans can read it and models can retrieve it when relevant. Not shoved into every single step.
This is where good technical writing habits matter. Tool docs should be modular. Queryable. Versioned. Easy to update.
And honestly, publishable.
If you’re already writing down patterns for “how we design tools”, “how we keep schemas small”, “when to use MCP vs CLI”, you should turn those notes into real internal docs or even public guidance.
Junia AI is useful here in a pretty practical way: you can take messy engineering notes and turn them into structured, searchable posts, then keep them updated as your platform evolves. If you’re doing this kind of work regularly, their AI text editor helps clean up and maintain technical docs without rewriting everything from scratch: AI text editor. And if you’re publishing a lot of internal knowledge, automated AI internal linking makes those docs easier to navigate over time: AI internal linking.
Wrap up
MCP is a powerful abstraction. It also has a predictable failure mode: tool definitions expand until they crowd out the actual work.
Call it the MCP context window problem. Or schema bloat. Or prompt obesity. Whatever name you use, the fix is the same mindset: treat context like a scarce resource and design your tool layer around prompt economics, not around API completeness.
Use MCP when you need schemas and safety. Use CLIs or code execution when you need leverage and compactness. And above all, load tools dynamically.
If you’re documenting these decisions for your team, or publishing workflow guidance for other builders, consider using Junia.ai to turn your internal playbooks into clean technical articles you can actually maintain and ship.
