Why MCP Stopped Making Sense for Most Apps


The Core Problem

When Anthropic released MCP in late 2024, it solved a real problem: standardizing how LLMs connect to external tools. No more juggling different function-calling formats across providers.

But MCP introduced a bigger problem: context bloat.

Every MCP tool needs a detailed schema definition loaded into the model's context before the conversation even begins: name, description, parameters, types, and response formats, all sitting there consuming tokens whether you use the tool or not.
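For a sense of scale, here's what one hypothetical tool definition looks like serialized, with a rough cost estimate using the common ~4-characters-per-token heuristic (the specific tool and fields are illustrative):

```python
import json

# A hypothetical MCP-style tool definition. Every field below sits in the
# model's context whether or not the tool is ever invoked.
read_file_tool = {
    "name": "read_file",
    "description": "Read the contents of a file from the local filesystem.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path to the file"},
            "encoding": {"type": "string", "description": "Text encoding", "default": "utf-8"},
        },
        "required": ["path"],
    },
}

# Rough token estimate: ~4 characters per token is a common heuristic.
serialized = json.dumps(read_file_tool)
approx_tokens = len(serialized) // 4
print(f"One tool definition: ~{approx_tokens} tokens")
```

Multiply that by 30 or 40 tools across several servers, and the upfront cost adds up quickly.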

Connect 10 MCP servers? You might burn 8,000-15,000 tokens just on tool definitions. Add your system prompt and custom instructions, and you've used a significant chunk of your context window before typing a word.

Worse, each tool invocation adds more: the tool call request, the response data, and the model's processing of that response. A few file reads and database queries later, you're deep into your context budget, and the model starts forgetting what you talked about three turns ago.

What Started Replacing MCP

Multi-Agent Architectures

Instead of one model with 30 tools loaded, modern systems use specialized agents:

  • A routing agent that delegates tasks
  • Focused worker agents that load only the tools they need
  • Context handoff that passes only relevant information between agents

A file-editing agent doesn't load database tools. A data-analysis agent doesn't load calendar integrations. Each operates in a clean context.
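A minimal sketch of that routing pattern in Python (the worker names, keyword matching, and tool lists are illustrative, not a real framework API):

```python
# Each worker registers only the tools it needs, so no single context
# carries the full tool set. Keyword routing stands in for a real
# routing agent's decision-making.
WORKERS = {
    "files": {"tools": ["read_file", "write_file"], "keywords": ["file", "edit"]},
    "data": {"tools": ["run_query", "plot"], "keywords": ["database", "query"]},
}

def route(task: str) -> str:
    """Pick the worker whose keywords match the task; that worker's
    context loads only its own tool definitions."""
    for name, worker in WORKERS.items():
        if any(kw in task.lower() for kw in worker["keywords"]):
            return name
    return "files"  # hypothetical default worker

agent = route("Run a database query for last month's sales")
print(agent, WORKERS[agent]["tools"])
```

In a real system the router would itself be an LLM call, but the context math is the same: each worker sees two tool definitions, not four.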

Code Execution Over Tool Calls

Anthropic's own research showed that agents scale better when they write code to call tools instead of using direct tool calls.

The pattern:

  1. Agent discovers available tools (filesystem or registry)
  2. Reads only needed tool definitions for current task
  3. Generates code that invokes those tools
  4. Code executes outside the model context

This can reduce context usage dramatically—from tens of thousands of tokens to a few thousand.
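A toy version of the pattern: the model emits a small program as a string, a sandbox executes it against the tool runtime, and only the final result re-enters the model's context. The tool names, stubbed responses, and file path here are all invented for illustration:

```python
# Stand-in for a real tool runtime; responses are stubbed.
TOOLS = {
    "read_file": lambda path: "col,value\na,1\nb,2",
    "count_rows": lambda text: len(text.splitlines()) - 1,  # minus header row
}

# Code the model might generate: a plain string, executed outside the
# model's context rather than round-tripped call by call.
generated = """
data = TOOLS["read_file"]("report.csv")
result = TOOLS["count_rows"](data)
"""

scope = {"TOOLS": TOOLS}
exec(generated, scope)    # in production this runs in a real sandbox
print(scope["result"])    # only this small result goes back to the model
```

The full file contents never touch the model's context; only the row count does. That's where the dramatic savings come from.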

Prompt Chaining

Rather than one long conversation, modern workflows break tasks into focused steps with fresh context:

  1. Analyze task, produce plan
  2. Execute step 1 (fresh context, only relevant info)
  3. Execute step 2 (fresh context, previous results)
  4. Synthesize results

Memory lives in structured data passed between calls, not in an ever-growing chat history.
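Sketched in Python, with a stubbed `call_llm` standing in for a real model API and a dict as the structured state passed between steps (all names here are assumptions for illustration):

```python
def call_llm(prompt: str) -> str:
    """Stub for a real model API call; each invocation is a fresh context."""
    return f"[model output for: {prompt[:40]}]"

def plan(task: str) -> dict:
    """Step 0: produce a plan as structured data, not chat history."""
    return {"task": task, "steps": ["gather data", "summarize"]}

def run_step(step: str, carry: dict) -> dict:
    # Fresh context: only this step and the carried structure are sent,
    # never the full transcript of earlier turns.
    carry[step] = call_llm(f"Do '{step}' given {carry}")
    return carry

state = plan("quarterly report")
for step in state["steps"]:
    state = run_step(step, state)
print(list(state.keys()))
```

Each call starts near-empty; the `state` dict is the only memory, and it stays small because it holds results, not transcripts.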

Dynamic Tool Loading

Load tools on demand instead of pre-loading everything:

  1. Model receives task
  2. Queries registry: "What tools handle file operations?"
  3. Gets only relevant tool definitions
  4. Executes with minimal context overhead
  5. Releases tools after use
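A sketch of that flow (the registry shape and tag-based lookup are assumptions for illustration, not part of the MCP spec):

```python
# A small in-memory registry; a real one might be a server-side index.
REGISTRY = [
    {"name": "read_file", "tags": {"file"}},
    {"name": "write_file", "tags": {"file"}},
    {"name": "send_email", "tags": {"email"}},
    {"name": "run_query", "tags": {"database"}},
]

def load_tools(capability: str) -> list[dict]:
    """Return only the definitions matching the requested capability."""
    return [t for t in REGISTRY if capability in t["tags"]]

active = load_tools("file")   # 2 of 4 definitions enter the context
print([t["name"] for t in active])
active = []                   # release after use; context stays lean
```

The overhead scales with the task at hand, not with the total number of tools the system knows about.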

Where MCP Still Makes Sense

Despite the problems, MCP isn't dead. In fact, Anthropic donated it to the Linux Foundation's Agentic AI Foundation in December 2025.

MCP works well for:

Single-purpose tools: If your agent needs 2-3 carefully chosen tools for a specific task, MCP provides a clean standard interface without meaningful context overhead.

Worker agents in orchestrated systems: The specialist agents in multi-agent architectures often use MCP internally—they just don't load every tool at once.

Tool ecosystem standardization: MCP provides consistent authentication patterns, tool discovery, and marketplace integration across providers.

Desktop AI assistants: For personal productivity tools where context management is simpler and you want quick access to specific capabilities.

Prototyping and development: MCP makes it fast to wire up new tools without writing custom integration code. Once you understand what you need, you can optimize.

The Real Lesson

Context windows are finite. Every token you load upfront is a token you can't use for actual work.

The "load everything and let the model figure it out" approach hits scaling limits fast. Modern AI application architecture treats the LLM as one component in a system—a powerful reasoning engine that should be orchestrated, not overloaded.

MCP remains valuable as a standard. But treating it as your entire integration strategy leads to context exhaustion. Use it deliberately, not universally.