Remote MCP server
At Stainless, we generate client SDKs in various languages for our customers' APIs. Recently, those same customers started asking us for something new: a Model Context Protocol (MCP) server that wraps their API and exposes it as a set of tools for LLMs.
This article explains what a remote MCP server is, how it works, and how to build one that can scale with your needs.
What is a remote MCP server?
Model Context Protocol (MCP) is a specification that defines how AI models can interact with external tools. Each tool is described by a JSON schema that states what parameters it accepts and what it returns, and a server exposes its tools to clients over a transport such as HTTP.
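For example, a minimal tool definition pairs a name and description with a JSON schema for its input. The tool below is illustrative, but the field names match what an MCP server returns from a tools/list request:

```json
{
  "name": "get_order",
  "description": "Look up an order by its ID",
  "inputSchema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "The order to fetch"
      }
    },
    "required": ["order_id"]
  }
}
```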
A remote MCP server hosts one or more MCP-compatible tools over the internet. Unlike local MCP servers that run on a user's machine, remote servers are deployed to accessible URLs. This setup allows tools to be shared across multiple users and applications.
MCP architecture includes three main components:
The host: The application where the AI model runs (such as Claude Desktop or ChatGPT)
The client: The runtime that manages connections between hosts and servers
The server: The component that exposes tools the model can use
Companies like GitHub, Cloudflare, and Zapier have built remote MCP servers to make their services available to AI models. These servers centralize tool access and make it easier to apply authentication and monitoring in one place.
Why scalability matters for MCP servers
Remote MCP servers receive tool requests from AI clients, often in parallel. Unlike traditional APIs with predictable patterns, AI agents might call multiple tools in sequence or trigger batch operations unexpectedly.
When your MCP server handles complex operations—like file processing or multi-step workflows—each request uses more CPU, memory, and network resources. If multiple agents call the same tool at once, it can cause delays or timeouts.
For example, a customer support AI might need to check order status, look up shipping details, and update customer records all within seconds. Each of these operations requires a separate tool call, and each call needs to complete quickly to maintain a smooth conversation.
Adding more tools to your server increases its memory footprint. If an API has 200 endpoints exposed through MCP, the server needs to manage all these definitions unless they're dynamically filtered.
Key steps to deploy a remote MCP server
1. Set up your container environment
Containers package your MCP server with everything it needs to run consistently. Using Docker makes deployment simpler:
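A minimal Dockerfile for a Node.js-based server might look like this (the entry point and port are placeholders for your own setup):

```dockerfile
# Minimal image for a Node.js-based MCP server
FROM node:20-slim
WORKDIR /app

# Install production dependencies first to take advantage of layer caching
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# The server is assumed to listen on port 8787
EXPOSE 8787
CMD ["node", "server.js"]
```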
Build and run the container locally to test:
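```bash
# Build the image (the tag name is arbitrary)
docker build -t my-mcp-server .

# Run it, mapping the container port to your machine
docker run -p 8787:8787 my-mcp-server
```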
This exposes your MCP server on port 8787, ready for local testing before deployment.
2. Configure OAuth and secrets
Remote MCP servers typically need authentication. OAuth 2.0 lets clients request access tokens to interact with tools securely.
Register your application with an identity provider like GitHub, then store the client ID and secret securely:
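For local development, environment variables work; on a platform like Cloudflare Workers, encrypted secrets are the equivalent. The variable names below are illustrative:

```bash
# Local development: keep credentials in the environment, never in code
export GITHUB_CLIENT_ID="your-client-id"
export GITHUB_CLIENT_SECRET="your-client-secret"

# On Cloudflare Workers, store the same values as encrypted secrets
npx wrangler secret put GITHUB_CLIENT_SECRET
```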
Never commit these values to version control. Instead, use environment variables or a secret management service.
3. Deploy to a hosting provider
Popular options for hosting remote MCP servers include:
| Provider | Pros | Cons |
|---|---|---|
| Cloudflare Workers | Low latency, auto-scaling | Limited runtime |
| AWS Lambda | Custom runtimes, integration with AWS services | Cold starts |
| Traditional VMs | Full control | Manual scaling |
Cloudflare offers templates specifically for MCP servers, making it one of the easiest options to get started with.
4. Test connectivity
After deployment, test your server using an MCP-compatible client. The MCP Inspector tool works well for this:
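You can launch the Inspector with npx and point it at your deployed URL:

```bash
npx @modelcontextprotocol/inspector
```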
Open the Inspector in your browser and connect to your server URL. If you can list tools and execute them, your server is working correctly.
Managing large OpenAPI specs
1. Resolve references for MCP compatibility
OpenAPI specs often use $ref to point to reusable components. However, MCP tool schemas must be self-contained with no external references.
Here's how a reference gets resolved, using a simplified, hypothetical Customer schema:
Original schema:
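```json
{
  "$ref": "#/components/schemas/Customer"
}
```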
Resolved schema:
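```json
{
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "email": { "type": "string" }
  }
}
```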
This process, called inlining, makes the schema compatible with MCP clients but can increase its size significantly.
2. Split endpoints by context
Large APIs often have hundreds of endpoints. Grouping them into logical sets makes them easier to manage as MCP tools:
Resource-based groups: customers, orders, products
Function-based groups: reporting, billing, authentication
Role-based groups: adminTools, userTools, supportTools
Each group can be exposed as a separate MCP tool or set of tools. This approach simplifies tool selection and reduces the size of each schema.
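As a sketch, a server can tag each tool with a group and filter at startup. The Tool shape and group names below are assumptions, not part of the MCP spec:

```typescript
// Sketch: group-based tool filtering. The Tool shape is an assumption.
interface Tool {
  name: string;
  group: string; // e.g. "customers", "orders", "products"
  inputSchema: object;
}

// Expose only the tools in the groups a deployment has enabled,
// keeping each server's tool list (and schema payload) small.
function selectTools(allTools: Tool[], enabledGroups: string[]): Tool[] {
  return allTools.filter((tool) => enabledGroups.includes(tool.group));
}
```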
3. Handle parameter collisions
When combining parameters from different parts of an API endpoint (path, query, headers, body), name conflicts can occur. For example, both the path and query might use a parameter called id.
To avoid confusion:
Prefix parameters based on their source (path_id, query_id)
Use fully qualified names for nested objects
Apply consistent naming conventions across all tools
This clarity helps both the AI model and developers understand what each parameter does.
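A minimal sketch of source-based prefixing, with the parameter shape as an assumption:

```typescript
// Sketch: flatten parameters from each source into one namespace,
// prefixing by source so path `id` and query `id` can't collide.
type Source = "path" | "query" | "header" | "body";

function flattenParams(
  params: Record<Source, Record<string, unknown>>
): Record<string, unknown> {
  const flat: Record<string, unknown> = {};
  for (const [source, values] of Object.entries(params)) {
    for (const [name, value] of Object.entries(values)) {
      flat[`${source}_${name}`] = value; // e.g. path_id, query_id
    }
  }
  return flat;
}
```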
Popular remote MCP server integrations
Many companies now offer remote MCP servers that expose their services as tools for AI models:
Anthropic and OpenAI
Both Anthropic and OpenAI support the MCP protocol, allowing their models (Claude and GPT) to interact with remote servers. Anthropic uses the MCP connector to establish these connections.
GitHub and Cloudflare
GitHub's remote MCP server lets AI clients access repository data like issues, pull requests, and code files. It supports both OAuth and personal access tokens for authentication.
Cloudflare provides templates and infrastructure for hosting your own MCP servers. Their Workers platform makes it easy to deploy and scale servers without managing traditional infrastructure.
E-commerce and communication tools
Several platforms offer specialized tools through MCP:
Stripe: Payment processing, subscription management
Shopify: Product catalog, order management
Twilio: Messaging, voice calls, programmable workflows
Square: Point-of-sale, inventory tracking
Automation and support platforms
Zapier: Connects to thousands of other services
Plaid: Financial account access and analysis
Intercom: Customer conversations and support tickets
Workato: Enterprise workflow automation
Each integration requires proper authentication and schema definition to work correctly with AI models.
Scaling your MCP server
As your MCP server usage grows, you'll need strategies to handle increased load:
Monitor performance metrics: Track response times, request volume, and error rates to identify bottlenecks.
Implement caching: Cache frequently used tool results to reduce processing time and API calls (see the sketch after this list).
Use horizontal scaling: Run multiple instances of your server behind a load balancer to distribute traffic.
Apply rate limiting: Prevent abuse by limiting how often clients can call your tools.
Optimize schemas: Keep tool definitions concise and well-structured to reduce parsing overhead.
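As an example of the caching strategy above, a minimal in-memory cache for idempotent tool calls might look like this (the TTL and key scheme are assumptions):

```typescript
// Sketch: cache results of idempotent tool calls for a short window.
const cache = new Map<string, { value: unknown; expires: number }>();
const TTL_MS = 30_000; // assumed 30-second freshness window

async function cachedCall(
  toolName: string,
  args: object,
  run: () => Promise<unknown>
): Promise<unknown> {
  const key = `${toolName}:${JSON.stringify(args)}`;
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  const value = await run();
  cache.set(key, { value, expires: Date.now() + TTL_MS });
  return value;
}
```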
The quality of the API layer beneath your MCP server directly affects how well it performs. At Stainless, we generate high-quality SDKs and MCP servers from OpenAPI specifications, which helps keep tool schemas consistent with the API they wrap.
Ready to build your own remote MCP server? Get started with Stainless to ensure your implementation meets enterprise standards.
FAQs about remote MCP servers
What's the difference between local and remote MCP servers?
A remote MCP server runs on separate infrastructure accessible via HTTP, while a local MCP server runs directly within the client application environment. Remote servers are better for sharing tools across multiple users and applications.
How do I handle authentication for my remote MCP server?
Most production MCP servers implement OAuth 2.0 for authentication, which allows secure delegation of access while protecting both server resources and client credentials.
Can I convert my existing API to a remote MCP server?
Yes, existing APIs can be wrapped as MCP servers by creating JSON schema definitions for each endpoint and implementing the required MCP protocol endpoints.
How many endpoints can a single MCP server support?
There's no hard technical limit, but practical considerations around context window size and performance typically limit servers to dozens or hundreds of well-organized endpoints.
Which cloud providers work best for hosting remote MCP servers?
Cloudflare, AWS, Azure, and Google Cloud all provide suitable hosting options. Cloudflare offers specific MCP server templates, while the others support container-based or serverless deployments.