MCP Python SDK – Model Context Protocol Clients and Servers

MCP Python SDK lets you build clients and servers for the Model Context Protocol. Add tools, manage context, and integrate with any MCP-compatible LLM.

Building an MCP server that works in development is straightforward. Building one that handles production traffic, scales with your API, and doesn't become a security liability requires different patterns entirely.

This guide covers the architecture decisions, security patterns, and operational practices we've learned from deploying MCP servers at scale. You'll learn how to structure tools for large APIs, implement proper authentication and rate limiting, choose the right deployment target, and avoid the common pitfalls that can turn your MCP server into a maintenance burden. By the end, you will be able to:

  • Understand Architecture: Learn how clients, servers, and transport layers like stdio and HTTP work together.

  • Design Scalable Tools: Implement robust error handling, input validation, dynamic tools for large APIs, and caching.

  • Secure Your Server: Apply production-grade security patterns including OAuth, rate limiting, and auditing.

  • Deploy and Operate: Compare deployment targets like Docker and Cloudflare Workers, and learn to monitor and roll out updates safely.

What is the MCP Python SDK?

The MCP Python SDK is a library for building servers and clients that speak the Model Context Protocol. MCP standardizes how large language models (LLMs) discover and use external APIs, making it a portable alternative to vendor-specific solutions like OpenAI's function calling. By building an MCP server, you create a universal adapter that any compatible client, from Claude to Cursor, can use to call your API.

Note: The following code uses the third-party fastmcp library, which is not part of the official Stainless toolchain. If you need an MCP server generated from your OpenAPI spec via Stainless, see the TypeScript/Node instructions in the Stainless documentation.

A library like FastMCP makes creating a basic server simple.

from fastmcp import FastMCP

app = FastMCP("my-api-server")

@app.tool()
def get_user(user_id: str) -> dict:
    """Fetches a user by their ID."""
    # ... logic to fetch user from a database or another API
    return {"id": user_id, "name": "Jane Doe"}

Understand production architecture

Before deploying, it's crucial to understand how the pieces fit together. An MCP server acts as a middleman, translating requests from an LLM client into calls to your actual backend API. This flow is orchestrated through a few key primitives.

  • Tools: These are the actions an LLM can take, like get_user. Each tool has a name, description, and an input schema.

  • Resources: Read-only data you want to make available to the model, like the contents of a file.

  • Prompts: These are reusable, user-controlled templates for common tasks, often surfaced as slash commands in a client.

Communication between the client and server happens over a transport layer. For local development, stdio is the simplest option, piping JSON-RPC messages through standard input and output. For remote deployments, you'll use streamable HTTP or Server-Sent Events (SSE), which are better suited to web-based clients.
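
To make the primitives concrete, here's a minimal sketch that registers a tool, a resource, and a prompt on one FastMCP app and picks a transport at startup. The resource URI, the prompt body, and the exact run() arguments for an HTTP transport are assumptions based on recent fastmcp releases; check the documentation for your installed version.

from fastmcp import FastMCP

app = FastMCP("my-api-server")

@app.tool()
def get_user(user_id: str) -> dict:
    """An action the LLM can take."""
    return {"id": user_id, "name": "Jane Doe"}

@app.resource("resource://docs/readme")  # URI scheme chosen for illustration
def readme() -> str:
    """Read-only data the model can load into context."""
    return "Service overview and usage notes."

@app.prompt()
def summarize_user(user_id: str) -> str:
    """A reusable template, often surfaced as a slash command."""
    return f"Summarize the account history for user {user_id}."

if __name__ == "__main__":
    app.run()  # stdio: ideal for local development
    # For remote clients, switch to an HTTP-based transport (argument names assumed):
    # app.run(transport="sse", host="0.0.0.0", port=8000)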

Map primitives to API design

Think of your existing REST endpoints as a blueprint for your MCP tools. A GET /users/{id} endpoint naturally maps to a get_user(id) tool. This direct mapping is often the fastest way to get started, and a code generator can automate the translation from an OpenAPI spec to MCP tools.
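
As a sketch of that one-to-one mapping, the tool below wraps a hypothetical GET /users/{id} endpoint with httpx; the base URL is a placeholder.

import httpx
from fastmcp import FastMCP

app = FastMCP("users-api")
API_BASE = "https://api.example.com"  # placeholder backend URL

@app.tool()
def get_user(user_id: str) -> dict:
    """Maps one-to-one onto GET /users/{id} on the backend API."""
    response = httpx.get(f"{API_BASE}/users/{user_id}", timeout=10.0)
    response.raise_for_status()
    return response.json()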

Separate business logic from transport

Your MCP server should be a thin translation layer. Keep your core business logic in your primary application, whether it's a FastAPI or Django app. The MCP handlers, decorated with @app.tool(), should simply validate inputs and delegate the actual work to your existing service layer.
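
A sketch of that separation, assuming a hypothetical service module (myapp.services.users) that already contains the real business logic:

from fastmcp import FastMCP

from myapp.services import users  # hypothetical existing service layer

app = FastMCP("my-api-server")

@app.tool()
def deactivate_user(user_id: str) -> dict:
    """Thin handler: validate the input, delegate, and shape the response."""
    if not user_id.strip():
        raise ValueError("user_id is required")
    result = users.deactivate(user_id)  # business logic stays in the main app
    return {"id": result.id, "status": result.status}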

Design scalable tools

As your API grows, the design of your tools becomes critical for performance and usability. Building for production requires thinking about failure modes, bad data, and scale.

Handle errors

Your server will inevitably encounter errors. Instead of letting your server crash, catch exceptions within your tool handlers and return a structured error message to the LLM. For transient network issues, your backend API can send an X-Should-Retry: true header to instruct the client's SDK to attempt the request again.
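
Here's one way to structure that, sketched with httpx against a placeholder backend; the error shape is illustrative, not a protocol requirement.

import httpx
from fastmcp import FastMCP

app = FastMCP("my-api-server")

@app.tool()
def get_order(order_id: str) -> dict:
    """Fetch an order, returning a structured error instead of crashing."""
    try:
        response = httpx.get(f"https://api.example.com/orders/{order_id}", timeout=10.0)
        response.raise_for_status()
        return response.json()
    except httpx.HTTPStatusError as exc:
        # Surface a safe, structured error; never leak stack traces to the model.
        return {
            "error": "upstream_error",
            "status_code": exc.response.status_code,
            "retryable": exc.response.headers.get("X-Should-Retry") == "true",
        }
    except httpx.RequestError:
        return {"error": "network_error", "retryable": True}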

Validate input

Never trust input from any client, including an LLM. Use Pydantic models within your tool functions to automatically validate the types and structure of incoming arguments. This prevents malformed data from reaching your downstream services.
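
A minimal sketch with Pydantic v2: the model rejects malformed arguments before they ever reach your service layer.

from pydantic import BaseModel, Field, ValidationError
from fastmcp import FastMCP

app = FastMCP("my-api-server")

class CreateUserInput(BaseModel):
    email: str = Field(pattern=r".+@.+\..+")
    name: str = Field(min_length=1, max_length=120)
    age: int | None = Field(default=None, ge=0, le=150)

@app.tool()
def create_user(email: str, name: str, age: int | None = None) -> dict:
    """Validate the arguments before touching downstream services."""
    try:
        payload = CreateUserInput(email=email, name=name, age=age)
    except ValidationError as exc:
        return {"error": "invalid_input", "details": exc.errors()}
    # ... hand payload.model_dump() to the service layer
    return {"status": "created", "email": payload.email}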

Add dynamic tools

For APIs with hundreds of endpoints, loading every tool into the LLM's context window is impractical; teams converting complex OpenAPI specs to MCP servers hit this limit quickly. A better approach is dynamic tools, which let the LLM discover endpoints at runtime. You can enable this with a simple configuration flag, exposing just three meta-tools: list_api_endpoints, get_api_endpoint_schema, and invoke_api_endpoint.
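
Generated servers expose these meta-tools via that flag, but a hand-rolled sketch shows the idea; the endpoint catalog and base URL below are placeholders you would populate from your OpenAPI spec.

import httpx
from fastmcp import FastMCP

app = FastMCP("dynamic-api-server")

# Placeholder catalog, e.g. built from your OpenAPI spec at startup.
ENDPOINTS = {
    "get_user": {"method": "GET", "path": "/users/{user_id}", "schema": {"user_id": "string"}},
    "list_orders": {"method": "GET", "path": "/orders", "schema": {"limit": "integer"}},
}

@app.tool()
def list_api_endpoints() -> list[str]:
    """Let the model discover available endpoints at runtime."""
    return sorted(ENDPOINTS)

@app.tool()
def get_api_endpoint_schema(endpoint: str) -> dict:
    """Return the input schema for a single endpoint."""
    return ENDPOINTS.get(endpoint, {"error": "unknown_endpoint"})

@app.tool()
def invoke_api_endpoint(endpoint: str, arguments: dict) -> dict:
    """Invoke an endpoint by name with the supplied arguments."""
    spec = ENDPOINTS.get(endpoint)
    if spec is None:
        return {"error": "unknown_endpoint"}
    url = "https://api.example.com" + spec["path"].format(**arguments)
    response = httpx.request(spec["method"], url, timeout=10.0)
    return {"status_code": response.status_code, "body": response.text}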

Cache expensive calls

Some operations, like listing all available tools or fetching schemas, can be slow. To improve performance, cache the results of these calls. An in-memory cache works for simple cases, while a shared cache like Redis is better for multi-instance deployments.
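
A tiny in-memory TTL cache is often enough for a single instance; this sketch wraps any expensive call, and you would swap the dictionary for Redis in a multi-instance deployment.

import time
from functools import wraps

def ttl_cache(ttl_seconds: float = 300.0):
    """Cache results in memory for ttl_seconds, keyed by positional arguments."""
    def decorator(func):
        store: dict[tuple, tuple[float, object]] = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]
            result = func(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=60)
def fetch_endpoint_schema(endpoint: str) -> dict:
    # ... slow lookup against your schema registry or OpenAPI document
    return {"endpoint": endpoint}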

Secure production servers

Security is a core requirement for any production service. An MCP server is an open door to your API, and you need to be deliberate about who can open it and what they can do.

Implement OAuth

For remote servers used by web applications, API keys are not a secure authentication method. Instead, implement an OAuth 2.0 flow. A common pattern is to deploy a Cloudflare Worker that presents the consent screen and handles the token exchange, so your server receives scoped access tokens rather than raw user credentials.

Enforce rate limits

To protect your backend services from abuse or runaway LLM loops, implement rate limiting. A token bucket algorithm is a common choice, and it can be implemented as middleware in your server. For distributed systems, use a centralized store like Redis to share rate limit state across all server instances.
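
A sketch of an in-process token bucket; the per-client dictionary would move to Redis once you run more than one instance.

import time

class TokenBucket:
    """Refill at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def check_rate_limit(client_id: str) -> None:
    """Call at the top of each tool handler; raise before any backend work happens."""
    bucket = buckets.setdefault(client_id, TokenBucket(rate=5.0, capacity=20))
    if not bucket.allow():
        raise RuntimeError("rate_limited: too many requests, retry later")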

Limit tool exposure

You may not want to expose every single API endpoint to the LLM, especially destructive or sensitive ones. Use configuration tags to group tools (e.g., read-only, admin) and allow users to select which groups to load.
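
Some frameworks support tool tags natively; if yours doesn't, a small wrapper around the registration decorator achieves the same effect. The ENABLED_TAGS set below is a stand-in for whatever configuration source you use.

from fastmcp import FastMCP

app = FastMCP("my-api-server")

ENABLED_TAGS = {"read-only"}  # stand-in for a config file or environment variable

def tagged_tool(*tags: str):
    """Register a tool only if one of its tags is enabled for this deployment."""
    def decorator(func):
        if ENABLED_TAGS & set(tags):
            return app.tool()(func)
        return func  # defined, but never exposed to the LLM
    return decorator

@tagged_tool("read-only")
def get_user(user_id: str) -> dict:
    return {"id": user_id}

@tagged_tool("admin")
def delete_user(user_id: str) -> dict:
    return {"deleted": user_id}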

Audit requests

Log every tool invocation. Structured logs containing the tool name, arguments, and authenticated user are invaluable for debugging and security audits. Forward these logs to a service like Datadog or an OpenTelemetry collector for analysis and alerting.
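
A sketch of an audit decorator using the standard library logger; wrap each tool handler so every invocation produces one structured log line.

import json
import logging
import time
from functools import wraps

audit_log = logging.getLogger("mcp.audit")

def audited(func):
    """Emit one structured log line per tool invocation."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        outcome = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            outcome = "error"
            raise
        finally:
            audit_log.info(json.dumps({
                "tool": func.__name__,
                "arguments": kwargs,  # redact sensitive fields before shipping logs
                "outcome": outcome,
                "duration_ms": round((time.monotonic() - start) * 1000, 1),
            }))
    return wrapper

# Usage: apply below the registration decorator.
# @app.tool()
# @audited
# def get_user(user_id: str) -> dict: ...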

Deploy and operate servers

Writing the code is only half the battle. A production-grade service needs a reliable deployment pipeline, monitoring, and a safe process for rolling out updates.

Deployment Target    | Best For                                     | Key Consideration
Docker (ECS/K8s)     | Complex applications with multiple services  | Full control over the environment, but more complex to manage.
Cloudflare Workers   | Edge-first, low-latency APIs                 | Serverless model simplifies scaling, but has execution limits.
Serverless Functions | Simple, single-purpose servers               | Pay-per-use model is cost-effective for low traffic.

Package with Docker

Containerizing your server with Docker is the most portable way to deploy it. The Stainless SDK generator can automatically create a Dockerfile and CI/CD workflow to build and push an image to a registry like Docker Hub or GHCR.

Ship to the edge

For the lowest possible latency, deploy your server to the edge using a platform like Cloudflare Workers. The Stainless SDK generator can scaffold a complete, deployable worker project. Once configured, you can deploy globally with a single wrangler deploy command.

Monitor health

Your server should expose a /health endpoint that monitoring systems can check for readiness. Integrate with tools like Prometheus to scrape metrics on request latency, error rates, and tool usage, and use Grafana to visualize this data.
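
A sketch of a readiness check, written here as a small FastAPI app you can run alongside (or mount next to) the MCP server; the upstream status URL is a placeholder.

import httpx
from fastapi import FastAPI

health_app = FastAPI()

@health_app.get("/health")
async def health() -> dict:
    """Report healthy only if the backend API is reachable."""
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            response = await client.get("https://api.example.com/status")
        upstream_ok = response.status_code == 200
    except httpx.RequestError:
        upstream_ok = False
    return {"status": "ok" if upstream_ok else "degraded", "upstream": upstream_ok}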

Roll out safely

Use semantic versioning for your server package. When you merge a release pull request, your CI/CD system should automatically publish the new version to PyPI. For critical services, use canary deployments to gradually roll out the new version to a small subset of users.

Apply field patterns

Over time, you'll develop an intuition for what makes a great MCP server. Here are a few patterns we've seen work well in the field.

Optimize performance

Python's asyncio is essential for building high-throughput servers that can handle many concurrent connections. Use asynchronous libraries for all I/O-bound operations, like making HTTP requests or querying a database.
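
For example, an async tool can fan out independent backend calls with asyncio.gather instead of awaiting them one at a time; the shared AsyncClient keeps connections pooled across invocations. The base URL is a placeholder.

import asyncio

import httpx
from fastmcp import FastMCP

app = FastMCP("my-api-server")
http = httpx.AsyncClient(base_url="https://api.example.com", timeout=10.0)

@app.tool()
async def get_user_with_orders(user_id: str) -> dict:
    """Run independent backend calls concurrently instead of sequentially."""
    user_resp, orders_resp = await asyncio.gather(
        http.get(f"/users/{user_id}"),
        http.get("/orders", params={"user_id": user_id}),
    )
    return {"user": user_resp.json(), "orders": orders_resp.json()}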

Test end to end

Unit tests are great, but they can't catch everything. Write end-to-end tests that spin up a real instance of your MCP server and use an actual MCP client to make calls. Tools like pytest and its fixture model make it easy to manage the lifecycle of these test resources.
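
A sketch of that pattern with pytest, assuming the fastmcp package exposes a Client that can connect to a FastMCP instance in memory and that pytest-asyncio (or an equivalent plugin) runs the async test; adjust for the client and plugin you actually use.

import pytest
from fastmcp import Client, FastMCP  # Client availability assumed; check your version

@pytest.fixture()
def server() -> FastMCP:
    app = FastMCP("test-server")

    @app.tool()
    def get_user(user_id: str) -> dict:
        return {"id": user_id, "name": "Jane Doe"}

    return app

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_get_user_end_to_end(server: FastMCP):
    # Drive the server through a real MCP client rather than calling the function directly.
    async with Client(server) as client:
        result = await client.call_tool("get_user", {"user_id": "42"})
        assert "42" in str(result)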

Version tools

As your API evolves, your tool schemas will need to change. To avoid breaking existing clients, make schema changes backward-compatible whenever possible. Use client capability flags to allow newer clients to opt into features that older clients don't support.
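
In practice, that usually means adding new fields as optional parameters with sensible defaults, so existing clients that omit them keep working:

from fastmcp import FastMCP

app = FastMCP("my-api-server")

# v1 exposed list_orders(user_id). v2 adds filtering and paging as optional
# parameters with defaults, so older clients remain compatible.
@app.tool()
def list_orders(user_id: str, status: str | None = None, limit: int = 50) -> dict:
    """List orders for a user, optionally filtered by status."""
    filters = {"user_id": user_id, "limit": limit}
    if status is not None:
        filters["status"] = status
    # ... delegate to the service layer with `filters`
    return {"filters": filters, "orders": []}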

Avoid common pitfalls

Here are a few common mistakes to watch out for:

  • Leaking internal errors: Never return raw stack traces or internal error messages to the client.

  • Overloading the context window: Keep tool descriptions concise and schemas focused.

  • Ignoring idempotency: For tools that create or update data, support an idempotency key to prevent duplicate operations from retries.

Ready to build your own production-grade MCP server? Get started for free at https://app.stainless.com/signup.

Frequently asked questions about production MCP servers

What is the difference between MCP and OpenAI function calling?

MCP is an open, portable protocol supported by multiple clients, whereas OpenAI's function calling is a proprietary feature specific to their models. An MCP server can be used by any compatible LLM, offering greater flexibility.

Can I connect an MCP server to OpenAI agents?

Yes, you can connect an MCP server to OpenAI agents by using an adapter library or by configuring your server to emit schemas that are compatible with OpenAI's specific limitations.

How do I handle API rate limits in an MCP server?

Your MCP server should respect the rate limits of your backend API by implementing its own throttling logic, often using a token bucket algorithm with exponential backoff on retries.

Should I deploy one server per API or combine multiple APIs?

For simplicity and clear security boundaries, it's usually best to deploy one MCP server per backend API. Combining multiple APIs into a single server adds complexity and can inadvertently expose tools beyond their intended audience.

How do I migrate existing function-calling code to MCP?

Start by generating an MCP server from an OpenAPI spec, which creates a tool for each endpoint. Then, you can incrementally refactor your existing function-calling logic into custom tool handlers within the new MCP server.

Featured MCP Resources

Essential events, guides and insights to help you master MCP server development.
