Building an MCP server for your existing API isn't just about exposing endpoints to AI models. The architecture decisions you make—from which endpoints to surface to how you handle authentication and scaling—determine whether your implementation becomes a powerful developer tool or a maintenance burden.
This guide covers the essential patterns for production-ready MCP servers, including endpoint selection strategies, schema optimization for AI consumption, deployment architectures from local CLI tools to remote OAuth-secured services, and scaling considerations for enterprise APIs. We'll focus on practical engineering decisions that help you transform your REST API into an AI-native interface without compromising reliability or security.
The Model Context Protocol (MCP) is an open standard that lets large language models (LLMs) discover and use your API as a set of tools. Think of it as a structured conversation layer that sits on top of your existing REST API, translating natural language requests from an AI into concrete API calls. The rest of this guide walks through that transformation step by step, from endpoint selection and schema design to deployment and scaling, with an engineering-first focus on making your API truly AI-native.
How MCP servers enhance existing REST APIs
An MCP server acts as an interpreter between an LLM and your API. It doesn't replace your API but rather enhances it by exposing its capabilities in a format that AI models can understand and execute.
When a user asks an LLM to perform an action related to your service, the LLM's client queries your MCP server to see a list of available "tools." It then selects the right tool, fills in the parameters, and asks for permission to run it. Your MCP server receives this request, translates it into a standard call to your existing API endpoint, and returns the result.
This flow makes your API accessible to a new class of users and applications without requiring you to change your core API logic. For example, Modern Treasury uses an MCP server so their customers can perform one-off banking operations using plain English, dramatically lowering the barrier to entry for complex financial tasks. This entire MCP layer can be automatically generated from the same OpenAPI specification you already use for your SDKs, and you can generate an MCP server from an OpenAPI spec, turning a complex integration into a simple configuration step.
MCP server architecture fundamentals
At its core, MCP is built on three primitives that you expose from your server.
Tools: These are the most common primitive and represent actions the LLM can take. Each tool, like `create_user` or `list_invoices`, has a name, a description for the LLM, and a schema defining its input parameters (see the sketch below).
Resources: These represent static data or context the LLM can access, like a file's contents or a list of database records.
Prompt templates: These are pre-defined, reusable workflows that can guide an LLM through a multi-step process.
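To make the tool primitive concrete, here is a minimal sketch of registering one tool with the official MCP TypeScript SDK (`@modelcontextprotocol/sdk`) and zod. The tool name, the API base URL, the auth header, and the environment variable are illustrative assumptions, not a specific product's implementation.

```typescript
// Minimal sketch: registering a single "create_user" tool with the official
// MCP TypeScript SDK. The API base URL, auth header, and env variable are
// illustrative assumptions, not part of any specific product.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "my-api", version: "1.0.0" });

server.tool(
  "create_user",
  "Create a new user and return its unique identifier.",
  {
    email: z.string().describe("The user's email address."),
    name: z.string().describe("The user's full display name."),
  },
  async ({ email, name }) => {
    // Translate the tool call into a plain REST request against the existing API.
    const res = await fetch("https://api.example.com/v1/users", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.MY_API_KEY}`,
      },
      body: JSON.stringify({ email, name }),
    });
    return { content: [{ type: "text", text: await res.text() }] };
  }
);

// For local use, connect over standard I/O (see the transport discussion below).
await server.connect(new StdioServerTransport());
```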
Communication flows from a host application (like Claude Desktop or Cursor) that contains an MCP client. This client connects to your MCP server over a transport layer, which can be a simple standard I/O stream for local development or an HTTP or Server-Sent Events (SSE) stream for remote deployments. When an MCP server is generated from an OpenAPI spec, the output typically has a clear structure: a `server.ts` for the core logic and an `endpoints.ts` where your API endpoints are mapped to these tools.
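The exact layout varies by generator, but a hypothetical `endpoints.ts` might look roughly like the sketch below: each entry pairs a tool definition with the REST call it maps to. The field names here are illustrative, not a specific generator's output format.

```typescript
// Hypothetical shape of a generated endpoints.ts: each entry pairs a tool
// definition with the underlying REST call it maps to. Field names are
// illustrative, not a specific generator's output format.
export interface EndpointTool {
  name: string;        // tool name shown to the LLM, e.g. "create_user"
  description: string; // short description the LLM uses to pick the tool
  method: "GET" | "POST" | "PATCH" | "DELETE";
  path: string;        // REST path template, e.g. "/v1/users/{id}"
}

export const endpoints: EndpointTool[] = [
  { name: "create_user", description: "Create a new user.", method: "POST", path: "/v1/users" },
  { name: "list_invoices", description: "List invoices for a user.", method: "GET", path: "/v1/invoices" },
];
```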
Which API endpoints to expose
A common mistake is to expose every single API endpoint as a tool. While simple, this can overwhelm an LLM's context window—its short-term memory—with too many choices, leading to poor performance or incorrect tool selection. A more strategic approach is often better.
There are three main patterns for exposing endpoints.
Full mapping: Expose every endpoint as a tool. This is a great starting point for simple APIs or for quickly validating which tools an LLM uses effectively.
Curated subset: Hand-pick a specific set of endpoints to expose. This is ideal for creating focused experiences, like a "read-only" toolset or an "admin-only" group, by disabling resources by default and opting specific ones in.
Composite tools: Create higher-level tools that perform a sequence of API calls. This is useful for common workflows that require multiple steps.
For large APIs with hundreds of endpoints, you can use dynamic tools. Instead of exposing every endpoint, you provide just three meta-tools: `list_api_endpoints`, `get_api_endpoint_schema`, and `invoke_api_endpoint`. This allows the LLM to discover, learn about, and call endpoints on demand, neatly sidestepping context window limitations.
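Here is a hedged sketch of the dynamic-tools pattern. The in-memory catalog and invocation logic are simplified placeholders; a real server would derive the catalog from the OpenAPI spec and reuse its auth handling.

```typescript
// Sketch of the "dynamic tools" pattern: three meta-tools that let the LLM
// discover and call endpoints on demand. The catalog and the invoke logic
// are simplified placeholders, not a full implementation.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const catalog = [
  { name: "create_user", method: "POST", path: "/v1/users", schema: { email: "string", name: "string" } },
  { name: "list_invoices", method: "GET", path: "/v1/invoices", schema: {} },
];

const server = new McpServer({ name: "my-api-dynamic", version: "1.0.0" });

server.tool("list_api_endpoints", "List all available API endpoints.", {}, async () => ({
  content: [{ type: "text", text: catalog.map((e) => e.name).join("\n") }],
}));

server.tool(
  "get_api_endpoint_schema",
  "Return the input schema for one endpoint.",
  { endpoint: z.string().describe("Endpoint name from list_api_endpoints.") },
  async ({ endpoint }) => ({
    content: [{ type: "text", text: JSON.stringify(catalog.find((e) => e.name === endpoint) ?? {}) }],
  })
);

server.tool(
  "invoke_api_endpoint",
  "Call an endpoint with a JSON object of arguments.",
  {
    endpoint: z.string().describe("Endpoint name from list_api_endpoints."),
    args: z.record(z.unknown()).describe("Arguments matching the endpoint's schema."),
  },
  async ({ endpoint, args }) => {
    const entry = catalog.find((e) => e.name === endpoint);
    if (!entry) return { content: [{ type: "text", text: `Unknown endpoint: ${endpoint}` }] };
    const res = await fetch(`https://api.example.com${entry.path}`, {
      method: entry.method,
      headers: { "Content-Type": "application/json" },
      body: entry.method === "GET" ? undefined : JSON.stringify(args),
    });
    return { content: [{ type: "text", text: await res.text() }] };
  }
);
// Transport wiring omitted; see the earlier sketch.
```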
Schema design patterns for AI consumption
LLMs perform best with simple, clear schemas. While your internal API may handle complex, nested objects, your MCP tool schemas should be flattened and simplified for AI consumption. Every bit of ambiguity you remove increases the chance the LLM will invoke your tool correctly on the first try.
Here are some practical simplifications.
Flatten objects: Convert nested JSON objects into a single, flat list of parameters.
Reduce required parameters: Use sensible defaults where possible to reduce the number of fields the LLM must provide.
Tighten descriptions: Write concise, unambiguous descriptions for each parameter. For example, instead of "ID," use "The 24-character unique identifier for the user, starting with `usr_`." A sketch of a flattened schema follows this list.
Technical challenges like `$ref` pointers, unions (`anyOf`), and recursive schemas can also confuse different MCP clients when converting complex OpenAPI specs to MCP servers. A robust MCP generator handles this by automatically transforming schemas based on the target client's capabilities. For instance, it can inline all `$ref` pointers for a client that doesn't support them, or split a tool with a union input into multiple, distinct tools, one for each variant.
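As a small illustration of the variant-splitting idea, a payment tool whose input is an `anyOf` over card and bank details could become two separate tools with flat, variant-specific schemas. The tool names and fields below are hypothetical.

```typescript
// Sketch: splitting a tool whose input is a union (anyOf) into one tool per
// variant, so clients that handle unions poorly still get unambiguous schemas.
// Tool names and fields are hypothetical.
import { z } from "zod";

// Instead of one "create_payment" tool with anyOf: [card, bank_account] ...
const cardParams = {
  card_number: z.string().describe("Primary account number, digits only."),
  expiry: z.string().describe("Card expiry in MM/YY format."),
};

const bankParams = {
  account_number: z.string().describe("Bank account number."),
  routing_number: z.string().describe("9-digit ABA routing number."),
};

// ...expose two distinct tools, "create_payment_by_card" and
// "create_payment_by_bank_account", each with a flat, variant-specific schema.
```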
Production deployment architectures
Once your MCP server is designed, you need to deploy it. Your architecture will depend on your users and their security requirements.
Local deployment options
For developers using local AI clients like Cursor or Claude Desktop, the easiest method is a command-line interface (CLI) package. The user installs your MCP server via npm or another package manager and runs it locally. Authentication is typically handled with API keys set as environment variables. This is perfect for developer-focused tools and internal use cases.
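For example, a local client such as Claude Desktop is typically pointed at a CLI-distributed server with a config entry along these lines; the package name and environment variable are placeholders for your own.

```json
{
  "mcpServers": {
    "my-api": {
      "command": "npx",
      "args": ["-y", "your-api-mcp-package"],
      "env": {
        "YOUR_API_KEY": "<your API key>"
      }
    }
  }
}
```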
Remote deployment options
For non-technical users or web-based AI applications like claude.ai, a local server isn't an option. You need a remote server that supports OAuth2 for secure, user-authorized access. A common and scalable pattern is to deploy the MCP server as a serverless function, for example on a Cloudflare Worker or AWS Lambda behind an API Gateway. This approach allows you to manage authentication, handle scaling automatically, and integrate with your existing infrastructure. You can get started quickly with a pre-built Cloudflare Worker template that handles the OAuth flow for you. You can also publish your server as a Docker image for deployment in any containerized environment.
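As a rough sketch of the remote pattern, a Cloudflare Worker-style fetch handler can reject unauthenticated requests before handing the MCP request to the server logic. `verifyAccessToken` and `handleMcpRequest` are hypothetical helpers standing in for your OAuth validation and for the MCP transport handling provided by your framework or template.

```typescript
// Minimal sketch of a remote deployment: a Worker-style fetch handler that
// checks an OAuth bearer token before delegating to the MCP server logic.
export default {
  async fetch(request: Request): Promise<Response> {
    const auth = request.headers.get("Authorization") ?? "";
    if (!auth.startsWith("Bearer ")) {
      return new Response("Unauthorized", { status: 401 });
    }

    const token = auth.slice("Bearer ".length);
    const user = await verifyAccessToken(token); // hypothetical: validate the OAuth token
    if (!user) {
      return new Response("Forbidden", { status: 403 });
    }

    // Hypothetical: delegate to the MCP server's HTTP/SSE transport handler.
    return handleMcpRequest(request, { user });
  },
};

// Hypothetical helper signatures, shown only so the sketch type-checks.
declare function verifyAccessToken(token: string): Promise<{ id: string } | null>;
declare function handleMcpRequest(req: Request, ctx: { user: { id: string } }): Promise<Response>;
```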
Hybrid deployment options
You don't have to choose just one deployment model. Many API providers offer both a local CLI for developers and a remote, OAuth-secured server for broader application integration. This hybrid approach provides the best of both worlds, catering to different user needs and security contexts. You can use feature flags or different routing rules to manage which tools are available in each environment.
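One way to manage per-environment tool availability is to gate tool registration on a flag at startup, as in the sketch below. The environment variable and tool grouping are illustrative; a remote server might derive the same groups from OAuth scopes instead.

```typescript
// Sketch: gating which tools get registered per deployment environment.
// The flag source (an env variable here) and the tool grouping are illustrative.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

type ToolGroup = "read" | "write" | "admin";

function enabledGroups(): Set<ToolGroup> {
  // e.g. MCP_TOOL_GROUPS="read,write" for the local CLI, while the remote
  // server might derive this from the user's OAuth scopes instead.
  const raw = process.env.MCP_TOOL_GROUPS ?? "read";
  return new Set(raw.split(",").map((s) => s.trim()) as ToolGroup[]);
}

export function registerTools(server: McpServer) {
  const groups = enabledGroups();
  if (groups.has("read")) {
    server.tool("list_invoices", "List invoices.", {}, async () => ({
      content: [{ type: "text", text: "..." }],
    }));
  }
  if (groups.has("write")) {
    server.tool("create_user", "Create a new user.", {}, async () => ({
      content: [{ type: "text", text: "..." }],
    }));
  }
}
```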
Scaling MCP servers for enterprise APIs
As your API and user base grow, your MCP implementation will need to mature. Enterprises have specific needs around governance, security, and observability that must be addressed.
| Concern | Best Practice |
|---|---|
| Governance | Maintain the OpenAPI spec as the single source of truth. Automate regeneration of the MCP server whenever the spec changes to keep it in sync. |
| Security | Use granular OAuth scopes to limit which tools a user can access. Implement per-tool allow-lists and treat the MCP server as a zero-trust client. |
| Rate Limiting | Apply separate rate limits and quotas for AI-driven traffic, as it can be spikier than traditional API usage. Monitor usage per tool and per user. |
| Observability | Log every tool invocation, including the parameters and the result. Track error rates, latency, and schema changes to quickly diagnose issues. |
| Versioning | Tie your MCP server version directly to your API version. Use automated release pull requests to review changes before they go live. |
Frequently asked questions about MCP server architecture
How do I manage API versioning with MCP servers?
Your MCP server version should be tightly coupled with your API version. The best practice is to automatically regenerate the server from your OpenAPI spec in CI/CD, which ensures it always reflects the latest contract.
Can I expose different tool sets to different users?
Yes, you can implement logic in your remote server's authentication flow to filter the list of tools based on the user's permissions or OAuth scopes. You can also build and deploy different server configurations using tags.
What monitoring should I add before going to production?
At a minimum, you should monitor tool invocation counts, error rates, and latency for each tool. Setting up alerts for significant spikes in errors or schema changes that could indicate a breaking change is also recommended.
How do rate limits translate to MCP traffic?
You should apply the same underlying rate limits your API already has, but consider adding a separate, higher-level quota for MCP tool usage, as a single user prompt could trigger multiple tool calls.
Should I run one monolithic server or multiple domain-specific servers?
For most APIs, a single server that can be filtered by the client is sufficient and easier to manage. However, if you have completely distinct product domains with no overlapping functionality, deploying multiple, smaller servers can reduce context size and simplify team ownership.
Ready to build your own MCP server? Get started for free.