Sampling
Sampling allows servers to request LLM completions through the client, enabling sophisticated agentic behaviors while maintaining security boundaries.
Request Flow
Server sends a sampling/createMessage request to the client. The server initiates by sending a structured request containing the messages and parameters for the LLM interaction.
Client reviews and potentially modifies the request. The client can alter the system prompt, add safety instructions, or inject additional context before proceeding.
Client samples from an LLM. The client forwards the processed request to its connected LLM (like Claude) using its established connection and authentication.
Client reviews the completion. Before returning results, the client can filter sensitive information, validate the response format, or apply post-processing rules.
Client returns the result to the server. The server receives the final LLM response and continues its operation based on the insights gained.
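The flow above can be sketched as a minimal client-side handler. This is an illustration only, not MCP SDK code: the function names (`review_request`, `call_llm`, `review_completion`) are hypothetical, and the LLM call is stubbed out.

```python
# Sketch of the client's role in the sampling flow. A real client would
# forward step 3 to its connected model provider.

def review_request(params):
    # Step 2: the client may inject safety instructions or extra context.
    params = dict(params)
    params["systemPrompt"] = "Be concise. " + params.get("systemPrompt", "")
    return params

def call_llm(params):
    # Step 3: forward the processed request to the LLM (stubbed here).
    return {"role": "assistant",
            "content": {"type": "text", "text": "LLM answer"},
            "model": "example-model"}

def review_completion(result):
    # Step 4: filter or validate the completion before returning it.
    assert result["content"]["type"] == "text"
    return result

def handle_create_message(params):
    # Steps 1-5: receive, review, sample, review, return to the server.
    return review_completion(call_llm(review_request(params)))

response = handle_create_message({
    "messages": [{"role": "user",
                  "content": {"type": "text", "text": "Summarize this log"}}],
    "maxTokens": 100,
})
```

The point of the sketch is that the client sits between server and model at every step: the server never talks to the LLM directly.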
Request Parameters
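A sampling/createMessage request carries parameters along these lines. The field names follow the MCP specification; the values shown are illustrative only:

```python
# Illustrative sampling/createMessage parameters. Field names follow the
# MCP sampling schema; all values here are made up for this example.
create_message_params = {
    "messages": [
        {"role": "user",
         "content": {"type": "text", "text": "Summarize these server logs"}}
    ],
    "modelPreferences": {
        "hints": [{"name": "claude-3"}],  # advisory model-family hint
        "costPriority": 0.3,              # 0-1 weights the client may use
        "speedPriority": 0.5,
        "intelligencePriority": 0.8,
    },
    "systemPrompt": "You are a log-analysis assistant.",
    "includeContext": "thisServer",       # "none" | "thisServer" | "allServers"
    "temperature": 0.2,
    "maxTokens": 500,
    "stopSequences": ["\n\n"],
}

# Minimal sanity checks a client might run before sampling.
assert all(m["role"] in ("user", "assistant")
           for m in create_message_params["messages"])
assert create_message_params["maxTokens"] > 0
```

Note that model preferences are advisory: the client chooses the actual model, which is part of the security design described below.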
Security Design
The host maintains complete control over sampling. No server can force an LLM interaction—the host application always has the final say on whether sampling proceeds.
Full chat history is never exposed to servers. Servers only see the specific response to their request, preserving user privacy and preventing context manipulation.
All sampling operations require human approval. Users must explicitly consent to each sampling request, with clear visibility into what the server wants to ask.
Clients can modify or reject any sampling request. Clients act as a security gateway, filtering inappropriate requests and ensuring compliance with user preferences.
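The approval and rejection points above can be sketched as a gate the client places in front of every sampling request. `ask_user` stands in for a hypothetical UI confirmation dialog and is auto-approved here for the sake of the example:

```python
# Sketch of a human-approval gate in front of sampling. `ask_user` is a
# hypothetical placeholder for the client's consent dialog.

def ask_user(summary):
    # A real client would show the request to the user; stubbed to "yes".
    return True

def gated_sampling(params, sample):
    # Show the user what the server wants to ask before proceeding.
    question = params["messages"][-1]["content"]["text"]
    if not ask_user(f"Server wants to ask the LLM: {question!r}"):
        raise PermissionError("Sampling request rejected by user")
    return sample(params)

result = gated_sampling(
    {"messages": [{"role": "user",
                   "content": {"type": "text", "text": "Explain this error"}}]},
    sample=lambda p: {"content": {"type": "text", "text": "ok"}},
)
```

Because the gate wraps the sampling call itself, a rejected request never reaches the model, and the server only ever sees the final result or an error.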
Use cases
Analyzing complex resources that need LLM interpretation. A server might sample to summarize lengthy documents, extract key insights from logs, or interpret error messages in context.
Multi-step reasoning tasks. Servers can use sampling to break down complex problems, like debugging issues that require understanding multiple code files and their interactions.
Generating structured data from natural language. Convert user descriptions into JSON schemas, SQL queries, or configuration files with proper validation and error handling.
Handling errors with LLM assistance. When operations fail, servers can use sampling to get intelligent error analysis and suggested recovery strategies.
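The structured-data and error-handling use cases combine naturally: the server asks for JSON, validates the reply, and falls back gracefully when the model returns something unparsable. A sketch with a stubbed LLM reply (the prompt wording and field names are illustrative):

```python
import json

# Sketch of the "structured data from natural language" use case: ask
# for JSON via sampling, then validate before using it. The LLM is
# stubbed with a lambda; all field names are illustrative.

def request_structured(description, sample):
    params = {
        "messages": [{"role": "user",
                      "content": {"type": "text",
                                  "text": f"Return JSON only: {description}"}}],
        "maxTokens": 200,
    }
    reply = sample(params)
    try:
        return json.loads(reply["content"]["text"])
    except json.JSONDecodeError:
        # Error-handling path: a real server might re-sample or fall back.
        return None

fake_llm = lambda p: {"content": {"type": "text",
                                  "text": '{"host": "db1", "port": 5432}'}}
config = request_structured("a database connection config", fake_llm)
```

Validating the completion server-side matters because, as noted above, the client may have modified or post-processed the response before returning it.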