Configure SSE streaming responses

Enable Server-Sent Events (SSE) streaming in your SDKs to deliver real-time data to clients

Server-Sent Events (SSE) streaming enables your API to push real-time updates to clients over a single HTTP connection. Unlike WebSockets, SSE is unidirectional (server-to-client only) and uses standard HTTP, making it simpler to implement and naturally compatible with proxies, firewalls, and existing HTTP infrastructure.
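For reference, an SSE response body is plain text: each event is a block of `field: value` lines (typically `event:` and `data:`) terminated by a blank line. A stream might look like:

```
event: message
data: {"delta":"Hel"}

event: message
data: {"delta":"lo"}

data: [DONE]
```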

Stainless generates SDK bindings that let your users consume streams ergonomically through `for await` loops in TypeScript and similar patterns in other languages.

To enable streaming on a method, add a `streaming` object to the method definition in your Stainless config:

```yaml
resources:
  my_resources:
    models:
      stream_chunk: StreamResponse
    methods:
      generate:
        endpoint: post /v1/generate
        type: http
        streaming:
          param_discriminator: stream
          stream_event_model: my_resources.stream_chunk
```

The `streaming` object accepts these properties:

| Property | Description |
| --- | --- |
| `param_discriminator` | The request parameter that tells the server whether to respond with SSE or JSON. Set to `null` if the endpoint always streams. |
| `stream_event_model` | The full path to the model representing each Server-Sent Event, for example `chat.completions.chat_completion_chunk`. If not specified, defaults to the response schema. |
| `params_type_name` | The base name for generated request parameter types. When set, Stainless generates `{params_type_name}NonStreaming` and `{params_type_name}Streaming` type variants. |

The `param_discriminator` works regardless of where the parameter is defined in your OpenAPI spec, whether in the request body, query string, or elsewhere. Stainless finds the parameter by name in the method and uses it to determine the response type.

For dual-mode endpoints, Stainless generates two request parameter types: one for streaming and one for non-streaming requests. By default, Stainless infers the type name from the request body’s $ref or model name. If neither is available, you may see the diagnostic:

```
Streaming/CannotInferParamsName: No model name, $ref or streaming.param_type_name defined - using default params type name
```

To resolve this, set `params_type_name` to a descriptive name for your request parameters:

```yaml
resources:
  chat:
    subresources:
      completions:
        models:
          chat_completion: CreateChatCompletionResponse
          chat_completion_chunk: CreateChatCompletionStreamResponse
        methods:
          create:
            endpoint: post /chat/completions
            type: http
            streaming:
              stream_event_model: chat.completions.chat_completion_chunk
              param_discriminator: stream
              params_type_name: chat_completion_create_params
```

This generates clearly named types:

  • `ChatCompletionCreateParamsNonStreaming`
  • `ChatCompletionCreateParamsStreaming`

Without `params_type_name`, Stainless falls back to a default name, which may be less descriptive. Setting this property is optional but recommended for clearer SDK type names.
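To make the two variants concrete, here is a hedged sketch of what the generated TypeScript parameter types might look like for the config above. The exact field names and base-type structure are illustrative, not the literal generated output:

```typescript
// Illustrative shapes of the generated request parameter types.
// Field names here (model, messages) are example request parameters.
interface ChatCompletionCreateParamsBase {
  model: string;
  messages: { role: string; content: string }[];
}

// Non-streaming variant: `stream` is absent or false.
interface ChatCompletionCreateParamsNonStreaming extends ChatCompletionCreateParamsBase {
  stream?: false;
}

// Streaming variant: `stream` must be true.
interface ChatCompletionCreateParamsStreaming extends ChatCompletionCreateParamsBase {
  stream: true;
}
```

The discriminator field's literal type (`true` vs. optional `false`) is what lets the compiler pick the right variant from a call site.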

If your endpoint always returns a streamed response, set `param_discriminator` to `null`:

```yaml
resources:
  my_resources:
    models:
      stream_chunk: StreamResponse
    methods:
      stream_data:
        endpoint: post /v1/stream
        type: http
        streaming:
          param_discriminator: null
          stream_event_model: my_resources.stream_chunk
```

Many APIs support both streaming and non-streaming responses on the same endpoint, controlled by a request parameter. For example, setting `stream: true` might return an SSE stream while `stream: false` returns a complete response.

In your OpenAPI spec, define the `stream` parameter as a boolean:

```yaml
# OpenAPI spec
paths:
  /chat/completions:
    post:
      operationId: createChatCompletion
      requestBody:
        content:
          application/json:
            schema:
              type: object
              properties:
                stream:
                  type: boolean
                  description: If true, partial message deltas will be sent as SSE events
```

Then configure streaming in your Stainless config:

```yaml
resources:
  chat:
    subresources:
      completions:
        models:
          chat_completion: CreateChatCompletionResponse
          chat_completion_chunk: CreateChatCompletionStreamResponse
        methods:
          create:
            endpoint: post /chat/completions
            type: http
            streaming:
              stream_event_model: chat.completions.chat_completion_chunk
              param_discriminator: stream
```

In this example:

  • When users set `stream: false` (or omit it), the SDK returns a `ChatCompletion` response
  • When users set `stream: true`, the SDK returns an iterable stream of `ChatCompletionChunk` events

The exact API varies by language. In some languages, users pass a `stream` parameter to the method. In other languages (like Go and Python), there are separate methods (for example, `create` and `createStreaming`) because the return types differ and the language cannot express a return type that depends on a parameter's value.
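As an illustration of the dependent-typing case, TypeScript can express a return type that varies with the `stream` flag via function overloads. This is a simplified sketch with hypothetical type and method names, not the actual generated SDK code:

```typescript
// Hypothetical response shapes for illustration.
interface ChatCompletion {
  id: string;
  content: string;
}

interface ChatCompletionChunk {
  id: string;
  delta: string;
}

// Overloads: the return type depends on the literal type of `stream`.
function create(params: { stream: true }): AsyncIterable<ChatCompletionChunk>;
function create(params: { stream?: false }): Promise<ChatCompletion>;
function create(params: {
  stream?: boolean;
}): AsyncIterable<ChatCompletionChunk> | Promise<ChatCompletion> {
  if (params.stream) {
    // Streaming: return an async iterable of chunks (stubbed here).
    return (async function* () {
      yield { id: 'stub', delta: 'Hello' };
    })();
  }
  // Non-streaming: return a promise for the complete response (stubbed here).
  return Promise.resolve({ id: 'stub', content: 'Hello' });
}
```

Languages without this kind of overloading surface the same distinction as two separately named methods instead.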

You can define how the SDK handles specific streaming events using the top-level `streaming` configuration. This controls what happens when the SDK receives different types of events or data messages.

```yaml
streaming:
  on_event:
    - data_starts_with: '[DONE]'
      handle: done
    - event_type: error
      handle: error
    - kind: fallthrough
      handle: yield
```

The `on_event` array defines event handlers that are evaluated in order. When the SDK receives a streaming event, it checks each handler sequentially until it finds a match.

Each event handler can match on one of these conditions:

| Property | Description |
| --- | --- |
| `kind` | The type of handler. Use `fallthrough` as a catch-all that matches any event, including events with no type and future event types. Place `fallthrough` handlers at the end of your list for forward compatibility. |
| `data_starts_with` | Matches events where the `data` field starts with the specified string. |
| `event_type` | Matches events with the specified SSE event type. Use `null` to match only events where the `event:` line is absent. |

The `handle` property specifies what the SDK does when an event matches:

| Action | Description |
| --- | --- |
| `yield` | Parse the event data as JSON and return it to the user. The data must conform to the `stream_event_model` schema. At least one handler must use this action; otherwise no data is returned to users. |
| `done` | Signal that the stream has ended normally. The SDK continues iterating the stream to fully consume it, but ignores all subsequent events. Use this when you need to ensure the HTTP connection is properly drained. |
| `error` | Parse the event as an error and raise an exception. Use with `error_property` to specify which field contains the error details. |
| `break` | Stop processing the stream immediately without consuming remaining events. Use this when you want to terminate iteration as soon as possible. |
| `continue` | Skip this event and continue to the next one. Use this for keepalive, heartbeat, or other control messages you want to ignore. |
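To make the evaluation order concrete, here is a small sketch of how handler matching might work. This is illustrative only, not the actual generated SDK code, and the `dispatch` function is a hypothetical name:

```typescript
// Illustrative sketch of on_event dispatch; not the actual generated SDK code.
type Action = 'yield' | 'done' | 'error' | 'break' | 'continue';

interface EventHandler {
  kind?: 'fallthrough';
  data_starts_with?: string;
  event_type?: string | string[] | null;
  handle: Action;
}

interface SSEEvent {
  event?: string; // absent when the SSE message has no `event:` line
  data: string;
}

// Walk the handlers in order and return the action of the first match.
function dispatch(handlers: EventHandler[], ev: SSEEvent): Action | undefined {
  for (const h of handlers) {
    if (h.kind === 'fallthrough') return h.handle; // catch-all matches anything
    if (h.data_starts_with !== undefined && ev.data.startsWith(h.data_starts_with)) {
      return h.handle;
    }
    if (h.event_type !== undefined) {
      if (h.event_type === null) {
        if (ev.event === undefined) return h.handle; // only events with no `event:` line
      } else if (Array.isArray(h.event_type)) {
        if (ev.event !== undefined && h.event_type.includes(ev.event)) return h.handle;
      } else if (ev.event === h.event_type) {
        return h.handle;
      }
    }
  }
  return undefined; // no handler matched
}
```

Because evaluation stops at the first match, a `fallthrough` handler placed before more specific handlers would shadow them; that is why the docs recommend putting it last.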

Here is a complete example based on a chat completion API:

```yaml
streaming:
  on_event:
    # Handle server-sent done signal
    - data_starts_with: '[DONE]'
      handle: done
    # Handle explicit error events
    - event_type: error
      handle: error
    # Skip keepalive messages
    - event_type: [ping, heartbeat]
      handle: continue
    # Yield all other events, checking for inline errors
    - kind: fallthrough
      handle: yield
      error_property: error

resources:
  chat:
    subresources:
      completions:
        models:
          chat_completion: CreateChatCompletionResponse
          chat_completion_chunk: CreateChatCompletionStreamResponse
        methods:
          create:
            endpoint: post /chat/completions
            type: http
            streaming:
              stream_event_model: chat.completions.chat_completion_chunk
              param_discriminator: stream
```

This configuration:

  1. Ends the stream when the data starts with `[DONE]`, continuing to consume remaining events
  2. Raises an exception when receiving an `error` event type
  3. Ignores `ping` and `heartbeat` keepalive messages
  4. Yields all other events to the user using `kind: fallthrough` as a catch-all
  5. Checks the `error` property in yielded events to detect inline errors

Use `break` when you want to stop processing the stream without consuming remaining events. This is useful when the client needs to disconnect immediately:

```yaml
streaming:
  on_event:
    - event_type: fatal_error
      handle: break
    - kind: fallthrough
      handle: yield
```

When the SDK receives a `fatal_error` event, it stops iterating immediately. Unlike `done`, the SDK does not continue consuming the stream.

To handle errors within streamed events, use the `error_property` option. This tells the SDK which property in the event data contains the error details:

```yaml
streaming:
  on_event:
    - kind: fallthrough
      handle: yield
      error_property: error
```

When the SDK encounters an event where the `error` property is present and truthy, it raises an exception using that property's value as the error message or object.
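The check is essentially a truthiness test on the named property before the event is yielded. A minimal sketch, with `checkInlineError` as a hypothetical helper name rather than a real SDK function:

```typescript
// Sketch of the inline-error check implied by `error_property`.
// A present, truthy error field aborts iteration with an exception.
function checkInlineError(
  eventData: Record<string, unknown>,
  errorProperty: string,
): void {
  const err = eventData[errorProperty];
  if (err) {
    throw new Error(typeof err === 'string' ? err : JSON.stringify(err));
  }
}
```

Events whose error field is absent, `null`, or otherwise falsy pass through unchanged.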

For explicit error event types, you can omit `error_property` to use the entire event data as the error:

```yaml
streaming:
  on_event:
    - event_type: error
      handle: error # Uses full event data as the error
    - kind: fallthrough
      handle: yield
      error_property: error # Uses only the 'error' field
```

Once streaming is configured, users can consume streams in each SDK using idiomatic patterns.

```typescript
const stream = client.chat.completions
  .create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  })
  .on('chunk', (chunk) => {
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
  });

// Or use async iteration
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```
  • Define a dedicated model for stream chunks rather than reusing your main response model
  • Use a discriminated union for your stream event model when you have multiple event types. This improves deserialization performance in some languages
  • Use meaningful termination signals like [DONE] that are easy to identify
  • Consider supporting both streaming and non-streaming modes for flexibility
  • Test streaming behavior with slow connections and interruptions