How turbopuffer boosted SDK performance while minimizing engineering cost
turbopuffer, a high-performance search engine, migrated from maintaining two hand-written SDKs to five Stainless-generated SDKs. The migration delivered production-ready clients in three new languages — including compiled options like Java and Go for performance-sensitive apps — async support in Python, and performance gains across the board, with the TypeScript and Python clients showing 5–50% improvements in benchmarks simulating real workloads.
The breaking point: complex API patterns at scale
As enterprise adoption grew, so did the cost of SDK maintenance. Supporting just TypeScript and Python required significant effort, and the prospect of adding more languages felt daunting. Each new feature meant duplicating work across multiple codebases, slowing release cycles and pulling engineers away from core product development.
These challenges in particular made the process unsustainable:
- Cross-language drift: Every feature or schema change had to be re-implemented across all SDKs, and even minor differences in typing or response handling could cause bugs.
- Limited language expertise: Building idiomatic, production-grade SDKs in five languages required skills beyond turbopuffer’s core product focus.
- Performance optimization overhead: Each runtime needed custom tuning, but limited expertise made it hard to consistently meet latency and throughput targets.
Project management approach
turbopuffer’s migration was a tightly scoped engineering effort focused on performance and accuracy. Stainless engineers worked directly with turbopuffer’s team, resolving regressions, design questions, and edge cases in real time, often within hours, to keep development unblocked.
Implementation phases
Phase one: feasibility & benchmarking
The first phase focused on validating whether Stainless could meet turbopuffer’s core requirements before a full migration. Two questions guided the work: could the generated SDKs deliver sufficient performance on real workloads, and could they support turbopuffer’s existing `client.namespace(ns).operation(...)`-style API without breaking compatibility?
Early benchmarks focused on Python, where the team needed to process 100 MB+ payloads through latency-sensitive pipelines. Switching from `httpx` to `aiohttp` improved throughput under heavy concurrency, and integrating `orjson` reduced serialization overhead that had been inflating tail latency. These results established that performance targets were achievable in a generated client.
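A minimal sketch of the kind of transport change involved, assuming a hand-rolled ingestion path; the endpoint path, payload shape, and auth header here are illustrative, not turbopuffer’s actual wire format:

```python
import asyncio

import aiohttp
import orjson


async def upsert_rows(base_url: str, api_key: str, namespace: str, rows: list[dict]) -> dict:
    # orjson serializes to bytes much faster than the stdlib json module,
    # which matters when request bodies run to 100 MB+.
    payload = orjson.dumps({"upsert_rows": rows})
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # aiohttp held up better than httpx under heavy concurrency in
    # turbopuffer's benchmarks.
    async with aiohttp.ClientSession(base_url=base_url) as session:
        async with session.post(
            f"/v1/namespaces/{namespace}", data=payload, headers=headers
        ) as resp:
            resp.raise_for_status()
            return orjson.loads(await resp.read())


rows = [{"id": i, "vector": [0.1, 0.2]} for i in range(1_000)]
result = asyncio.run(upsert_rows("https://api.example.com", "my_api_key", "docs", rows))
```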
The other major focus was preserving turbopuffer’s existing API ergonomics. Stainless and turbopuffer engineers collaborated to prototype a shared-client approach that maintained the `namespace` pattern, and Stainless added `withOptions` support in the Java SDK to extend this design consistently across languages.
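In Python, that shared-client shape looks roughly like the sketch below; the class and method names are illustrative, not the generated SDK’s exact surface:

```python
from dataclasses import dataclass
from typing import Any


class Turbopuffer:
    """Single shared client owning auth and the HTTP connection pool."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def namespace(self, name: str) -> "Namespace":
        # Namespace handles are cheap: they hold no connections of their
        # own and route every call back through this shared client.
        return Namespace(client=self, name=name)

    def _request(self, method: str, path: str, body: dict[str, Any]) -> dict[str, Any]:
        ...  # shared transport (serialization, retries, pooling) lives here


@dataclass
class Namespace:
    client: Turbopuffer
    name: str

    def query(self, **params: Any) -> dict[str, Any]:
        return self.client._request("POST", f"/v1/namespaces/{self.name}/query", params)


# Usage preserves the original ergonomics:
# tpuf = Turbopuffer(api_key="...")
# tpuf.namespace("docs").query(top_k=10)
```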
Phase two: complex schema support
Next, the focus shifted to filling OpenAPI gaps that blocked code generation. A major one was tuple array support, heavily used in turbopuffer’s filtering and ranking logic. turbopuffer addressed this by plugging in custom code to handle these complex types, using Stainless’ “escape hatch” mechanism to extend generated SDKs as needed. These customizations live directly in the SDK repos and automatically persist across future codegen runs.
This approach let the team accurately model complex data structures that an OpenAPI spec can't express, removing the need for brittle hand-written types.
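As an illustration of what such an escape hatch can look like (the types below are hypothetical, not turbopuffer’s actual filter model), a hand-written module can define the tuple shapes OpenAPI cannot express, and the generated code imports them:

```python
# custom_types.py -- hand-written module kept alongside the generated code,
# so regenerating the SDK never overwrites it.
from typing import List, Tuple, Union

# OpenAPI has no tuple type, so a fixed-shape clause such as
# ("price", "Lt", 100) cannot be modeled in the spec itself.
FilterValue = Union[str, int, float, bool, List[Union[str, int, float]]]
FilterClause = Tuple[str, str, FilterValue]  # (attribute, operator, value)
# Clauses compose into boolean trees, e.g. ("And", [clause1, clause2]).
Filter = Union[FilterClause, Tuple[str, List["Filter"]]]
```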
Language-specific improvements also shipped in this phase. In Go, union types were renamed using `subtle_union_naming` for idiomatic clarity, ensuring the SDKs felt natural and robust in each language.
Phase three: multi-language rollout
Once the core SDK model was stable, the focus shifted to production-level performance and cross-language rollout. A key breakthrough came from replacing Node’s default `fetch` with an Undici-based backend (a lower-level, high-performance alternative), which resolved event-loop stalls under heavy load. This change improved `query_scale` benchmarks from 3,436 ms to 2,778 ms (1.24× faster).
With these optimizations in place, consistency across Python, TypeScript, Go, Java, and Ruby became a matter of automation, and new SDKs could now ship in under 24 hours. For example, a fully functional Ruby client was delivered the day after it was requested, with identical behavior and performance guarantees across runtimes.
Phase four: production deployment
The final phase focused on operationalizing the new pipeline. Stainless integrated directly into turbopuffer’s CI with GitHub Actions that automatically rebuild and merge SDKs whenever their OpenAPI spec changes.
This ensures that performance patches, custom logic, and new API features are always reflected in the generated clients without manual intervention. turbopuffer now has a fully automated release process: every PR provides a preview of SDK diffs, validates changes against benchmarks, and merges updates into production once approved.
All five SDKs reached v1.0 production readiness with minimal breaking changes. Release cycles that once took weeks were reduced to days, giving turbopuffer predictable and quick client delivery.
Performance results
Performance validation was the final milestone of the migration. turbopuffer benchmarked the generated SDKs using workloads representative of real production traffic. Across all metrics, the new clients met or exceeded their hand-maintained counterparts, often by wide margins.
Python SDK Performance
Sync performance benchmarks
| Operation | Baseline | Final SDK (opt3) | Improvement |
|---|---|---|---|
| | 1.88 s | 1.13 s | 1.66× faster |
| | 265 ms | 244 ms | 1.08× faster |
| `query_scale` | 2.68 s | 3.32 s | 1.24× slower (expected due to iteration overhead)* |
*Note: Async performance was prioritized for turbopuffer’s ingestion workloads, where throughput and concurrency were more impactful than individual sync operation speed. The slower `query_scale` sync result is less representative of real-world usage and was considered acceptable given the broader performance gains and improved iterator behavior.
Async performance benchmarks
| Operation | Baseline | Final SDK (opt3) | Improvement |
|---|---|---|---|
| | 4.34 s | 2.57 s | 1.69× faster |
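The async client pays off most when many writes are in flight at once. A hedged sketch of that ingestion pattern, assuming a hypothetical `AsyncTurbopuffer` twin of the shared-client shape sketched earlier:

```python
import asyncio

# Hypothetical async variant of the illustrative client sketched above.
from sdk_sketch import AsyncTurbopuffer


async def ingest(batches: list[list[dict]]) -> None:
    client = AsyncTurbopuffer(api_key="...")
    ns = client.namespace("docs")
    # All batch upserts run concurrently on one event loop, multiplexing
    # the underlying connections instead of blocking once per request.
    await asyncio.gather(*(ns.upsert(rows=batch) for batch in batches))


batches = [[{"id": i, "vector": [0.1, 0.2]} for i in range(100)] for _ in range(32)]
asyncio.run(ingest(batches))
```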
TypeScript SDK Performance
| Benchmark scenario | Baseline (fetch) | Undici backend | Improvement |
|---|---|---|---|
| `query_scale` | 3,436 ms | 2,778 ms | 1.24× faster |
| | 2,551 ms | 2,422 ms | 1.05× faster |
Note: Replacing the default `fetch` stack with an Undici-based backend eliminated Node.js event-loop stalls observed under load and significantly improved streaming throughput at high concurrency. In more representative production-like benchmarks, rather than synthetic stress tests, the new SDK even outperformed the hand-tuned baseline.
Business impact
The migration fundamentally reshaped how turbopuffer builds and ships its client libraries. Work that once required weeks of coordinated effort across separate hand-maintained codebases now happens in days. New features propagate across all SDKs almost immediately, eliminating the lag between backend changes and client availability. turbopuffer's engineers are now free to focus on advancing their core product rather than maintaining clients.
This new foundation also improved reliability and increased adoption. All five SDKs reached v1.0 production readiness with minimal breaking changes, 100% documentation coverage, and immediate enterprise deployment. Consistency across languages will continue to reduce integration friction for customers and lower support overhead.