What is Prompt Caching
Many model providers (including Anthropic Claude and OpenAI) natively support prompt caching. When the same context is sent repeatedly, the provider reuses previously computed results, reducing both latency and token costs.

Caching Limitations
Caches cannot be shared across accounts; this is a provider-side restriction, not a platform limitation. As a multi-channel relay platform, we cannot guarantee that two consecutive requests from the same account will always be routed to the same backend channel. If requests land on different channels, existing caches cannot be reused, which may lower cache hit rates and increase costs.

Maintaining Routing Consistency
To address this, you can include a previous request’s ID in subsequent requests. The system will then attempt to route the new request to the same channel, maximizing cache reuse. Two methods are supported:

| Header | Source | TTL |
|---|---|---|
| X-Request-Id | The X-Rixapi-Request-Id header from the previous response | 5 minutes |
| X-Response-Id | The conversation ID in the previous response JSON (e.g., Claude’s message.id) | 5 minutes |
Example
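A minimal sketch of the X-Request-Id method: read the X-Rixapi-Request-Id header from the previous response and echo it back on the follow-up request within its 5-minute TTL. The helper function name and the Authorization scheme are illustrative assumptions, not part of the platform specification.

```python
def sticky_headers(api_key: str, prev_response_headers: dict) -> dict:
    """Build headers for a follow-up request so the platform can try to
    route it to the same backend channel as the previous one."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    # X-Rixapi-Request-Id is returned on each response; send it back
    # as X-Request-Id within 5 minutes to keep routing sticky.
    request_id = prev_response_headers.get("X-Rixapi-Request-Id")
    if request_id:
        headers["X-Request-Id"] = request_id
    return headers
```

The X-Response-Id method works the same way, except the value comes from the response body (e.g., Claude’s message.id) rather than a response header.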
Sub-Task Extensions
Some models (such as video generation and image editing) support extending an existing task — for example, appending content to an already-generated video, or performing a secondary edit on a generated image. These operations typically require the original task_id in the new request. See the relevant model guides (Video, Image, etc.) for specific parameters.
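As a hypothetical sketch, a sub-task extension request carries the task_id returned by the original generation task. Every field name here other than task_id is illustrative; the actual parameters differ per model and are defined in the corresponding model guide.

```python
def build_extension_body(task_id: str, prompt: str) -> dict:
    """Build a request body that extends an existing generation task
    rather than starting a new one (field names are illustrative)."""
    return {
        "task_id": task_id,  # ID returned by the original generation task
        "prompt": prompt,    # new instruction, e.g. content to append
    }
```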