What is Prompt Caching
Many model providers (including Anthropic Claude and OpenAI) natively support prompt caching. When the same context is sent repeatedly, the provider reuses previously computed results, reducing both latency and token costs.

Caching Limitations
Caches cannot be shared across accounts; this is a provider-side restriction, not a platform limitation. As a multi-channel relay platform, we cannot guarantee that two consecutive requests from the same account will always be routed to the same backend channel. If requests land on different channels, existing caches cannot be reused, which may lower cache hit rates and increase costs.

Maintaining Routing Consistency
To address this, you can include a previous request’s ID in subsequent requests. The system will then attempt to route the new request to the same channel, maximizing cache reuse. Two methods are supported:

| Header | Source | TTL |
|---|---|---|
| X-Request-Id | The X-Rixapi-Request-Id header from the previous response | 5 minutes |
| X-Response-Id | The conversation ID in the previous response JSON (e.g., Claude’s message.id) | 5 minutes |
Example
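A minimal sketch of the X-Request-Id method: read the X-Rixapi-Request-Id header from the previous response and echo it back on the follow-up request within its 5-minute TTL. The helper function name and the Authorization scheme are illustrative assumptions, not part of the platform specification.

```python
def sticky_headers(api_key: str, prev_response_headers: dict) -> dict:
    """Build headers for a follow-up request so the platform can try to
    route it to the same backend channel as the previous one."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    # X-Rixapi-Request-Id is returned on each response; send it back
    # as X-Request-Id within 5 minutes to keep routing sticky.
    request_id = prev_response_headers.get("X-Rixapi-Request-Id")
    if request_id:
        headers["X-Request-Id"] = request_id
    return headers
```

The X-Response-Id method works the same way, except the value comes from the response body (e.g., Claude’s message.id) rather than a response header.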
Sub-Task Extensions
Some models (such as video generation and image editing) support extending an existing task — for example, appending content to an already-generated video, or performing a secondary edit on a generated image. These operations typically require the original task_id in the new request. See the relevant model guides (Video, Image, etc.) for specific parameters.
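As a hypothetical sketch, a sub-task extension request carries the task_id returned by the original generation task. Every field name here other than task_id is illustrative; the actual parameters differ per model and are defined in the corresponding model guide.

```python
def build_extension_body(task_id: str, prompt: str) -> dict:
    """Build a request body that extends an existing generation task
    rather than starting a new one (field names are illustrative)."""
    return {
        "task_id": task_id,  # ID returned by the original generation task
        "prompt": prompt,    # new instruction, e.g. content to append
    }
```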