What is Prompt Caching

Many model providers (including Anthropic Claude and OpenAI) natively support prompt caching. When a request repeats context that was sent recently, the provider can reuse the previously computed results instead of processing that context again, which reduces both latency and token costs.

Caching Limitations

Caches cannot be shared across accounts. This is a restriction imposed by the model providers, not by this platform. Because we are a multi-channel relay platform, we cannot guarantee that two consecutive requests from the same account will be routed to the same backend channel. Since different channels may use different upstream accounts, a request that lands on a different channel generally cannot reuse the existing cache, which may lower cache hit rates and increase costs.
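To make the effect concrete, here is a toy simulation, not a model of the platform's actual router. It assumes that each hypothetical channel keeps its own independent cache, that a request hits only if its channel has already seen the repeated context, and that non-sticky routing can be approximated by simple round-robin.

```python
from itertools import cycle


def count_cache_hits(num_requests: int, channels: list[str], sticky: bool) -> int:
    """Count simulated cache hits for a conversation that resends the same context.

    Toy assumptions: every channel has its own cache (caches are never shared),
    and a request is a hit only if its channel has already cached the context.
    """
    caches = {channel: set() for channel in channels}
    context = "repeated-conversation-context"  # stands in for the shared prompt
    round_robin = cycle(channels)
    hits = 0
    for _ in range(num_requests):
        # Sticky routing always reuses the first channel; otherwise rotate.
        channel = channels[0] if sticky else next(round_robin)
        if context in caches[channel]:
            hits += 1
        else:
            caches[channel].add(context)  # a miss populates this channel's cache
    return hits
```

For example, `count_cache_hits(10, ["a", "b", "c"], sticky=True)` returns 9, while the round-robin case returns 7, because every channel a conversation touches pays its own initial cache miss.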

Maintaining Routing Consistency

To address this, you can include an ID from a previous request in the headers of subsequent requests. The system will then attempt to route the new request to the same channel, improving the chance that the existing cache is reused. Two methods are supported:
Header | Source | TTL
X-Request-Id | The X-Rixapi-Request-Id header from the previous response | 5 minutes
X-Response-Id | The conversation ID in the previous response JSON (e.g., Claude’s message.id) | 5 minutes
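The platform does not document how its routing works internally, so the following is only a sketch of how a TTL-based affinity lookup like the one described above could behave. `AffinityTable`, `route`, and the fallback rule are hypothetical; the only detail taken from this page is the 5-minute TTL. The clock is injectable so the expiry behaviour can be tested without waiting.

```python
import time
from typing import Callable, Optional


class AffinityTable:
    """Hypothetical sketch: remember which channel served an ID, for a limited time."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # 300 s matches the 5-minute TTL in the table
        self.clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def remember(self, request_id: str, channel: str) -> None:
        # Store the channel together with the time at which the entry expires.
        self._entries[request_id] = (channel, self.clock() + self.ttl)

    def lookup(self, request_id: Optional[str]) -> Optional[str]:
        if request_id is None:
            return None
        entry = self._entries.get(request_id)
        if entry is None:
            return None
        channel, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[request_id]  # expired: forget the affinity
            return None
        return channel


def route(table: AffinityTable, request_id: Optional[str],
          available_channels: list[str]) -> str:
    """Prefer the remembered channel; fall back to any available one (hypothetical rule)."""
    preferred = table.lookup(request_id)
    if preferred in available_channels:
        return preferred
    return available_channels[0]
```

The "attempt" wording on this page matters here: if the remembered channel has expired or is unavailable, the request still goes through, just without the cache benefit.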

Example

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.ephone.ai/anthropic",
    api_key="API_KEY",
)

# First request
response = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

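# Claude's message.id, accepted in the X-Response-Id header for 5 minutes
response_id = response.id

# Second request: resend the conversation and pass the previous response ID
# so the platform tries to route it to the same channel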
response2 = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": response.content[0].text},
        {"role": "user", "content": "Continue our conversation..."},
    ],
    extra_headers={"X-Response-Id": response_id}
)
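The example above uses X-Response-Id. The other method reads the X-Rixapi-Request-Id header from the previous HTTP response. The Anthropic Python SDK exposes response headers through `client.messages.with_raw_response.create(...)`, whose result has `.headers` and `.parse()`. The helper below is a hypothetical convenience wrapper, not part of the SDK or of this platform; it accepts any client object with that shape.

```python
from typing import Any, Optional, Tuple


def create_with_request_affinity(
    client: Any, previous_request_id: Optional[str], **params: Any
) -> Tuple[Any, Optional[str]]:
    """Send a Messages request, echoing the previous relay request ID if there is one.

    Returns the parsed message and this response's X-Rixapi-Request-Id,
    which can be passed as previous_request_id on the next call.
    """
    # Only send X-Request-Id once we have an ID from an earlier response.
    headers = {"X-Request-Id": previous_request_id} if previous_request_id else {}
    # with_raw_response returns the HTTP response, so headers are accessible.
    raw = client.messages.with_raw_response.create(extra_headers=headers, **params)
    return raw.parse(), raw.headers.get("X-Rixapi-Request-Id")
```

Usage: call it first with `previous_request_id=None`, then pass the returned ID into the next call (together with the same `model`, `max_tokens`, and the growing `messages` list) within the 5-minute window.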

Sub-Task Extensions

Some models, such as those for video generation and image editing, support extending an existing task: for example, appending content to an already-generated video, or performing a second edit on a generated image. These operations typically require passing the original task's task_id in the new request. See the relevant model guides (Video, Image, etc.) for the specific parameters.