Documentation Index
Fetch the complete documentation index at: https://docs.rixapi.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Overview
/v1/models/pricing is a public, unauthenticated aggregation endpoint that returns every model currently available on the platform along with its published price. The response strictly follows the OpenRouter Providers specification, so it can be ingested by OpenRouter, third-party price aggregators, or any custom client.
This endpoint is the “published price” entry point and carries no user identity. Prices returned represent the final cost for the
default group in an anonymous context. To see the price an authenticated user actually pays (group, tier discount, agent markup, etc.), use /api/pricing instead.When to use it
| Use case | Description |
|---|---|
| Third-party price aggregation | OpenRouter and other aggregators scraping the model catalog and published prices |
| Client display | Apps, bots, SDKs that need to render the model list |
| Monitoring | Track changes to the model catalog or pricing over time |
Endpoint
| Item | Value |
|---|---|
| URL | https://api.ephone.ai/v1/models/pricing |
| Method | GET |
| Auth | None (public) |
| Rate limit | Strict per-IP critical rate limit |
| Caching | 60s in-process cache; responses include Cache-Control: public, max-age=60 |
Request examples
Response structure
The top level is always wrapped in adata array:
Top-level field
| Field | Type | Description |
|---|---|---|
data | Array<Model> | Array of model entries; empty array means no models are currently published |
Model fields
| Field | Type | Required | Description |
|---|---|---|---|
id | string | ✅ | Unique model identifier; use this value when calling /v1/chat/completions |
name | string | ✅ | Display name (e.g. GPT-4o); falls back to id when not configured |
created | int64 | ✅ | Model release time (Unix seconds) |
input_modalities | string[] | ✅ | Supported input modalities; includes at least text |
output_modalities | string[] | ✅ | Supported output modalities; includes at least text |
quantization | string | ✅ | Quantization (fp16 / fp8 / bf16 / int8 / unknown) |
context_length | int | ✅ | Maximum context window in tokens |
max_output_length | int | ✅ | Maximum output tokens per response |
pricing | Pricing | ✅ | Pricing object; see below |
pricing_tiers | Pricing[] | Tiered pricing array (at most one element; populated when an admin explicitly tags a Condition with a threshold) | |
supported_sampling_parameters | string[] | ✅ | Supported sampling parameters (e.g. temperature, top_p) |
supported_features | string[] | ✅ | Supported feature flags (see table below) |
deprecation_date | string | Planned deprecation date (ISO 8601, YYYY-MM-DD) | |
hugging_face_id | string | Hugging Face repository ID for open-weight models |
Pricing fields
All prices are USD strings. Token-class fields are priced per token; flat-class fields use the unit listed below.| Field | Type | Required | Unit | Description |
|---|---|---|---|---|
prompt | string | ✅ | USD / input token | Input token unit price |
completion | string | ✅ | USD / output token | Output token unit price |
request | string | ✅ | USD / request | Flat per-request fee (used by call-billed models) |
image | string | ✅ | USD / image | Per-image price (image generation models) |
audio | string | USD / audio | Per-audio-segment price | |
video | string | USD / video | Per-video-segment price | |
web_search | string | USD / query | Built-in web search tool fee | |
internal_reasoning | string | USD / reasoning token | Reasoning token price (o1 / Claude thinking) | |
input_cache_read | string | USD / token | Cache-hit read price | |
input_cache_write | string | USD / token | Cache write price (most expensive window is published) | |
min_context | int | tokens | Only valid inside pricing_tiers; minimum input tokens that trigger this tier |
- The four required fields (
prompt/completion/request/image) are always present, possibly as"0". Optional fields are omitted when not configured. - A value of
"0"may mean either “free” or “not applicable to this model”. Combine withsupported_featuresandsupported_sampling_parametersto infer capability instead of relying on pricing alone.
supported_features enum
| Value | Meaning |
|---|---|
tools | Function calling / tool use |
reasoning | Returns reasoning tokens (reasoning_content / thinking) |
json_mode | Supports JSON mode output |
structured_outputs | Supports schema-constrained structured outputs |
logprobs | Returns token log probabilities |
web_search | Built-in web search tool |
How the three billing modes map
The platform supports three internal billing modes (see Pricing). All three are losslessly mapped to OpenRouter fields:| Internal mode | Output field(s) | Notes |
|---|---|---|
| Per token | prompt / completion / internal_reasoning / input_cache_read / input_cache_write / web_search | Text and embedding models; per-million prices are converted to per-token strings |
| Per call | image / video / audio / request | Routed by the call unit: image generation → image, video → video, music/speech → audio, otherwise → request |
| Per second | request / video / audio | OpenRouter has no per_second field; the platform publishes per_second × typical_duration (default 30s) as the per-request price. Actual billing still uses real seconds, so reconciliation is exact. |
How the price is computed
The published price is:- base_price: the official price configured by the admin
- group_ratio: the ratio of the
defaultgroup - currency_factor: if the base is configured in CNY it is divided by the FX rate; if already USD it is left unchanged
- Not applied: agent markups, user tier discounts, per-user model multipliers, time-of-day rules, runtime dynamic factors
Tiered pricing
Some models charge a different unit price above a context-length threshold (e.g. Gemini 1.5 / 2.5 Pro doubles at ≥128K tokens). The upper tier is exposed viapricing_tiers:
pricingis the base tier (the Prompt Tier with the lowest threshold)pricing_tiers[i]is the upper tier; applied when total input tokens ≥min_context- At most one upper tier is returned (matching OpenRouter’s 2-tier maximum)
Data source: structured Prompt Tiers
The admin panel configures Prompt Tiers for each model. Each tier has two fields:| Field | Meaning |
|---|---|
min_prompt_tokens | Trigger threshold (incl. cache and multimodal tokens) |
price | Full PriceData for that tier (input/output/cache/multimodal unit prices) |
Matching dimension
The threshold is matched against total input tokens, including cache and multimodal:min_context and Gemini/Claude official tiered docs — total context length defines the tier; otherwise users could bypass tiered pricing by going fully through cache.
Matching rule
Largest-threshold-wins: among all tiers satisfyingtotal >= min, the one with the highest threshold is selected. The full price of the matched tier is applied (no progressive stepping, matching how official providers price).
If no tier is matched, the base tier (lowest threshold) is used.
Conditions that cannot be tiered
OpenRouter only models the “input token count” dimension. The following scenarios are still billed by the general rule engine (Conditions) but are not published viapricing_tiers — OpenRouter only sees the base price:
- Image quality (e.g.
request.body.quality == '1080p') - Video duration (e.g.
request.body.duration > 30) - Response-driven conditions (e.g.
response.body.usage.reasoning_tokens > 100) - Compound conditions (
&&/||combinations)
default group as the publishable rate.
Caching and update cadence
| Aspect | Behavior |
|---|---|
| Server cache | 60-second in-process cache; admin updates take effect within 1 minute |
| Client cache | Cache-Control: public, max-age=60; CDN and browser caches are safe to use |
| Recommended polling | At most once every 5 minutes; aggressive polling is rate-limited |
Comparison with other endpoints
| Endpoint | Purpose | Auth |
|---|---|---|
/v1/models/pricing | This endpoint: public pricing in OpenRouter format | Public |
/v1/models | OpenAI-SDK-compatible model id list | Token required |
/api/pricing | Front-end pricing page (user-aware) | Optional token |
FAQ
Why does the returned price differ from what I actually pay?
Why does the returned price differ from what I actually pay?
The published price reflects the
default group in an anonymous context. If you belong to a different group (e.g. pro / premium), have tier discounts, or access through an agent domain with markup applied, your real price will differ. Authenticated users should call /api/pricing to fetch their own effective price.Why do some fields return '0'?
Why do some fields return '0'?
"0" can mean either “free” or “this dimension does not apply to the model”. Always combine supported_features and supported_sampling_parameters to infer model capability — don’t rely on pricing alone.How are per-second models (TTS, Realtime, video) priced here?
How are per-second models (TTS, Realtime, video) priced here?
OpenRouter has no
per_second field, so the platform publishes per_second × typical_duration (admin-configured, default 30 seconds) as the request / video / audio price. Actual billing still uses real seconds, so OpenRouter’s monthly reconciliation always matches what we charged.Is there a rate limit?
Is there a rate limit?
Yes — a strict per-IP critical rate limit. Best practices:
- Poll no more than once every 5 minutes
- Honor
Cache-Control: max-age=60and cache client-side - For long-term monitoring, subscribe to platform announcements rather than polling aggressively
Can I filter by vendor or modality?
Can I filter by vendor or modality?
No query parameters are supported right now — the endpoint always returns all healthy models. Filter client-side using
supported_features / input_modalities after fetching.