Rate Limits

1. What are Rate Limits?

Rate limits are restrictions the API places on how many requests a user can send to the server, or how many tokens a user can consume, within a given period of time. Rate limiting on current platforms takes several forms:

RPM (Requests Per Minute, the maximum number of requests initiated in one minute)
RPH (Requests Per Hour, the maximum number of requests allowed per hour)
RPD (Requests Per Day, the maximum number of requests allowed per day)
TPM (Tokens Per Minute, the maximum number of tokens consumed in one minute, counting both input and output)
TPD (Tokens Per Day, the maximum number of tokens allowed per day)
IPM (Images Per Minute, the maximum number of images generated in one minute)
IPD (Images Per Day, the maximum number of images generated in one day)

2. Why are Rate Limits needed?

Rate limiting is a common mechanism in API services. Its purposes include:

Preventing malicious abuse of the interface, for example a burst of invalid requests in a short period that degrades performance or makes the service unavailable.
Protecting all users from a few high-frequency users, so that access is distributed fairly and heavy consumption by some users does not degrade the experience of others.
Helping this site provide a consistent and efficient service for all users.

3. Specific parameters of API rate limit

A unified rate policy is applied to each account, based on the model used, the interface category, and the account type. For example, if an account has a limit of 120 RPM, it can send up to 120 requests per minute. Once 30 of them have been used, 90 remain available for other requests in that minute.

The limits for each interface are as follows: there is currently no limit.
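The 120 RPM arithmetic above can be mirrored on the client so that an application knows how much of its budget is left before sending. The following is a minimal sketch, not the platform's implementation: the `RequestBudget` class is hypothetical, and it assumes a sliding 60-second window, which this page does not specify.

```python
import time
from collections import deque


class RequestBudget:
    """Client-side tracker for an RPM quota (hypothetical helper)."""

    def __init__(self, rpm_limit, clock=time.monotonic):
        self.rpm_limit = rpm_limit
        self.clock = clock
        self._sent = deque()  # timestamps of requests sent in the last 60 s

    def _expire(self):
        # Assumption: a sliding 60-second window. Drop older timestamps.
        cutoff = self.clock() - 60.0
        while self._sent and self._sent[0] <= cutoff:
            self._sent.popleft()

    def remaining(self):
        """Return how many requests are still available in this window."""
        self._expire()
        return self.rpm_limit - len(self._sent)

    def record(self):
        """Record one request; return False if the local budget is used up."""
        if self.remaining() <= 0:
            return False
        self._sent.append(self.clock())
        return True


budget = RequestBudget(rpm_limit=120)
for _ in range(30):
    budget.record()
print(budget.remaining())  # 90, as in the 120 RPM example
```

Checking `remaining()` before each call lets the client pause on its own instead of waiting for the server to reject the request.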

4. Response instructions when the rate exceeds the limit

When the number of requests or the token usage reaches its limit within a short period, the API returns a rate-limit error. Subsequent requests are temporarily rejected, and you can resume calling the interface once the cooldown period has passed.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

The current group upstream load is saturated, please try again later.
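A client can handle this response by waiting and retrying. The sketch below shows one common approach, exponential backoff with jitter. It is an illustration under stated assumptions: `call_with_backoff` and its `send` callable are hypothetical names, and the delays are placeholders, because this page does not state how long the cooldown period lasts.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable that performs the request
    and returns a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        # Assumed delays: 1 s, 2 s, 4 s, ... capped at max_delay,
        # plus up to 10% random jitter so clients do not retry in lockstep.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)


# Usage with a stand-in for the real request: two 429s, then success.
responses = iter([(429, "busy"), (429, "busy"), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))
# prints (200, 'ok')
```

Passing `sleep` as a parameter keeps the function easy to test. In production the default `time.sleep` is used.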

5. Rate limit and tokens_to_generate, max_tokens

Because the total number of input and output tokens cannot be known exactly when a request is made, the system uses the max_tokens parameter to estimate the tokens the request will consume and applies TPM limiting based on that estimate. After generation completes, the estimate is corrected to the actual number of tokens. Set max_tokens as close to your actual needs as possible to minimize errors caused by exceeding the limit.
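The estimate-then-correct accounting described above can be illustrated with a small model. This is a sketch for intuition only, not the service's actual logic: the `TokenMinuteBudget` class is hypothetical, it assumes the estimate is the prompt token count plus max_tokens, and it omits the per-minute reset.

```python
class TokenMinuteBudget:
    """Illustrative TPM accounting: reserve an estimate, then correct it."""

    def __init__(self, tpm_limit):
        self.tpm_limit = tpm_limit
        self.used = 0  # tokens charged so far in the current minute

    def reserve(self, prompt_tokens, max_tokens):
        """Charge the estimate up front; return None if it would not fit."""
        # Assumption: the estimate is prompt tokens plus max_tokens.
        estimate = prompt_tokens + max_tokens
        if self.used + estimate > self.tpm_limit:
            return None
        self.used += estimate
        return estimate

    def settle(self, estimate, actual_tokens):
        """Replace the estimate with actual usage once generation is done."""
        self.used += actual_tokens - estimate


budget = TokenMinuteBudget(tpm_limit=10_000)

# An oversized max_tokens is rejected even though the reply would be short.
print(budget.reserve(prompt_tokens=500, max_tokens=16_000))  # None

# A max_tokens close to the real need fits within the limit.
estimate = budget.reserve(prompt_tokens=500, max_tokens=1_000)
budget.settle(estimate, actual_tokens=500 + 320)
print(budget.used)  # 820
```

The first call shows why an inflated max_tokens can trigger a limit error on its own, even when the actual output would have been small.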

6. Optimization suggestions under rate limits

Because the API limits the number of requests and the total number of tokens separately, we recommend merging requests into batches. When the request count has reached its limit but token quota remains, multiple tasks can be combined into a single request to make better use of the token quota.
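One way to merge tasks is to pack several short prompts into a single numbered prompt. The sketch below assumes the tasks are independent text prompts. The `merge_tasks` helper and its character cap are hypothetical, and the cap is only a rough stand-in for a token budget.

```python
def merge_tasks(tasks, max_prompt_chars=8000):
    """Group short tasks into numbered prompts, one prompt per request."""
    batches, current, size = [], [], 0
    for task in tasks:
        # Start a new batch when adding this task would exceed the cap.
        if current and size + len(task) > max_prompt_chars:
            batches.append(current)
            current, size = [], 0
        current.append(task)
        size += len(task)
    if current:
        batches.append(current)
    # Number the tasks so the answers can be matched back to them.
    return ["\n".join(f"{i}. {t}" for i, t in enumerate(batch, 1))
            for batch in batches]


tasks = [
    "Translate 'hello' to French.",
    "Translate 'cat' to French.",
    "Translate 'tree' to French.",
]
prompts = merge_tasks(tasks)
print(len(prompts))  # 1 request instead of 3
```

Each merged prompt counts as one request against RPM, while the tokens it consumes still count against TPM.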

7. How to increase the rate limit

Default limits are designed to maintain API stability and fair resource allocation. If you need a higher rate limit or dedicated capacity, contact your business manager to apply.
