Rate Limits
To ensure a reliable experience for all users, our API enforces rate limiting. These limits prevent abuse and protect the stability and performance of our services.
Request Limits
Rate limits are applied on a per-API-key basis. Different subscription tiers have different rate limits:
| Plan | Requests per minute | Requests per day | Concurrent requests |
| --- | --- | --- | --- |
| Free | 10 | 1,000 | 2 |
| Pro | 60 | 10,000 | 5 |
| Business | 300 | 100,000 | 25 |
| Enterprise | Custom | Custom | Custom |
Note: For streaming requests (e.g., chat completions with streaming enabled), the entire stream counts as a single request toward your rate limit, regardless of how many chunks it returns.
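As a minimal sketch of what this means in practice (the endpoint path and payload shape below are illustrative assumptions, not taken from this page), a call made with streaming enabled consumes one request from your per-minute budget no matter how many chunks arrive:

```python
# Minimal streaming sketch. The endpoint path and payload shape are
# assumptions for illustration; substitute your actual values.
import requests

API_KEY = "your-api-key"  # placeholder

response = requests.post(
    "https://api.pomeloapi.example.com/v1/chat/completions",  # assumed path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)

# Iterating over the chunks does not consume additional requests;
# the whole stream counts once against the rate limit.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```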
Model-Specific Limits
Some models may have additional rate limits due to their computational requirements:
| Model | Requests per minute | Tokens per minute |
| --- | --- | --- |
| GPT-4 | 10 | 10,000 |
| GPT-3.5 Turbo | 20 | 40,000 |
| Claude 2 | 10 | 15,000 |
| Claude Instant | 20 | 40,000 |
Monitoring Your Usage
You can monitor your current rate limit usage through the response headers:
| Header | Description |
| --- | --- |
| x-ratelimit-limit | The maximum number of requests allowed within a time window |
| x-ratelimit-remaining | The number of remaining requests in the current time window |
| x-ratelimit-reset | The time at which the current rate limit window resets (UTC epoch seconds) |
Example Response Headers
```http
HTTP/1.1 200 OK
Content-Type: application/json
x-ratelimit-limit: 60
x-ratelimit-remaining: 59
x-ratelimit-reset: 1623869903
```
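As a sketch of how you might surface these headers in client code (the endpoint path below is an illustrative assumption):

```python
# Sketch: inspect rate limit headers on a successful response.
# The endpoint path is an assumption for illustration.
import requests

API_KEY = "your-api-key"  # placeholder

response = requests.get(
    "https://api.pomeloapi.example.com/v1/models",  # assumed path
    headers={"Authorization": f"Bearer {API_KEY}"},
)

limit = response.headers.get("x-ratelimit-limit")
remaining = response.headers.get("x-ratelimit-remaining")
reset = response.headers.get("x-ratelimit-reset")
print(f"{remaining}/{limit} requests left; window resets at epoch {reset}")
```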
Handling Rate Limits
When you exceed your rate limit, the API returns a 429 Too Many Requests status code. The response includes a Retry-After header indicating how many seconds to wait before making another request.
Example Rate Limit Error
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": {
    "type": "rate_limit_error",
    "message": "You have exceeded your request rate limit. Please try again after 30 seconds.",
    "code": "rate_limit_exceeded",
    "status": 429
  }
}
```
Best Practices for Handling Rate Limits
- Implement exponential backoff: When encountering rate limit errors, use exponential backoff to retry requests after increasingly longer intervals.
- Respect the Retry-After header: Always use the value of this header to determine when to retry a request rather than using fixed intervals.
- Cache responses: Where appropriate, cache API responses to reduce the number of API calls you need to make (a minimal caching sketch follows the backoff example below).
- Batch requests: Consider batching multiple operations into a single API call when possible.
- Monitor your usage: Regularly check your usage in the dashboard to anticipate when you might need to upgrade your plan.
```python
# Python example for handling rate limits with exponential backoff
import random
import time

import requests

API_KEY = "your-api-key"  # Replace with your actual API key

def make_api_request_with_backoff(endpoint, data, max_retries=5):
    url = f"https://api.pomeloapi.example.com{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    retries = 0
    while retries <= max_retries:
        try:
            response = requests.post(url, headers=headers, json=data)

            if response.status_code == 200:
                # Success
                return response.json()

            if response.status_code == 429:
                # Rate limited: wait at least as long as Retry-After says
                retry_after = int(response.headers.get("Retry-After", 1))
                # Add jitter to avoid the thundering herd problem
                sleep_time = retry_after + (random.random() * 2)
                print(f"Rate limited. Retrying after {sleep_time:.1f} seconds.")
                time.sleep(sleep_time)
                retries += 1
                continue

            # Other error: raise and let the except clause handle backoff
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            retries += 1
            if retries <= max_retries:
                # Exponential backoff with jitter
                sleep_time = (2 ** retries) + random.random()
                print(f"Retrying in {sleep_time:.1f} seconds...")
                time.sleep(sleep_time)
            else:
                raise
    return None
```
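To complement the backoff loop above, here is a minimal sketch of the "Cache responses" practice. The 60-second TTL and the cache key scheme are assumptions for illustration; tune them to your data's freshness requirements:

```python
# Minimal in-memory TTL cache for API responses (sketch).
# The 60-second TTL is an assumed value, not a documented one.
import time

_cache = {}  # maps cache key -> (expiry timestamp, cached result)
CACHE_TTL_SECONDS = 60

def cached_api_request(endpoint, data, max_retries=5):
    # Build a deterministic key from the endpoint and request payload
    key = (endpoint, repr(sorted(data.items())))
    now = time.time()

    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        # Serve from cache without spending a request
        return hit[1]

    # Miss or expired: make the real call (reuses the backoff helper above)
    result = make_api_request_with_backoff(endpoint, data, max_retries)
    if result is not None:
        _cache[key] = (now + CACHE_TTL_SECONDS, result)
    return result
```

In production you would likely swap the plain dict for an LRU or an external cache such as Redis, but the shape of the check-then-call logic stays the same.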
Need Higher Limits?
If your use case requires higher rate limits than what's available on our standard plans, please contact us to discuss custom Enterprise options.