Pomelo

Rate Limits

To keep the service reliable for all users, our API enforces rate limits. These limits prevent abuse and protect the stability and performance of our services.

Request Limits

Rate limits are applied on a per-API-key basis. Different subscription tiers have different rate limits:

| Plan | Requests per minute | Requests per day | Concurrent requests |
|---|---|---|---|
| Free | 10 | 1,000 | 2 |
| Pro | 60 | 10,000 | 5 |
| Business | 300 | 100,000 | 25 |
| Enterprise | Custom | Custom | Custom |

Note: For streaming requests (e.g., chat completions with streaming enabled), each chunk of the stream counts as a single request toward your rate limit.
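One way to stay under a per-minute limit is to throttle on the client before sending. The sketch below is a minimal sliding-window limiter, not part of any official client: the class name, the `PLAN_REQUESTS_PER_MINUTE` mapping, and the 60-second sliding window are assumptions made for illustration, and the server may count its window differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def wait_time(self):
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The oldest request must leave the window before another fits
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Record that a request was just sent."""
        self._sent.append(self.clock())


# Per-minute limits copied from the plan table; the dict itself is hypothetical
PLAN_REQUESTS_PER_MINUTE = {"free": 10, "pro": 60, "business": 300}
limiter = SlidingWindowLimiter(PLAN_REQUESTS_PER_MINUTE["free"])
```

Before each call, sleep for `limiter.wait_time()` seconds and then call `limiter.record()`. The clock is passed in so the behavior can be checked with a fake clock instead of real delays.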

Model-Specific Limits

Some models may have additional rate limits due to their computational requirements:

| Model | Requests per minute | Tokens per minute |
|---|---|---|
| GPT-4 | 10 | 10,000 |
| GPT-3.5 Turbo | 20 | 40,000 |
| Claude 2 | 10 | 15,000 |
| Claude Instant | 20 | 40,000 |
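Because these models are limited by tokens as well as by requests, a client may want to track both budgets. The following is a rough sketch under stated assumptions: the `ModelBudget` class is hypothetical, the keys in `MODEL_LIMITS` are the display names from the table rather than confirmed API model identifiers, and token counts are whatever estimate your application supplies.

```python
import time
from collections import deque

# Limits copied from the table above: (requests per minute, tokens per minute).
# Keys are the table's display names, which may differ from API model IDs.
MODEL_LIMITS = {
    "GPT-4": (10, 10_000),
    "GPT-3.5 Turbo": (20, 40_000),
    "Claude 2": (10, 15_000),
    "Claude Instant": (20, 40_000),
}


class ModelBudget:
    """Tracks requests and tokens used for one model over the last 60 seconds."""

    def __init__(self, model, clock=time.monotonic):
        self.max_requests, self.max_tokens = MODEL_LIMITS[model]
        self.clock = clock
        self._events = deque()  # (timestamp, tokens) for each recent request

    def _prune(self):
        # Forget requests that are more than 60 seconds old
        now = self.clock()
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()

    def can_send(self, estimated_tokens):
        """Return True if one more request of this size fits both budgets."""
        self._prune()
        tokens_used = sum(tokens for _, tokens in self._events)
        return (len(self._events) < self.max_requests
                and tokens_used + estimated_tokens <= self.max_tokens)

    def record(self, tokens):
        """Record a request that was sent and the tokens it consumed."""
        self._events.append((self.clock(), tokens))
```

A caller would check `can_send(estimated_tokens)` before each request and call `record(tokens)` afterwards, waiting and rechecking when the budget is full.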

Monitoring Your Usage

You can monitor your current rate limit usage through the response headers:

| Header | Description |
|---|---|
| x-ratelimit-limit | The maximum number of requests allowed within a time window |
| x-ratelimit-remaining | The number of remaining requests in the current time window |
| x-ratelimit-reset | The time at which the current rate limit window resets (UTC epoch seconds) |

Example Response Headers

HTTP/1.1 200 OK
Content-Type: application/json
x-ratelimit-limit: 60
x-ratelimit-remaining: 59
x-ratelimit-reset: 1623869903
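These headers can be used to pause proactively instead of waiting for an error. The helper below is a small sketch: the function name `seconds_until_reset` is hypothetical, and it assumes the headers arrive in a mapping keyed by the lowercase names shown above (HTTP header names are case-insensitive, so a plain dict may need normalizing first).

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to pause if the window is exhausted, else 0.0."""
    now = time.time() if now is None else now
    # Missing headers are treated as "not exhausted" rather than as an error
    remaining = int(headers.get("x-ratelimit-remaining", 1))
    reset_at = float(headers.get("x-ratelimit-reset", now))
    if remaining > 0:
        return 0.0
    # The reset value is in UTC epoch seconds, the same scale as time.time()
    return max(0.0, reset_at - now)
```

With the example headers above, 59 requests remain, so the function returns `0.0`; once `x-ratelimit-remaining` reaches 0, it returns the number of seconds left until the reset time.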

Handling Rate Limits

When you exceed your rate limit, the API returns a 429 Too Many Requests status code. The response includes a Retry-After header indicating how many seconds to wait before making another request.

Example Rate Limit Error

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": {
    "type": "rate_limit_error",
    "message": "You have exceeded your request rate limit. Please try again after 30 seconds.",
    "code": "rate_limit_exceeded",
    "status": 429
  }
}
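A client can turn this response into a wait time and a machine-readable error code. The sketch below uses hypothetical helper names (`retry_delay`, `error_code`). The example above shows Retry-After as a number of seconds; the HTTP specification also permits an HTTP-date, so the helper accepts both as a precaution, which is an assumption about robustness rather than documented API behavior.

```python
import json
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, now=None):
    """Convert a Retry-After header value into a delay in seconds."""
    if retry_after is None:
        return 0.0
    try:
        return max(0.0, float(retry_after))  # e.g. "30"
    except ValueError:
        pass  # not a number; try the HTTP-date form next
    try:
        when = parsedate_to_datetime(retry_after)
    except (TypeError, ValueError):
        return 0.0  # unparseable header: let the caller fall back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def error_code(body_text):
    """Return error.code from the JSON error body, or None if absent."""
    try:
        return json.loads(body_text)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return None
```

For the example response above, `retry_delay("30")` gives `30.0` and `error_code(...)` gives `"rate_limit_exceeded"`.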

Best Practices for Handling Rate Limits

  1. Implement exponential backoff: When encountering rate limit errors, use exponential backoff to retry requests after increasingly longer intervals.
  2. Respect the Retry-After header: Always use the value of this header to determine when to retry a request rather than using fixed intervals.
  3. Cache responses: Where appropriate, cache API responses to reduce the number of API calls you need to make.
  4. Batch requests: Consider batching multiple operations into a single API call when possible.
  5. Monitor your usage: Regularly check your usage in the dashboard to anticipate when you might need to upgrade your plan.

# Python example: retrying rate-limited requests with exponential backoff
import os
import random
import time

import requests

# The environment variable name is illustrative; load your key however you prefer.
API_KEY = os.environ.get("POMELO_API_KEY", "")

def make_api_request_with_backoff(endpoint, data, max_retries=5):
    url = f"https://api.pomeloapi.example.com{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }

    for attempt in range(max_retries + 1):
        # Exponential backoff with jitter to avoid the thundering herd problem
        backoff = (2 ** attempt) + random.random()
        try:
            response = requests.post(url, headers=headers, json=data, timeout=30)
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as e:
            # Transient network failure: retry unless attempts are exhausted
            if attempt == max_retries:
                raise
            print(f"Request error: {e}. Retrying in {backoff:.1f} seconds...")
            time.sleep(backoff)
            continue

        if response.status_code == 429 and attempt < max_retries:
            # Rate limited: wait at least as long as Retry-After asks
            try:
                retry_after = float(response.headers.get("Retry-After", 0))
            except ValueError:
                retry_after = 0.0  # header was not a number of seconds
            sleep_time = max(retry_after, 2 ** attempt) + random.random()
            print(f"Rate limited. Retrying after {sleep_time:.1f} seconds.")
            time.sleep(sleep_time)
            continue

        # Raises requests.exceptions.HTTPError for any other 4xx/5xx status,
        # and for a 429 received on the final attempt
        response.raise_for_status()
        return response.json()
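
The caching practice listed above can be sketched with a small time-to-live cache. This is an illustrative sketch only: the `TTLCache` class and the 60-second default are assumptions, and whether a response is safe to cache depends on the endpoint.

```python
import time


class TTLCache:
    """Caches values for `ttl_seconds` so repeated lookups skip the API call."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (time stored, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # still fresh: no API call needed
        value = fetch()
        self._store[key] = (now, value)
        return value
```

Here `fetch` is any zero-argument callable that performs the real request, so each distinct key costs at most one API call per TTL period.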

Need Higher Limits?

If your use case requires higher rate limits than what's available on our standard plans, please contact us to discuss custom Enterprise options.