Rate Limits
To ensure a reliable experience for all users, our API enforces rate limiting. These limits prevent abuse and protect the stability and performance of our services.
Request Limits
Rate limits are applied on a per-API-key basis. Different subscription tiers have different rate limits:
| Plan | Requests per minute | Requests per day | Concurrent requests |
| --- | --- | --- | --- |
| Free | 10 | 1,000 | 2 |
| Pro | 60 | 10,000 | 5 |
| Business | 300 | 100,000 | 25 |
| Enterprise | Custom | Custom | Custom |
Note: For streaming requests (e.g., chat completions with streaming enabled), the entire stream counts as a single request toward your rate limit, regardless of how many chunks it returns.
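As a minimal sketch of what this means in practice (the endpoint path and payload shape below are illustrative assumptions, not taken from this page), a call made with streaming enabled consumes one request from your per-minute budget no matter how many chunks arrive:

```python
# Minimal streaming sketch. The endpoint path and payload shape are
# assumptions for illustration; substitute your actual values.
import requests

API_KEY = "your-api-key"  # placeholder

response = requests.post(
    "https://api.pomeloapi.example.com/v1/chat/completions",  # assumed path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)

# Iterating over the chunks does not consume additional requests;
# the whole stream counts once against the rate limit.
for line in response.iter_lines():
    if line:
        print(line.decode("utf-8"))
```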
Model-Specific Limits
Some models may have additional rate limits due to their computational requirements:
| Model | Requests per minute | Tokens per minute |
| --- | --- | --- |
| GPT-4 | 10 | 10,000 |
| GPT-3.5 Turbo | 20 | 40,000 |
| Claude 2 | 10 | 15,000 |
| Claude Instant | 20 | 40,000 |
Monitoring Your Usage
You can monitor your current rate limit usage through the response headers:
| Header | Description |
| --- | --- |
| x-ratelimit-limit | The maximum number of requests allowed within a time window |
| x-ratelimit-remaining | The number of remaining requests in the current time window |
| x-ratelimit-reset | The time at which the current rate limit window resets (UTC epoch seconds) |
Example Response Headers
```http
HTTP/1.1 200 OK
Content-Type: application/json
x-ratelimit-limit: 60
x-ratelimit-remaining: 59
x-ratelimit-reset: 1623869903
```
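As a sketch of how you might surface these headers in client code (the endpoint path below is an illustrative assumption):

```python
# Sketch: inspect rate limit headers on a successful response.
# The endpoint path is an assumption for illustration.
import requests

API_KEY = "your-api-key"  # placeholder

response = requests.get(
    "https://api.pomeloapi.example.com/v1/models",  # assumed path
    headers={"Authorization": f"Bearer {API_KEY}"},
)

limit = response.headers.get("x-ratelimit-limit")
remaining = response.headers.get("x-ratelimit-remaining")
reset = response.headers.get("x-ratelimit-reset")
print(f"{remaining}/{limit} requests left; window resets at epoch {reset}")
```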
Handling Rate Limits
When you exceed your rate limit, the API returns a 429 Too Many Requests status code. The response includes a Retry-After header indicating how many seconds to wait before making another request.
Example Rate Limit Error
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30

{
  "error": {
    "type": "rate_limit_error",
    "message": "You have exceeded your request rate limit. Please try again after 30 seconds.",
    "code": "rate_limit_exceeded",
    "status": 429
  }
}
```
Best Practices for Handling Rate Limits
- Implement exponential backoff: When encountering rate limit errors, use exponential backoff to retry requests after increasingly longer intervals.
- Respect the Retry-After header: Always use the value of this header to determine when to retry a request rather than using fixed intervals.
- Cache responses: Where appropriate, cache API responses to reduce the number of API calls you need to make (a minimal caching sketch follows the backoff example below).
- Batch requests: Consider batching multiple operations into a single API call when possible.
- Monitor your usage: Regularly check your usage in the dashboard to anticipate when you might need to upgrade your plan.
```python
# Python example for handling rate limits with exponential backoff
import random
import time

import requests

API_KEY = "your-api-key"  # Replace with your actual API key

def make_api_request_with_backoff(endpoint, data, max_retries=5):
    url = f"https://api.pomeloapi.example.com{endpoint}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    retries = 0
    while retries <= max_retries:
        try:
            response = requests.post(url, headers=headers, json=data)

            if response.status_code == 200:
                # Success
                return response.json()

            if response.status_code == 429:
                # Rate limited: wait at least as long as Retry-After says
                retry_after = int(response.headers.get("Retry-After", 1))
                # Add jitter to avoid the thundering herd problem
                sleep_time = retry_after + (random.random() * 2)
                print(f"Rate limited. Retrying after {sleep_time:.1f} seconds.")
                time.sleep(sleep_time)
                retries += 1
                continue

            # Other error: raise and let the except clause handle backoff
            response.raise_for_status()
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            retries += 1
            if retries <= max_retries:
                # Exponential backoff with jitter
                sleep_time = (2 ** retries) + random.random()
                print(f"Retrying in {sleep_time:.1f} seconds...")
                time.sleep(sleep_time)
            else:
                raise
    return None
```
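To complement the backoff loop above, here is a minimal sketch of the "Cache responses" practice. The 60-second TTL and the cache key scheme are assumptions for illustration; tune them to your data's freshness requirements:

```python
# Minimal in-memory TTL cache for API responses (sketch).
# The 60-second TTL is an assumed value, not a documented one.
import time

_cache = {}  # maps cache key -> (expiry timestamp, cached result)
CACHE_TTL_SECONDS = 60

def cached_api_request(endpoint, data, max_retries=5):
    # Build a deterministic key from the endpoint and request payload
    key = (endpoint, repr(sorted(data.items())))
    now = time.time()

    hit = _cache.get(key)
    if hit is not None and hit[0] > now:
        # Serve from cache without spending a request
        return hit[1]

    # Miss or expired: make the real call (reuses the backoff helper above)
    result = make_api_request_with_backoff(endpoint, data, max_retries)
    if result is not None:
        _cache[key] = (now + CACHE_TTL_SECONDS, result)
    return result
```

In production you would likely swap the plain dict for an LRU or an external cache such as Redis, but the shape of the check-then-call logic stays the same.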
Need Higher Limits?
If your use case requires higher rate limits than what's available on our standard plans, please contact us to discuss custom Enterprise options.