Rate Limits

Compresh applies rate limits per API key to ensure fair usage and protect infrastructure. These limits are independent of any rate limits your upstream provider applies.

Default limits

Plan          Requests per minute
Free / Pro    2,000

Limits are applied per API key, not per account. If you have multiple keys, each gets its own quota.

Rate limit headers

Every response includes these headers so you can track your usage:

Header                   Type      Description
X-RateLimit-Limit        integer   Maximum requests per minute for your key
X-RateLimit-Remaining    integer   Requests remaining in the current window
X-RateLimit-Reset        integer   Unix timestamp (seconds) when the rate limit window resets
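
As a minimal sketch of reading these headers, the helper below turns them into typed values and computes how long remains until the window resets. The names `RateLimitInfo` and `parse_rate_limit_headers` are illustrative and not part of any Compresh SDK; the sketch assumes all three headers are present and hold integers, as the table describes.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the three documented headers."""
    limit: int      # X-RateLimit-Limit: max requests per minute for the key
    remaining: int  # X-RateLimit-Remaining: requests left in the current window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp (seconds) of the reset

    def seconds_until_reset(self, now: float) -> float:
        """Seconds from `now` (Unix time) until the reset, never negative."""
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Build a RateLimitInfo, matching header names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
    )


# Sample header values for illustration only.
info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "2000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1714500000",
})
```

Passing `time.time()` as `now` gives the wait until the next window, which a client could use to pause once `remaining` gets low.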

When you hit the limit

If you exceed the rate limit, Compresh returns:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714500000

{
  "error": {
    "message": "Rate limit exceeded. Retry after 2024-04-30T18:00:00Z.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
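
If you handle raw HTTP responses yourself, a check along the following lines can recognize this error. It is a sketch that assumes the body always has the shape shown above; `parse_rate_limit_error` is a hypothetical helper name, not a Compresh API.

```python
import json
from typing import Optional


def parse_rate_limit_error(status_code: int, body_text: str) -> Optional[str]:
    """Return the error message for a rate limit error, otherwise None.

    Assumes the error body has the shape shown in the example response.
    """
    if status_code != 429:
        return None
    try:
        error = json.loads(body_text).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None  # body is not a JSON object
    if not isinstance(error, dict):
        return None
    if error.get("code") == "rate_limit_exceeded":
        return error.get("message", "")
    return None
```

Matching on the machine-readable `code` field rather than the human-readable `message` keeps the check stable if the wording of the message changes.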

Best practices

  • Implement exponential backoff with jitter when you receive a 429.
  • Check X-RateLimit-Remaining proactively to throttle before hitting the limit.
  • Use separate API keys for different services or environments to isolate quotas.
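
The first practice can be sketched as follows. This uses the "full jitter" variant of exponential backoff, one common choice among several. `RateLimited`, `backoff_delay`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and attempt count are illustrative defaults rather than values recommended by Compresh.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception: raise it from your request function on a 429."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(request: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying with backoff while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

The random factor spreads retries from many clients over time so they do not all return at the same instant. With the `openai` Python SDK, a 429 surfaces as `openai.RateLimitError`, which you could catch in place of the hypothetical `RateLimited`.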

Tip

Compresh rate limits and your upstream provider's rate limits are completely independent. A 429 from Compresh means you've hit the proxy limit; your provider may still have capacity. Conversely, a provider 429 passes through even if you're within Compresh limits.
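
One rough way to guess which side produced a 429 is to look at `X-RateLimit-Remaining`. The heuristic below is an assumption, not documented behavior: it supposes that a passed-through provider 429 still carries Compresh's headers with a non-zero remaining count. `likely_429_source` is a hypothetical helper, and its answer should be treated as a hint only.

```python
from typing import Mapping


def likely_429_source(headers: Mapping[str, str]) -> str:
    """Guess the origin of a 429: 'compresh', 'provider', or 'unknown'.

    Assumption: Compresh's X-RateLimit-Remaining is 0 only when the Compresh
    limit itself was hit. This is a heuristic, not documented behavior.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is None or not remaining.strip().isdigit():
        return "unknown"
    return "compresh" if int(remaining) == 0 else "provider"
```

The guess can be wrong when a provider 429 arrives on the same request that uses up the last of your Compresh quota, so it is better suited to logging and diagnostics than to retry logic.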

Example: checking limits in code

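import openai

# Point the OpenAI client at Compresh.
client = openai.OpenAI(
    api_key="comp_your_key",
    base_url="https://api.compre.sh/v1"
)

# with_raw_response exposes the HTTP headers alongside the parsed body.
response = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Header values are strings; get() returns None if the header is absent.
remaining = response.headers.get("X-RateLimit-Remaining")
print(f"Requests remaining: {remaining}")

# parse() returns the usual chat completion object.
completion = response.parse()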