Rate Limits — Compresh Docs

Compresh applies rate limits per API key to ensure fair usage and protect infrastructure. These limits are independent of any rate limits your upstream provider applies.

Default limits

Plan	Requests per minute
Free / Pro	2,000

Limits are applied per API key, not per account. If you have multiple keys, each gets its own quota.

Rate limit headers

Every response includes these headers so you can track your usage:

Header	Type	Description
`X-RateLimit-Limit`	integer	Maximum requests per minute for your key
`X-RateLimit-Remaining`	integer	Requests remaining in the current window
`X-RateLimit-Reset`	integer	Unix timestamp (seconds) when the rate limit window resets
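
As a minimal sketch, the three headers can be read into a small structure for use elsewhere in your client. The names `RateLimitStatus` and `parse_rate_limit_headers` are illustrative and not part of any Compresh SDK. The sketch assumes the headers arrive as a plain string-to-string mapping.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit: maximum requests per minute
    remaining: int  # X-RateLimit-Remaining: requests left in the window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp in seconds

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, never negative."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return a RateLimitStatus, or None if a header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitStatus(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

Returning `None` instead of raising keeps the caller working if a response arrives without these headers, for example from an intermediate proxy.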

When you hit the limit

If you exceed the rate limit, Compresh returns:

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714500000

{
  "error": {
    "message": "Rate limit exceeded. Retry after 2024-04-30T18:00:00Z.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
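
One way to act on this response is to derive a wait time from `X-RateLimit-Reset`. The following is a sketch only: `wait_seconds_for_429` and `FALLBACK_WAIT_SECONDS` are hypothetical names, and the one-second fallback is an assumption rather than documented behavior.

```python
import json
import time
from typing import Mapping, Optional

# Assumption: used only when the reset header is missing or unparsable.
FALLBACK_WAIT_SECONDS = 1.0


def wait_seconds_for_429(status: int, headers: Mapping[str, str], body: str,
                         now: Optional[float] = None) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the response
    does not match the rate-limit error shape shown above."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("code") != "rate_limit_exceeded":
        return None

    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        reset_at = int(lowered["x-ratelimit-reset"])
    except (KeyError, ValueError):
        return FALLBACK_WAIT_SECONDS

    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

The header is used instead of the timestamp in `message` because an integer is simpler to parse reliably than free-form text.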

Best practices

Implement exponential backoff with jitter when you receive a 429.
Check `X-RateLimit-Remaining` proactively to throttle before hitting the limit.
Use separate API keys for different services or environments to isolate quotas.
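
The first practice above, exponential backoff with jitter, could look roughly like the sketch below. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. `RateLimited`, `backoff_delay`, and `call_with_backoff` are hypothetical names, and the base, cap, and retry count are illustrative defaults, not Compresh recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception your request function raises on a 429."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(request: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), retrying on RateLimited up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

If you use the `openai` Python client, it raises `openai.RateLimitError` on a 429, so a wrapper like this would catch that exception instead of the placeholder `RateLimited`. Jitter matters when many workers share one key: without it, they tend to retry at the same instant and hit the limit again together.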

Tip

Compresh rate limits and your upstream provider's rate limits are completely independent. A 429 from Compresh means you've hit the proxy limit — your provider may still have capacity. Conversely, a provider 429 passes through even if you're within Compresh limits.

Example: checking limits in code

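import openai

# Point the OpenAI client at Compresh and authenticate with your Compresh key.
client = openai.OpenAI(
    api_key="comp_your_key",
    base_url="https://api.compre.sh/v1"
)

# with_raw_response exposes the HTTP headers alongside the completion.
response = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

# Header values are strings; convert with int() before comparing.
remaining = response.headers.get("X-RateLimit-Remaining")
print(f"Requests remaining: {remaining}")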
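
Once the remaining count is available, a client can slow itself down before reaching zero. The sketch below is one possible policy, not a Compresh recommendation: `pause_before_next_request` is a hypothetical helper, and the `floor` of 50 requests is an arbitrary illustrative threshold.

```python
import time
from typing import Optional


def pause_before_next_request(remaining: int, reset_at: int,
                              now: Optional[float] = None,
                              floor: int = 50) -> float:
    """Return how many seconds to sleep before sending the next request.

    Above `floor` remaining requests there is no pause. At or below it, the
    remaining requests are spread evenly over the time left in the window.
    At zero remaining, the pause lasts until the window resets.
    """
    current = time.time() if now is None else now
    time_left = max(0.0, reset_at - current)
    if remaining > floor:
        return 0.0
    if remaining <= 0:
        return time_left
    return time_left / remaining
```

In the example above, this would be called with `int(response.headers["X-RateLimit-Remaining"])` and `int(response.headers["X-RateLimit-Reset"])`, followed by `time.sleep()` on the result.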