Compresh applies rate limits per API key to ensure fair usage and protect infrastructure. These limits are independent of any rate limits your upstream provider applies.
Default limits
| Plan | Requests per minute |
|---|---|
| Free / Pro | 2,000 |
Limits are applied per API key, not per account. If you have multiple keys, each gets its own quota.
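To stay under the per-key quota on the client side, you can use a small limiter keyed by API key. This is a minimal sketch, not part of any Compresh SDK. The `PerKeyLimiter` class and its `try_acquire` method are hypothetical names. This page does not document how Compresh aligns its one-minute window, so the sketch uses a rolling 60-second window. A rolling window is generally at least as strict as a fixed per-minute window.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerKeyLimiter:
    """Hypothetical client-side limiter: at most `limit` requests per `window` seconds, per key."""

    def __init__(self, limit: int = 2000, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # Each API key gets its own independent log of request timestamps,
        # mirroring the per-key (not per-account) quota.
        self._sent: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, api_key: str) -> bool:
        """Return True and record the request if `api_key` still has room in the window."""
        now = self.clock()
        log = self._sent[api_key]
        # Discard timestamps that have aged out of the rolling window.
        while log and now - log[0] >= self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False  # rejected requests are not recorded
        log.append(now)
        return True
```

With the default `limit=2000` and `window=60.0`, a caller that gets `False` should wait before sending rather than hitting Compresh and receiving a 429.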
Rate limit headers
Every response includes these headers so you can track your usage:
| Header | Type | Description |
|---|---|---|
| `X-RateLimit-Limit` | integer | Maximum requests per minute for your key |
| `X-RateLimit-Remaining` | integer | Requests remaining in the current window |
| `X-RateLimit-Reset` | integer | Unix timestamp (seconds) at which the current window resets |
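You can read the three headers into a small structure before deciding whether to slow down. This is a minimal sketch that assumes the values are decimal integers, as the table describes. `RateLimitInfo` and `parse_rate_limit` are illustrative names, not part of an SDK. The function lowercases header names, so it works with a plain `dict` as well as with case-insensitive header objects such as an `httpx` response's `headers`.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset: int      # X-RateLimit-Reset, Unix time in seconds

    def reset_at(self) -> datetime:
        """The window reset time as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.reset, tz=timezone.utc)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the window resets, clamped at zero."""
        current = time.time() if now is None else now
        return max(0.0, self.reset - current)


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Hypothetical helper: parse the X-RateLimit-* headers, or return None if absent or malformed."""
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None
```

For example, a client might pause for `seconds_until_reset()` once `remaining` falls below some threshold you choose.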
When you hit the limit
If you exceed the rate limit, Compresh returns:
```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 2000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1777550400

{
  "error": {
    "message": "Rate limit exceeded. Retry after 2026-04-30T12:00:00Z.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
```
Best practices
- Implement exponential backoff with jitter when you receive a 429.
- Check `X-RateLimit-Remaining` proactively to throttle before you hit the limit.
- Use separate API keys for different services or environments to isolate quotas.
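One common way to implement backoff with jitter is "full jitter": wait a random time between zero and an exponentially growing cap. When the time until the window resets is known, wait at least that long. This is a sketch under stated assumptions. `RateLimited` is a hypothetical exception that your own HTTP layer would raise on a 429; the OpenAI Python client raises `openai.RateLimitError` instead. The `base`, `cap`, and `max_retries` defaults are arbitrary illustrative values.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception raised by your HTTP layer on a 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("rate limited")
        # Optional hint, e.g. seconds until X-RateLimit-Reset.
        self.retry_after = retry_after


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)]."""
    upper = min(cap, base * (2 ** attempt))
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, upper)


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call fn, retrying on RateLimited with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            delay = backoff_delay(attempt, rng=rng)
            # Never retry before the server says the window resets.
            if exc.retry_after is not None:
                delay = max(delay, exc.retry_after)
            sleep(delay)
    raise AssertionError("unreachable")
```

If you use the OpenAI Python client, note that it already retries some failures itself, controlled by its `max_retries` option. Account for that when you layer your own retries on top.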
Tip
Compresh rate limits and your upstream provider's rate limits are completely independent. A 429 from Compresh means you've hit the proxy limit; your provider may still have capacity. Conversely, a 429 from your provider is passed through to you even if you're within your Compresh limits.
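Because the two limits are independent, it can help to log which layer returned a 429. This is a heuristic sketch. It assumes Compresh also attaches `X-RateLimit-Remaining` to provider errors it passes through, since the headers section says every response carries it. If that count is `0`, the sketch treats the 429 as Compresh's; otherwise it attributes the 429 to the provider. `rate_limit_source` is a hypothetical helper. It can misclassify a provider 429 on the request that happens to consume your last Compresh slot.

```python
from typing import Mapping


def rate_limit_source(status: int, headers: Mapping[str, str]) -> str:
    """Heuristic: return 'none', 'compresh', 'provider', or 'unknown' for a response."""
    if status != 429:
        return "none"
    lowered = {k.lower(): v for k, v in headers.items()}
    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is None:
        return "unknown"  # no Compresh header to judge by
    if remaining.strip() == "0":
        return "compresh"  # Compresh quota exhausted
    return "provider"  # Compresh still had capacity, so the 429 likely came upstream
```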
Example: checking limits in code
```python
import openai

client = openai.OpenAI(
    api_key="comp_your_key",
    base_url="https://api.compre.sh/v1",
)

# with_raw_response exposes the HTTP response, including headers.
response = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

remaining = response.headers.get("X-RateLimit-Remaining")
print(f"Requests remaining: {remaining}")

# parse() returns the usual ChatCompletion object.
completion = response.parse()
```