compresh / docs
Pricing Sign In
FAQ

FAQ

Common questions about Compresh, how it works, and what to expect.

What is Compresh?

Compresh is context compression middleware for LLM APIs. It sits between your application and your LLM provider, compressing conversation context to reduce token usage. One line to integrate — change your base URL, keep everything else the same.

How does compression work?

Compresh uses EMA (Episodic Memory Architecture) to extract semantic tags from conversations. As a conversation deepens, Compresh identifies redundant context — information that's been established and repeated — and compresses it while preserving meaning. The result: deeper conversations see dramatically more savings. Flat, single-turn requests pass through with minimal processing.

Will it affect response quality?

Benchmarks show less than 2% quality variance with 60–80% token savings on Turn 10+. Short conversations (under 5 turns) see minimal compression — Compresh intentionally avoids aggressive compression when there isn't enough context to safely compress. Quality is the constraint, not savings.

Is my data safe?

Yes. API keys are encrypted with Fernet (SHA-256 derived key). No conversation content is stored permanently — Compresh processes context in-flight. Semantic tag clouds (used for compression) auto-expire with a TTL of 7 days. Compresh never logs message content.

Can I use free or local models?

Yes. Free models and local LLMs (Ollama, LM Studio, etc.) work through Compresh. When you use a free or local model, there is no savings-share deduction — Compresh stays out of the way. The Starter tier requires a $10 minimum top-up to start (paid at $7.50 with the 25% discount), and a $5 minimum balance must remain on the account to keep free-model access active.

What's the pricing?

Starter: $0 service fee. $10 minimum top-up to start, no card on file. The first 100 users get an extra $30 in credit. Paid models on Starter: 30% savings-share. Pro: Subscription plans — Quarterly $18 (20% share), Semi-annual $33 (16% share), Annual $60 (12% share). Pro pays itself off if you use paid models heavily. See Pricing for full details.

What if Compresh doesn't recognize my model as free or paid?

Compresh detects model pricing through provider responses. Local model usage is auto-detected (no provider call goes out). For new or rare models we don't have pricing data on yet, we fall back to OpenRouter's lowest published tariff as a conservative default. If you believe a model is misclassified, you can dispute via support — we run a quick check and adjust.

Can I self-host?

The core proxy is open-source. You can fork the repository, configure your own keys, and deploy anywhere. Self-hosting means you handle infrastructure, updates, and scaling yourself.

Which models are supported?

Any OpenAI-compatible model, all Anthropic Claude models, and 200+ models via OpenRouter. If your provider exposes an OpenAI-compatible API, it works with Compresh. See Integrations for setup guides.

What about injection attacks?

Compresh includes a 3-layer injection detection system: regex pattern matching, heuristic analysis, and ML-based classification. It supports 19 languages out of the box. Injection detection runs on every request before it's forwarded to your provider, adding a security layer that most direct API integrations lack.