Production-Ready Generative AI, Real Cost Savings
Cut inference costs by up to 5x compared to closed generative LLMs.
Models
Production-grade inference, 99.9% Uptime
Choose the generative model that best matches your use case.
DeepSeek V4 Flash
deepseek-ai/DeepSeek-V4-Flash
The lowest-cost option: a fast workhorse for high-volume coding and general-purpose tasks.
Qwen 3.6 35B-A3B
Qwen/Qwen3.6-35B-A3B
The cost-efficient pick for multimodal use cases: image and text understanding at low cost.
Kimi K2.6
moonshotai/Kimi-K2.6
Built for agentic workflows: strong tool use and multi-step coding on complex tasks.
Pricing
Per-token, in USD
You pay for input and output tokens reported by the upstream model. Failed requests and mid-stream cancellations are not billed.
| Model | Input | Cache | Output |
|---|---|---|---|
| DeepSeek V4 Flash (deepseek-ai/DeepSeek-V4-Flash) | $0.10 / 1M tok | - | $0.28 / 1M tok |
| Qwen 3.6 35B-A3B (Qwen/Qwen3.6-35B-A3B) | $0.15 / 1M tok | $0.05 / 1M tok | $1.00 / 1M tok |
| Kimi K2.6 (moonshotai/Kimi-K2.6) | $0.70 / 1M tok | - | $4.00 / 1M tok |
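To see how the per-token rates turn into a bill, here is a rough estimator built on the prices in the table above. The `estimate_cost` helper and its `cached_tokens` parameter are illustrative names, not part of any Classer SDK. The handling of cached tokens is an assumption: the sketch treats the Cache column as a discounted rate for cached input tokens and falls back to the Input rate where no cache price is listed.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table.
# None means the table lists no cache price for that model.
PRICES = {
    "deepseek-ai/DeepSeek-V4-Flash": {
        "input": Decimal("0.10"), "cache": None, "output": Decimal("0.28"),
    },
    "Qwen/Qwen3.6-35B-A3B": {
        "input": Decimal("0.15"), "cache": Decimal("0.05"), "output": Decimal("1.00"),
    },
    "moonshotai/Kimi-K2.6": {
        "input": Decimal("0.70"), "cache": None, "output": Decimal("4.00"),
    },
}

MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption, not stated in the source: cached_tokens is the portion of
    input_tokens billed at the Cache rate. Where no cache rate is listed,
    those tokens are billed at the normal Input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    price = PRICES[model]
    cache_rate = price["cache"] if price["cache"] is not None else price["input"]
    uncached_tokens = input_tokens - cached_tokens

    total = (
        uncached_tokens * price["input"]
        + cached_tokens * cache_rate
        + output_tokens * price["output"]
    )
    return total / MILLION


# 2M input tokens and 500k output tokens on DeepSeek V4 Flash.
flash_cost = estimate_cost("deepseek-ai/DeepSeek-V4-Flash", 2_000_000, 500_000)

# 1M input tokens (400k of them cached) and 100k output tokens on Qwen 3.6.
qwen_cost = estimate_cost("Qwen/Qwen3.6-35B-A3B", 1_000_000, 100_000, cached_tokens=400_000)
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift. The token counts would come from the usage figures reported with each response.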
Get started
Deploy trusted models in minutes
1. Create an API key
Sign in at classer.ai/api-keys and click Create API Key. Logging is off by default.
2. Swap the base URL
Point your existing OpenAI SDK at https://api.classer.ai/v1 with your Classer key.
3. Call any model
Set the model field to the id from the table above. Tool calling, streaming, and structured outputs all work out of the box.
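import os

from openai import OpenAI

# Read the key from the environment; Python does not expand "$CLASSER_API_KEY".
client = OpenAI(
    base_url="https://api.classer.ai/v1",
    api_key=os.environ["CLASSER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(resp.choices[0].message.content)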
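Since streaming is listed as working out of the box, the same call can be sketched with `stream=True`. This is a minimal sketch, assuming the endpoint returns chunks in the standard OpenAI streaming shape; `collect_stream` and `stream_reply` are illustrative helper names, and the `CLASSER_API_KEY` environment variable is an assumed convention.

```python
import os


def collect_stream(chunks):
    """Join the text deltas from an iterable of streamed chat chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices (for example, usage-only chunks).
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        # The delta content can be None, e.g. on role-only or final chunks.
        if text:
            parts.append(text)
    return "".join(parts)


def stream_reply(prompt, model="deepseek-ai/DeepSeek-V4-Flash"):
    """Send one prompt with streaming enabled and return the full reply."""
    # Imported here so collect_stream stays usable without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.classer.ai/v1",
        api_key=os.environ["CLASSER_API_KEY"],  # assumed variable name
    )
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Calling `stream_reply("Hello, world!")` returns the assembled reply. To show text as it arrives, print each delta inside the loop instead of collecting it.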