Production-Ready Generative AI, Real Cost Savings

Cut inference costs by up to 5x compared to closed generative LLMs.

Models

Production-grade inference with 99.9% uptime

Choose the generative model that best matches your use case.

DeepSeek V4 Flash

deepseek-ai/DeepSeek-V4-Flash
Context: 1M tokens · Max output: 125K tokens · Modalities: text

The lowest-cost option — a fast workhorse for high-volume coding and general-purpose tasks.

Qwen 3.6 35B-A3B

Qwen/Qwen3.6-35B-A3B
Context: 256K tokens · Max output: 256K tokens · Modalities: text + image

The cost-efficient pick for multimodal use cases — image and text understanding at low cost.

Kimi K2.6

moonshotai/Kimi-K2.6
Context: 256K tokens · Max output: 256K tokens · Modalities: text + image

Built for agentic workflows — strong tool use and multi-step coding on complex tasks.
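Qwen 3.6 35B-A3B and Kimi K2.6 both accept image input. The sketch below builds a mixed text-and-image message, assuming Classer accepts the OpenAI chat-completions content-part format (a "text" part plus an "image_url" part carrying a base64 data URL). The helper names image_message and describe_image, and the file chart.png in the example comment, are hypothetical.

```python
import base64
import mimetypes


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message with a text part and an inline base64 image part.

    Assumes the OpenAI-style content-part format; verify against Classer's docs.
    """
    data_url = f"data:{mime_type};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


def describe_image(client, image_path: str, prompt: str) -> str:
    """Send a local image plus a prompt to the multimodal Qwen model.

    `client` is an OpenAI SDK client pointed at https://api.classer.ai/v1.
    """
    # Guess the MIME type from the file extension; fall back to PNG.
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as f:
        message = image_message(prompt, f.read(), mime_type)
    resp = client.chat.completions.create(
        model="Qwen/Qwen3.6-35B-A3B",
        messages=[message],
    )
    return resp.choices[0].message.content


# Example (needs the openai package, network access, and a real image file):
#   client = OpenAI(base_url="https://api.classer.ai/v1", api_key=os.environ["CLASSER_API_KEY"])
#   print(describe_image(client, "chart.png", "Summarize this chart in one sentence."))
```

Building the message in a separate pure function keeps the payload format easy to inspect and adjust if Classer's image input differs from this assumption.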

Pricing

Per-token, in USD

You pay for input and output tokens reported by the upstream model. Failed requests and mid-stream cancellations are not billed.

| Model | Model ID | Input (USD per 1M tokens) | Output (USD per 1M tokens) |
| --- | --- | --- | --- |
| DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | $0.10 | $0.28 |
| Qwen 3.6 35B-A3B | Qwen/Qwen3.6-35B-A3B | $0.15 | $1.00 |
| Kimi K2.6 | moonshotai/Kimi-K2.6 | $0.70 | $4.00 |
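Because billing is linear in tokens, the cost of one request is input tokens × input price / 1M plus output tokens × output price / 1M. A minimal sketch using the prices in the table; PRICES_PER_MILLION and request_cost are illustrative names, not part of any Classer SDK.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES_PER_MILLION = {
    "deepseek-ai/DeepSeek-V4-Flash": (0.10, 0.28),
    "Qwen/Qwen3.6-35B-A3B": (0.15, 1.00),
    "moonshotai/Kimi-K2.6": (0.70, 4.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one successful request (failed or cancelled ones are not billed)."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 2,000 input tokens and 500 output tokens on Kimi K2.6:
# (2,000 x $0.70 + 500 x $4.00) / 1M = $0.0034
print(f"${request_cost('moonshotai/Kimi-K2.6', 2_000, 500):.4f}")  # prints $0.0034
```

With the OpenAI SDK, the token counts would typically come from resp.usage.prompt_tokens and resp.usage.completion_tokens, assuming Classer passes the upstream usage figures through in that field.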

Get started

Deploy trusted models in minutes

1. Create an API key

Sign in at classer.ai/api-keys and click Create API Key. Logging is off by default.

2. Swap the base URL

Point your existing OpenAI SDK client at https://api.classer.ai/v1 and authenticate with your Classer API key.

3. Call any model

Set the model field to a model ID from the table above. Tool calling, streaming, and structured outputs all work out of the box.

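```python
import os

from openai import OpenAI

# Point the standard OpenAI client at Classer's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.classer.ai/v1",
    api_key=os.environ["CLASSER_API_KEY"],  # read the key from the environment
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(resp.choices[0].message.content)
```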
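Streaming uses the same call with stream=True; the OpenAI SDK then yields chunks whose choices[0].delta.content holds the next piece of text. The sketch below assumes Classer's streamed chunks follow the OpenAI format. collect_stream and stream_reply are hypothetical helper names, and client is an OpenAI client configured with the Classer base URL and key.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print text deltas as they arrive and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks (e.g. a trailing usage chunk) carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks have content=None
            print(delta, end="", flush=True)
            parts.append(delta)
    return "".join(parts)


def stream_reply(client, prompt: str) -> str:
    """Stream a chat completion from DeepSeek V4 Flash."""
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V4-Flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)
```

Keeping the chunk handling in its own function means it can be exercised with stand-in chunk objects, without a network call or an API key.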