Production-Ready Generative AI, Real Cost Savings

Cut inference costs by up to 5x compared to closed generative LLMs.


Models

Production-grade inference, 99.9% Uptime

Choose the generative model that best matches your use case.

DeepSeek V4 Flash

deepseek-ai/DeepSeek-V4-Flash

Context: 1M tokens | Max output: 125K tokens | Modalities: text

The lowest-cost option — a fast workhorse for high-volume coding and general-purpose tasks.

Qwen 3.6 35B-A3B

Qwen/Qwen3.6-35B-A3B

Context: 256K tokens | Max output: 256K tokens | Modalities: text + image

The cost-efficient pick for multimodal use cases — image and text understanding at low cost.

Kimi K2.6

moonshotai/Kimi-K2.6

Context: 256K tokens | Max output: 256K tokens | Modalities: text + image

Built for agentic workflows — strong tool use and multi-step coding on complex tasks.

Pricing

Per-token, in USD

You pay for input and output tokens reported by the upstream model. Failed requests and mid-stream cancellations are not billed.

Model	Input	Cache	Output
DeepSeek V4 Flash `deepseek-ai/DeepSeek-V4-Flash`	$0.10 / 1M tok	—	$0.28 / 1M tok
Qwen 3.6 35B-A3B `Qwen/Qwen3.6-35B-A3B`	$0.15 / 1M tok	$0.05 / 1M tok	$1.00 / 1M tok
Kimi K2.6 `moonshotai/Kimi-K2.6`	$0.70 / 1M tok	—	$4.00 / 1M tok
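
To see what the rates above mean for a real workload, here is a minimal sketch that estimates the cost of a request from its token counts. The prices are copied from the table. The helper name `estimate_cost` is hypothetical, and the sketch makes two assumptions this page does not confirm: that the Cache column is the rate for cached input tokens (a subset of the input tokens), and that models without a cache rate bill those tokens at the regular input rate.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table:
# (input, cache, output). None means no cache rate is listed.
PRICES = {
    "deepseek-ai/DeepSeek-V4-Flash": (Decimal("0.10"), None, Decimal("0.28")),
    "Qwen/Qwen3.6-35B-A3B": (Decimal("0.15"), Decimal("0.05"), Decimal("1.00")),
    "moonshotai/Kimi-K2.6": (Decimal("0.70"), None, Decimal("4.00")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from
    cache and billed at the Cache rate instead of the Input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    input_price, cache_price, output_price = PRICES[model]
    if cache_price is None:
        # Assumption: with no cache rate listed, cached tokens cost
        # the same as ordinary input tokens.
        cache_price = input_price

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * input_price
        + cached_tokens * cache_price
        + output_tokens * output_price
    )
    return total / TOKENS_PER_UNIT


# Example: 1M input tokens (400K of them cached) and 100K output tokens.
print(estimate_cost("Qwen/Qwen3.6-35B-A3B", 1_000_000, 100_000, cached_tokens=400_000))
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift. Treat the result as an estimate; your invoice is based on the token counts reported by the upstream model.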

Get started

Deploy trusted models in minutes

1. Create an API key

2. Swap the base URL

Point your existing OpenAI SDK at https://api.classer.ai/v1 with your Classer key.

3. Call any model

Set the model field to the id from the table above. Tool calling, streaming, and structured outputs all work out of the box.

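import os

from openai import OpenAI

# Read the key from the environment. Python does not expand a
# literal "$CLASSER_API_KEY" string the way a shell would.
client = OpenAI(
    base_url="https://api.classer.ai/v1",
    api_key=os.environ["CLASSER_API_KEY"],
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[{"role": "user", "content": "Hello, world!"}],
)

# Print the assistant's reply.
print(resp.choices[0].message.content)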
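
Streaming is listed above as working out of the box. The sketch below shows one way to consume a streamed reply with the OpenAI SDK's `stream=True` option. The helper names `stream_text` and `demo` are hypothetical, and the sketch assumes the endpoint returns chunks in the same shape the OpenAI SDK uses (`choices[0].delta.content`), which this page does not spell out.

```python
def stream_text(client, model, prompt):
    """Yield text fragments of the reply as they arrive (hypothetical helper)."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def demo():
    """Stream one reply to the terminal. Needs CLASSER_API_KEY to be set."""
    import os

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.classer.ai/v1",
        api_key=os.environ["CLASSER_API_KEY"],
    )
    for piece in stream_text(client, "deepseek-ai/DeepSeek-V4-Flash", "Hello, world!"):
        print(piece, end="", flush=True)
    print()
```

Call `demo()` to try it. Keeping the client as a parameter of `stream_text` means the same function works with any OpenAI-compatible client and is easy to exercise with a stub in tests.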
