DeepSeek
TL;DR
ONGRID_DEEPSEEK_API_KEY=sk-...
ONGRID_DEEPSEEK_MODEL=deepseek-v4-flash # default
ONGRID_DEEPSEEK_BASE_URL= # optional; defaults to api.deepseek.com/v1Provider id: deepseek. SDK adapter: OpenAI-compatible.
DeepSeek's V4 family is the cheap-and-fast option. The endpoint is OpenAI-compatible at the wire level.
Env vars
| Var | Default | Notes |
|---|---|---|
ONGRID_DEEPSEEK_API_KEY | — | Empty = provider dropped |
ONGRID_DEEPSEEK_MODEL | deepseek-v4-flash | Default model |
ONGRID_DEEPSEEK_BASE_URL | https://api.deepseek.com/v1 | Override for VPC endpoints |
ONGRID_DEEPSEEK_MODELS | deepseek-v4-pro,deepseek-v4-flash,deepseek-reasoner | Catalog list |
Default catalog
deepseek-v4-pro— top of the V4 family; closest to frontier quality at a fraction of the cost.deepseek-v4-flash— the catalog default; recommended for chat.deepseek-reasoner— chain-of-thought variant. See quirks below.
deepseek-reasoner caveats
deepseek-reasoner emits a <thinking>...</thinking> block before its final answer. The Ongrid LLM adapter does NOT strip these — they show up in the chat transcript and in the RCA report's findings_md.
If you don't want the thinking blocks rendered:
- Use a different model for chat (
deepseek-v4-pro). - Or post-process the transcript with a CSS rule that hides
details[open] > summary:contains("thinking")— the SPA wraps them in collapsible<details>by default.
The reasoner's response is slower than v4-flash (the chain-of- thought is real compute). Don't use it for the Pass-2 structured extractor — the timeout will hit.
Making DeepSeek the default
ONGRID_LLM_DEFAULT_PROVIDER=deepseekThe agent runtime auto-picks the default-resolver-provided model for the investigator persona's calls; this means flipping default to DeepSeek immediately routes all auto-RCAs there — at much lower cost than Claude / GPT for similar quality on the structured-extraction half of the pipeline.
BaseURL
The api.deepseek.com/v1 endpoint is globally reachable. No China-based tag in the SPA. Use BaseURL override only for relays.
Quirks
- OpenAI-compatible wire — flat
tool_calls, OpenAI streaming format. The adapter is the same as for Custom / Zhipu / Kimi / Gemini-OAI-mode. - Long context — V4 supports 64k tokens; the Ongrid budget estimator uses a conservative
len(text)/4so you'll see the budget reject before you actually hit the model limit.
See also
- Models overview.
- Routing.
- Budget — the per-day token cap that bounds total cost across providers.