Skip to content

DeepSeek

TL;DR

bash
ONGRID_DEEPSEEK_API_KEY=sk-...
ONGRID_DEEPSEEK_MODEL=deepseek-v4-flash     # default
ONGRID_DEEPSEEK_BASE_URL=                   # optional; defaults to api.deepseek.com/v1

Provider id: deepseek. SDK adapter: OpenAI-compatible.

DeepSeek's V4 family is the cheap-and-fast option. The endpoint is OpenAI-compatible at the wire level.

Env vars

VarDefaultNotes
ONGRID_DEEPSEEK_API_KEYEmpty = provider dropped
ONGRID_DEEPSEEK_MODELdeepseek-v4-flashDefault model
ONGRID_DEEPSEEK_BASE_URLhttps://api.deepseek.com/v1Override for VPC endpoints
ONGRID_DEEPSEEK_MODELSdeepseek-v4-pro,deepseek-v4-flash,deepseek-reasonerCatalog list

Default catalog

  • deepseek-v4-pro — top of the V4 family; closest to frontier quality at a fraction of the cost.
  • deepseek-v4-flash — the catalog default; recommended for chat.
  • deepseek-reasoner — chain-of-thought variant. See quirks below.

deepseek-reasoner caveats

deepseek-reasoner emits a <thinking>...</thinking> block before its final answer. The Ongrid LLM adapter does NOT strip these — they show up in the chat transcript and in the RCA report's findings_md.

If you don't want the thinking blocks rendered:

  1. Use a different model for chat (deepseek-v4-pro).
  2. Or post-process the transcript with a CSS rule that hides details[open] > summary:contains("thinking") — the SPA wraps them in collapsible <details> by default.

The reasoner's response is slower than v4-flash (the chain-of- thought is real compute). Don't use it for the Pass-2 structured extractor — the timeout will hit.

Making DeepSeek the default

bash
ONGRID_LLM_DEFAULT_PROVIDER=deepseek

The agent runtime auto-picks the default-resolver-provided model for the investigator persona's calls; this means flipping default to DeepSeek immediately routes all auto-RCAs there — at much lower cost than Claude / GPT for similar quality on the structured-extraction half of the pipeline.

BaseURL

The api.deepseek.com/v1 endpoint is globally reachable. No China-based tag in the SPA. Use BaseURL override only for relays.

Quirks

  • OpenAI-compatible wire — flat tool_calls, OpenAI streaming format. The adapter is the same as for Custom / Zhipu / Kimi / Gemini-OAI-mode.
  • Long context — V4 supports 64k tokens; the Ongrid budget estimator uses a conservative len(text)/4 so you'll see the budget reject before you actually hit the model limit.

See also