Budget & limits
Ongrid enforces a global per-UTC-day token cap across every provider. The default is unlimited; one env var turns it on:
```sh
ONGRID_LLM_DAILY_TOKEN_LIMIT=2000000   # 2 million tokens per UTC day
```

A value <= 0 disables the cap. It is a single value, not per-provider; that is the MVP scope. When tenants land, the cap moves to per-org settings and this knob stays as a global safety-net cap.
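A minimal sketch of how a service could read this knob, assuming plain os.LookupEnv. dailyLimitFromEnv and capEnabled are hypothetical names, not Ongrid's actual config loader. The sketch encodes the two documented rules: unset means unlimited, and any value <= 0 disables the cap.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// dailyLimitFromEnv is a hypothetical helper, not Ongrid's config code.
// It returns the per-UTC-day token cap; a result <= 0 means "no cap".
// An unset or empty variable maps to 0, matching the documented default (unlimited).
func dailyLimitFromEnv(lookup func(string) (string, bool)) (int, error) {
	raw, ok := lookup("ONGRID_LLM_DAILY_TOKEN_LIMIT")
	raw = strings.TrimSpace(raw)
	if !ok || raw == "" {
		return 0, nil // default: unlimited
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("ONGRID_LLM_DAILY_TOKEN_LIMIT: %w", err)
	}
	return n, nil
}

// capEnabled mirrors the documented rule: values <= 0 disable the cap.
func capEnabled(limit int) bool { return limit > 0 }

func main() {
	limit, err := dailyLimitFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("limit:", limit, "enabled:", capEnabled(limit))
}
```

Passing the lookup function in (rather than calling os.Getenv directly) keeps the parsing testable without touching the process environment.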
How it's wired
Three pieces, in internal/pkg/llm/:
```go
// 1. The interface
type BudgetChecker interface {
	Check(ctx context.Context, userID uint64, estPromptTokens int) error
	Record(ctx context.Context, userID uint64, usage Usage) error
}

// 2. The MVP implementation
budget := llm.NewInMemoryBudget(cfg.LLM.DailyTokenLimit)

// 3. The eino callback that bridges to the graph kernel
handler := llm.NewBudgetCallbackHandler(budget, userID)
```

The graph-kernel runtime installs the callback handler in its eino callbacks chain. On every ChatModel OnStart, the handler:
- estimates prompt tokens as len(text)/4 (conservative);
- calls BudgetChecker.Check(ctx, userID, estPromptTokens);
- on rejection, stores ErrBudgetExceeded in the context so the downstream node can short-circuit; subsequent code surfaces the error.
On OnEnd, the actual Usage.TotalTokens is recorded against the current UTC-day bucket.
ErrBudgetExceeded
```go
// internal/pkg/llm/budget.go:37
func (b *InMemoryBudget) Check(ctx context.Context, userID uint64, estPromptTokens int) error {
	if b.dailyLimit <= 0 {
		return nil // cap disabled
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	key := b.dayKey() // current UTC-day bucket, "YYYY-MM-DD"
	// Reject if today's recorded usage plus this call's estimate would exceed the cap.
	if b.used[key]+estPromptTokens > b.dailyLimit {
		return ErrBudgetExceeded
	}
	return nil
}
```

The error propagates to:
- The chat send endpoint: returns HTTP 429 with a { "error": "budget_exceeded", "message": "..." } body that the chat UI renders inline.
- The RCA investigator worker: the report row lands as status=failed with status_reason="budget_exceeded".
- The translate path: falls back to "translation unavailable (budget exceeded)" and shows the original text.
InMemoryBudget caveats
The MVP implementation is in-memory:
```go
type InMemoryBudget struct {
	mu         sync.Mutex
	dailyLimit int            // tokens per UTC day; <=0 means unlimited
	used       map[string]int // key = "YYYY-MM-DD" (UTC)
	now        func() time.Time
}
```

Consequences:
- No persistence: a manager restart resets the day's counter. If you need a hard daily cap that survives restarts, swap the implementation; the BudgetChecker interface is the seam.
- Single-process: if you run multiple managers behind a load balancer (you shouldn't yet, but if you do), each has its own counter, so the fleet-wide spend can reach N times the configured cap.
- Global, not per-user: userID flows through the interface so a future MySQL usage_daily table is a drop-in replacement, but today the cap is the same number for everyone.
The pivot to single-tenant deferred the per-user backend; the interface is forward-compatible for when multi-user comes back.
Token estimation
BudgetCallbackHandler.OnStart estimates prompt tokens as character count / 4. This is intentionally conservative: real tokenisation varies by provider and model, and the budget should err on the side of refusing borderline calls rather than going over.
On OnEnd, the actual Usage.TotalTokens returned by the provider is recorded — so the budget tracks ground truth even when the estimate was off.
If the provider doesn't return token counts (some custom endpoints don't), the callback falls back to a response-meta heuristic; see OnEndUsesResponseMetaFallback in the tests.
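If the handler computes this with Go's built-in len, as the len(text)/4 formula in the wiring section suggests, the "character count" is really a UTF-8 byte count. The sketch below compares the two; both function names are hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimatePromptTokens applies the documented heuristic, len(text)/4.
// Go's len counts UTF-8 bytes, so non-ASCII text gets a higher estimate
// than a character (rune) count would give.
func estimatePromptTokens(text string) int { return len(text) / 4 }

// estimateByRunes divides the Unicode code-point count by 4 instead; for comparison only.
func estimateByRunes(text string) int { return utf8.RuneCountInString(text) / 4 }

func main() {
	for _, s := range []string{"How is the cluster doing?", "こんにちは世界"} {
		fmt.Printf("%q bytes=%d runes=%d est(bytes)=%d est(runes)=%d\n",
			s, len(s), utf8.RuneCountInString(s), estimatePromptTokens(s), estimateByRunes(s))
	}
}
```

The byte-based variant leans further toward refusing, which fits the stated intent. Integer division also floors, so a prompt under 4 bytes estimates to 0; the actual total recorded on OnEnd corrects any such underestimate.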
Observing the budget
```sh
curl -s localhost:9100/metrics | grep llm_budget
# llm_budget_daily_limit_tokens 2000000
# llm_budget_used_tokens_today 412847
# llm_budget_rejections_total 3
```

The metrics are wired by BudgetCallbackHandler.Stats(). The self-obs Prometheus dashboard renders them as a daily-spend graph, plus an alert at 80% of the cap.
Disabling for one workload
There's no "disable budget for the investigator" knob. If RCA is hitting the cap and you'd rather it kept running than chat, raise the cap — that's what it's there for. The alternative (per-workload quotas) is parked along with multi-tenancy.
See also
- Models overview.
- Routing — orthogonal to budget; the cap applies per-call regardless of which provider was picked.
- Environment variables: ONGRID_LLM_* knobs.