Glossary
Project-specific terms used across the docs and source. If a term has a longer treatment, the dedicated page is linked.
A
Agent
In Ongrid, an agent is a configured LLM worker with a specific job (coordinator, incident investigator, network specialist, etc.). Each agent has a persona describing which model it runs, which tools it can call, and how many ReAct turns it gets. Distinct from "edge agent" — see Edge.
Agent kernel
The runtime that drives an agent's ReAct loop: prompt assembly, tool registry resolution, model invocation, tool execution, response parsing. Two kernels exist: graph (default; built on eino) and legacy (for-loop). Toggle via ONGRID_AGENT_KERNEL.
Air-gapped
A deployment with no internet egress. Ongrid runs fully offline given a local LLM relay (e.g. vLLM, Ollama) and the local embedding model bundled in the release tarball. See Air-gapped / on-prem.
B
Blast radius
The set of services or hosts affected by a change or incident. The agent computes this by walking the topology graph (downstream nodes) before recommending any destructive action.
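As an illustration of the downstream walk described above, here is a minimal sketch using breadth-first search. The `Graph` type, the `BlastRadius` function, and the node names are hypothetical; the real topology store and its API are not shown in this glossary.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph maps a node ID to the IDs of its downstream nodes.
// Hypothetical in-memory shape, used only for this sketch.
type Graph map[string][]string

// BlastRadius returns every node reachable downstream of start,
// excluding start itself, sorted for stable output.
// The visited set guarantees termination even if the graph has cycles.
func BlastRadius(g Graph, start string) []string {
	visited := map[string]bool{start: true}
	queue := []string{start}
	var affected []string
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		for _, next := range g[node] {
			if visited[next] {
				continue
			}
			visited[next] = true
			affected = append(affected, next)
			queue = append(queue, next)
		}
	}
	sort.Strings(affected)
	return affected
}

func main() {
	g := Graph{
		"db-primary":  {"orders-api", "billing-api"},
		"orders-api":  {"web-frontend"},
		"billing-api": {"web-frontend"},
	}
	fmt.Println(BlastRadius(g, "db-primary"))
}
```

A node reached through several paths is counted once, which is why the sketch tracks visited nodes rather than simply concatenating child lists.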
Bounded context (BC)
A subdomain of the manager with its own model, biz, data, and server packages. Examples: iam, edge, device, alert, aiops. Imports across BCs are restricted to interface ports; go-arch-lint enforces the boundary.
Built-in vault
The default knowledge-base content bundled at github.com/ongridio/vault. The manager syncs it on first boot and re-syncs on demand. Public repo; about 96 markdown playbooks.
C
Channel
A delivery target for notifications. Types: webhook, slack, feishu, dingtalk, wecom, telegram. Each channel can be filtered by severity and scope. See Channels.
Class (tool class)
Blast-radius classification of a tool: safe (read-only), mutating (reversible write), dangerous (irreversible). The persona's permission_mode field gates which classes are callable. See Skill manifest.
Cmdpolicy
The edge-side sandbox that gates bash skill invocations. Defines a binary allowlist, argument matchers, path allowlists, and network allowlists. Located under internal/edgeagent/cmdpolicy. Read-only by default; allowing mutating commands requires explicit policy edits.
Control plane
The geminio tunnel. Carries edge lifecycle, RPC, heartbeats, alert events, and (today) metric push. See Data plane.
Coordinator
The top-level agent that decomposes user questions, dispatches specialist sub-agents, and assembles the final answer. Persona name: coordinator. See Coordinator.
D
Data plane
The independent outbound HTTPS path edges use to ship logs and traces directly to the manager's public ingest endpoints — distinct from the tunnel (control plane). See Telemetry data plane.
Dedupe key
Per-rule, per-scope key that the alert evaluator uses to collapse repeated firings into one open incident. Built from rule_key + scope_type + scope identifier + (optional rule-specific dimensions). Unique index on alert_incidents.dedupe_key.
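A minimal sketch of how such a key could be assembled from the listed parts. The `:` separator, the `name=value` encoding of the optional dimensions, and the `DedupeKey` function name are assumptions made for illustration; the actual key format is not specified here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// DedupeKey joins rule key, scope type, scope identifier, and any optional
// rule-specific dimensions into one string.
// The separator and dimension encoding are assumptions for this sketch.
func DedupeKey(ruleKey, scopeType, scopeID string, dims map[string]string) string {
	parts := []string{ruleKey, scopeType, scopeID}
	names := make([]string, 0, len(dims))
	for name := range dims {
		names = append(names, name)
	}
	// Map iteration order is random in Go; sort so the same inputs
	// always yield the same key.
	sort.Strings(names)
	for _, name := range names {
		parts = append(parts, name+"="+dims[name])
	}
	return strings.Join(parts, ":")
}

func main() {
	key := DedupeKey("disk_usage_high", "host", "dev-42",
		map[string]string{"mountpoint": "/var"})
	fmt.Println(key)
}
```

Whatever the real encoding is, it has to be deterministic: two firings of the same rule on the same scope must produce an identical string for the unique index to collapse them into one incident.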
Device
A logical host. Distinct from edge, which is the tunnel-connected agent process. One device may have zero or more edges over its lifetime (for example, after reinstalls). The device_id is the canonical join key in PromQL labels and topology nodes.
E
Edge
The tunnel-connected agent process — ongrid-edge. Identified by edge_id. One edge per running agent process; one device may host multiple edges over its lifetime (after reinstalls). Display rule: the Edges page lists agents; the Devices page lists hosts.
Edge bundle
The tarball the manager ships to an edge for whole-bundle upgrade (ADR-024). Contains the agent binary plus every plugin binary, all for the target arch. Staged in /var/lib/ongrid-edge/.upgrade/ and swapped on next boot.
eino
The graph kernel library Ongrid uses for the agent's ReAct loop (ONGRID_AGENT_KERNEL=graph). Provides the prompt-assembly, tool-call, and graph-execution primitives.
F
Frontier
The upstream geminio broker (github.com/singchia/frontier, ADR-007). Edge dials port 40012; manager dials port 40011 over the docker network. Ships as a docker image bundled in the release tarball.
G
geminio
The TLS-based multi-stream tunnel protocol (github.com/singchia/geminio) that Ongrid's frontier broker implements. Supports request/response RPC and raw streams over one persistent TLS connection.
Grafana embed
The Monitor page renders Grafana panels via iframe (solo-mode URLs) under /grafana/.... nginx fronts both the manager API and the Grafana embed on the same origin; the iframe is allowed via GF_SECURITY_ALLOW_EMBEDDING=true.
H
Health (plugin)
The supervisor-reported runtime state of an edge plugin: running, crashed, starting, stopping. Surfaced via GET /v1/edges/{id}/plugins and on the Edges page next to each plugin toggle.
I
IM bridge
The bounded context that connects Ongrid chat to external IM platforms (Slack, Telegram, Lark, DingTalk, WeCom). One row in im_apps per registered app. Incoming events on /v1/im/<provider>/events are converted into chat sessions.
Incident
A firing alert. One incident per (rule, scope) tuple, deduped via dedupe key. Has a lifecycle (open → acknowledged → resolved), an event timeline, and optionally an AI-generated investigation report.
J
join_mode
Rule field: all or any. Determines whether every entry in conditions[] must match for the rule to fire (all, the default) or whether a single matching entry is enough (any).
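The two modes can be sketched as follows. The `Fires` function is hypothetical, and treating a rule with no conditions as never firing is an assumption, not documented behavior.

```go
package main

import "fmt"

// Fires reports whether a rule fires, given one match result per entry
// in conditions[]. Any joinMode other than "any" is treated as "all",
// mirroring the documented default.
func Fires(joinMode string, matches []bool) bool {
	if len(matches) == 0 {
		return false // assumption: a rule with no conditions never fires
	}
	if joinMode == "any" {
		for _, matched := range matches {
			if matched {
				return true
			}
		}
		return false
	}
	for _, matched := range matches {
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	results := []bool{true, false}
	fmt.Println(Fires("all", results), Fires("any", results))
}
```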
K
Kind (rule kind)
The discriminator that drives which sub-evaluator runs over a rule's conditions. Phase-A: metric_raw, metric_anomaly, metric_forecast, metric_burn_rate. Phase-B: log_match, log_volume, trace_latency, trace_error_rate. Plus the UI-only input kind metric_threshold. See Alert rule schema.
L
Loki
Grafana's log store (ADR-012). Bundled in the compose stack as loki:3.4.0. Edges push via the data plane; the manager queries via /v1/logs/query_range.
M
Marketplace
The skill-pack distribution system (ADR-017). A pack is a directory of skills + agents + a manifest. Install via POST /v1/marketplace/install. Registries point at HTTPS-hosted indexes.
Mention
The @edge, @device, @dashboard, @incident autocomplete syntax in the chat box. The manager resolves them into structured context before sending the message to the LLM.
N
NotifyWindowSeconds / NotifyMinFires
Per-rule notification dampening pair. A rule firing fewer than NotifyMinFires times inside the trailing NotifyWindowSeconds writes a repeat_suppressed event but does not send a notification. See Alert rule schema.
O
OTLP
OpenTelemetry Protocol — the wire format edges use to ship traces (via otelcol-contrib) to the manager's /v1/traces endpoint.
otelcol-contrib
The OpenTelemetry Collector contrib distribution. Bundled in the release tarball as the traces plugin. Subprocess managed by the agent's plugin supervisor.
P
Persona
An agent's behavior definition — a markdown file with YAML frontmatter (name, description, when_to_use, tools, model, permission_mode, system prompt). See Agent persona format.
Pluggable embedding
The RAG pipeline supports three embedding providers: zhipu (default, GLM embedding API), openai, and local (on-disk bge model). Switch via ONGRID_EMBEDDING_PROVIDER.
Plugin (edge plugin)
A subprocess managed by the edge agent's supervisor: promtail (logs), node_exporter (host metrics), process_exporter (proc metrics), otelcol-contrib (traces). Configured via PUT /v1/edges/{id}/plugins/{name}.
promtail
Grafana's log shipper. Bundled in the release tarball as the logs plugin. Subprocess of the edge agent.
push_prom_samples
The tunnel-side metric-push RPC. Carries edge metric samples to the manager's cloud Prom. Currently on the control plane; it may move to the data plane (see the migration triggers in Telemetry data plane).
Q
query_promql / query_traceql / search_logs
Three of the core observability tools the agent can call. They proxy through the manager to Prom / Tempo / Loki respectively, returning structured results the LLM can reason over.
R
RAG
Retrieval-Augmented Generation. Ongrid's knowledge base (vault + repos + uploaded docs) is indexed in Qdrant; queries to the agent automatically retrieve top-k chunks. See Capabilities → Knowledge base.
RCA
Root Cause Analysis. The agent's investigation pipeline that walks from an alert symptom through topology + metrics + logs + traces + source to a verifiable cause statement.
ReAct
The "Reason + Act" loop: the agent thinks (assembles a tool plan), acts (calls a tool), observes (reads the result), then loops. Bounded by max_turns on the persona.
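The loop shape can be sketched in a few lines. This is not the eino-based kernel; `Step`, `Model`, `Tool`, and `Run` are hypothetical types and functions, and the model is stubbed with a plain function so the example is self-contained.

```go
package main

import "fmt"

// Step is what the model returns on each turn: either a tool call
// (Tool is non-empty) or a final answer.
type Step struct {
	Tool   string
	Input  string
	Answer string
}

// Model decides the next step from the history so far.
type Model func(history []string) Step

// Tool executes one tool call and returns its observation.
type Tool func(input string) string

// Run drives reason -> act -> observe until the model answers or
// maxTurns is exhausted. The bool is false when the turn budget ran out.
func Run(model Model, tools map[string]Tool, question string, maxTurns int) (string, bool) {
	history := []string{"question: " + question}
	for turn := 0; turn < maxTurns; turn++ {
		step := model(history) // reason
		if step.Tool == "" {
			return step.Answer, true
		}
		observation := "unknown tool: " + step.Tool
		if tool, ok := tools[step.Tool]; ok {
			observation = tool(step.Input) // act
		}
		history = append(history, "observation: "+observation) // observe
	}
	return "", false
}

func main() {
	model := func(history []string) Step {
		if len(history) == 1 {
			return Step{Tool: "query_promql", Input: "up"}
		}
		return Step{Answer: "answered from " + history[len(history)-1]}
	}
	tools := map[string]Tool{
		"query_promql": func(input string) string { return input + "=1" },
	}
	answer, ok := Run(model, tools, "is the host up?", 4)
	fmt.Println(answer, ok)
}
```

The turn bound matters because a model that keeps requesting tools would otherwise loop forever; here the caller gets an explicit signal that the budget was exhausted.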
Rule key
Stable lower_snake identifier for an alert rule. Used in dedupe keys and incident.rule. Unique across non-soft-deleted rows.
S
Scope type
Rule field: host, global, or monitoring_pipeline. Determines the evaluator's grouping dimension. host produces one incident per device_id; global produces one incident system-wide; monitoring_pipeline is for internal pipeline-health rules.
Severity
Alert / channel severity floor: info, warning, critical. A channel with match_severity_min=warning accepts warning and critical; match_severity_min=critical accepts only critical.
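The floor comparison amounts to ranking the three levels. A minimal sketch, where `Accepts` and `severityRank` are hypothetical names and rejecting unknown severities is an assumption:

```go
package main

import "fmt"

// severityRank orders the three documented severities from lowest to highest.
var severityRank = map[string]int{"info": 0, "warning": 1, "critical": 2}

// Accepts reports whether a channel whose floor is minSeverity accepts
// an alert of alertSeverity. Unknown values are rejected (an assumption).
func Accepts(minSeverity, alertSeverity string) bool {
	floor, floorKnown := severityRank[minSeverity]
	level, levelKnown := severityRank[alertSeverity]
	if !floorKnown || !levelKnown {
		return false
	}
	return level >= floor
}

func main() {
	for _, severity := range []string{"info", "warning", "critical"} {
		fmt.Println(severity, Accepts("warning", severity))
	}
}
```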
Skill
A tool the agent can call. Either built-in (compiled into the binary) or external (subprocess with a skill.json manifest). Both live in the same registry; the LLM does not distinguish. See Skill manifest.
SOP (dual-sign)
The two-step confirmation flow for dangerous tool calls. Persona must have permission_mode: dual-sign-required; the runtime presents the planned call for review, then executes only after explicit confirmation.
Specialist
A non-coordinator agent persona — incident investigator, network, compute, disk, SRE. The coordinator picks one by matching the user's query against each persona's when_to_use field. See Specialists.
T
Tempo
Grafana's trace store (ADR-013). Bundled as tempo:2.5.0. Edges push via OTLP; manager queries via TraceQL.
Tenant
A logical isolation boundary (org + members). Single-tenant in the open-source MVP — tenant logic exists in the schema but evaluates to "everyone in the same tenant" until multi-tenant features ship.
Tool
The LLM-facing handle to a skill. A skill may declare multiple tools (via the SKILL.md tools: list). At runtime, each tool has a name, description, JSON Schema, class, and when_to_use hint.
ToolBag deferral
Optimization for large skill registries. When the tool count exceeds ONGRID_TOOLBAG_DEFERRAL_THRESHOLD (default 30), specialty-tier tools get redacted schemas in the prompt. The LLM must call ToolSearch to expand a redacted tool before using it. Saves prompt tokens.
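A minimal sketch of the threshold check, assuming each tool carries a tier label and that a redacted schema is represented as an empty string. `ToolSpec`, its fields, and `PromptTools` are hypothetical; only the threshold behavior and the specialty tier come from the description above.

```go
package main

import "fmt"

// ToolSpec is a hypothetical, simplified view of a registered tool.
type ToolSpec struct {
	Name   string
	Tier   string // e.g. "core" or "specialty" (tier names are assumptions)
	Schema string // JSON Schema text; empty means redacted in this sketch
}

// PromptTools returns the tool list as it would appear in the prompt.
// When the tool count exceeds threshold, specialty-tier schemas are
// blanked out; the input slice is left unmodified.
func PromptTools(tools []ToolSpec, threshold int) []ToolSpec {
	out := make([]ToolSpec, len(tools))
	copy(out, tools)
	if len(tools) <= threshold {
		return out
	}
	for i := range out {
		if out[i].Tier == "specialty" {
			out[i].Schema = ""
		}
	}
	return out
}

func main() {
	tools := []ToolSpec{
		{Name: "query_promql", Tier: "core", Schema: `{"type":"object"}`},
		{Name: "bgp_inspect", Tier: "specialty", Schema: `{"type":"object"}`},
	}
	for _, tool := range PromptTools(tools, 1) {
		fmt.Printf("%s redacted=%v\n", tool.Name, tool.Schema == "")
	}
}
```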
Topology
The typed CMDB (nodes + relations) under internal/manager/server/topology. Nodes have a type (service, host, database, queue, ...) with a schema; relations have a type (depends_on, runs_on, served_by). The agent walks topology to compute blast radius.
Tunnel
The geminio TLS connection from edge to broker (frontier). See Control plane and geminio.
V
Vault
The built-in knowledge-base repository (github.com/ongridio/vault). Synced to the manager's RAG store on first boot. See Built-in vault.
W
WebSSH
Browser-based shell over the tunnel. The edge port-forwards bytes to the local sshd; the SSH client lives entirely in the manager. See Capabilities → WebShell.
when_to_use
A persona / skill / tool frontmatter field that gives the coordinator (or the LLM) a one-line "when should this be picked" decision hint. Distinct from description, which answers "what is it".
Z
Zhipu / GLM
Default LLM provider in Chinese-network deployments. Models include glm-4.7, glm-5, glm-5.1. Configured via ONGRID_ZHIPU_* env vars.