Glossary

Project-specific terms used across the docs and source. If a term has a longer treatment, the dedicated page is linked.

A

Agent

In Ongrid, an agent is a configured LLM worker with a specific job (coordinator, incident investigator, network specialist, etc.). Each agent has a persona describing which model it runs, which tools it can call, and how many ReAct turns it gets. Distinct from "edge agent" — see Edge.

Agent kernel

The runtime that drives an agent's ReAct loop: prompt assembly, tool registry resolution, model invocation, tool execution, response parsing. Two kernels exist: graph (default; built on eino) and legacy (for-loop). Toggle via ONGRID_AGENT_KERNEL.

Air-gapped

A deployment with no internet egress. Ongrid runs fully offline given a local LLM relay (e.g. vLLM, Ollama) and the local embedding model bundled in the release tarball. See Air-gapped / on-prem.

B

Blast radius

The set of services or hosts affected by a change or incident. The agent computes this by walking the topology graph (downstream nodes) before recommending any destructive action.
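
The downstream walk can be illustrated with a breadth-first traversal. This is only a sketch: the adjacency map, the node names, and the function name blast_radius are hypothetical, and the real topology graph is a typed CMDB rather than a plain dictionary.

```python
from collections import deque


def blast_radius(downstream, start):
    """Return every node reachable from `start` via downstream edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, ()):
            # Skip nodes already visited, and never report the start node itself.
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical topology: each key maps to the nodes that depend on it.
downstream = {
    "db-1": ["api", "worker"],
    "api": ["web"],
    "worker": [],
    "web": [],
}
affected = blast_radius(downstream, "db-1")
```

Here a change to db-1 would touch api, worker, and web, while a change to web affects nothing further downstream.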

Bounded context (BC)

A subdomain of the manager with its own model, biz, data, and server packages. Examples: iam, edge, device, alert, aiops. Imports across BCs are restricted to interface ports; go-arch-lint enforces the boundary.

Built-in vault

The default knowledge-base content bundled at github.com/ongridio/vault. The manager syncs it on first boot and re-syncs on demand. Public repo; about 96 markdown playbooks.

C

Channel

A delivery target for notifications. Types: webhook, slack, feishu, dingtalk, wecom, telegram. Each channel can be filtered by severity and scope. See Channels.

Class (tool class)

Blast-radius classification of a tool: safe (read-only), mutating (reversible write), dangerous (irreversible). The persona's permission_mode field gates which classes are callable. See Skill manifest.

Cmdpolicy

The edge-side sandbox that gates bash skill invocations. Defines a binary allowlist, argument matchers, path allowlists, and network allowlists. Located under internal/edgeagent/cmdpolicy. Read-only by default; flipping to mutating requires explicit policy edits.

Control plane

The geminio tunnel. Carries edge lifecycle, RPC, heartbeats, alert events, and (today) metric push. See Data plane.

Coordinator

The top-level agent that decomposes user questions, dispatches specialist sub-agents, and assembles the final answer. Persona name: coordinator. See Coordinator.

D

Data plane

The independent outbound HTTPS path edges use to ship logs and traces directly to the manager's public ingest endpoints — distinct from the tunnel (control plane). See Telemetry data plane.

Dedupe key

Per-rule, per-scope key that the alert evaluator uses to collapse repeated firings into one open incident. Built from rule_key + scope_type + scope identifier + (optional rule-specific dimensions). Unique index on alert_incidents.dedupe_key.
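
A minimal sketch of how such a key might be assembled from the parts listed above. The ":" separator, the name=value encoding of extra dimensions, and the function name dedupe_key are assumptions made for illustration; the actual string layout is not specified here.

```python
def dedupe_key(rule_key, scope_type, scope_id, dimensions=None):
    """Join the key parts into one string (separator is an assumed choice)."""
    parts = [rule_key, scope_type, scope_id]
    # Sort optional dimensions so the same inputs always give the same key.
    for name in sorted(dimensions or {}):
        parts.append(f"{name}={dimensions[name]}")
    return ":".join(parts)


# Two firings of the same rule on the same host map to one key,
# so they would collapse into a single open incident.
first = dedupe_key("disk_full", "host", "dev-42")
second = dedupe_key("disk_full", "host", "dev-42")
with_dims = dedupe_key("disk_full", "host", "dev-42", {"mount": "/var", "fs": "ext4"})
```

Sorting the optional dimensions keeps the key stable regardless of the order in which a rule supplies them, which matters when a unique index is the collapsing mechanism.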

Device

A logical host. Distinct from edge, which is the tunnel-connected agent process. One device may have zero or many edges (if reinstalled). The device_id is the canonical join key in PromQL labels and topology nodes.

E

Edge

The tunnel-connected agent process — ongrid-edge. Identified by edge_id. One edge per running agent process; one device may host multiple edges over its lifetime (after reinstalls). Display rule: the Edges page lists agents; the Devices page lists hosts.

Edge bundle

The tarball the manager ships to an edge for whole-bundle upgrade (ADR-024). Contains the agent binary plus every plugin binary, all for the target arch. Staged in /var/lib/ongrid-edge/.upgrade/ and swapped on next boot.

eino

The graph kernel library Ongrid uses for the agent's ReAct loop (ONGRID_AGENT_KERNEL=graph). Provides the prompt-assembly, tool-call, and graph-execution primitives.

F

Frontier

The upstream geminio broker (github.com/singchia/frontier, ADR-007). Edge dials port 40012; manager dials port 40011 over the docker network. Ships as a docker image bundled in the release tarball.

G

geminio

The TLS-based multi-stream tunnel protocol (github.com/singchia/geminio) that Ongrid's frontier broker implements. Supports request/response RPC and raw streams over one persistent TLS connection.

Grafana embed

The Monitor page renders Grafana panels in an iframe, using solo-mode URLs under /grafana/... on the same origin as the manager API; nginx fronts both. The iframe is allowed via GF_SECURITY_ALLOW_EMBEDDING=true.

H

Health (plugin)

The supervisor-reported runtime state of an edge plugin: running, crashed, starting, stopping. Surfaced via GET /v1/edges/{id}/plugins and on the Edges page next to each plugin toggle.

I

IM bridge

The bounded context that connects Ongrid chat to external IM platforms (Slack, Telegram, Lark, DingTalk, WeCom). One row in im_apps per registered app. Incoming events on /v1/im/<provider>/events are converted into chat sessions.

Incident

A firing alert. One incident per (rule, scope) tuple, deduped via dedupe key. Has a lifecycle (open → acknowledged → resolved), an event timeline, and optionally an AI-generated investigation report.

J

join_mode

Rule field: all or any. Determines whether every entry in conditions[] must match for the rule to fire (all, the default) or whether a single matching entry is enough (any).
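
As a rough illustration of the two modes, assuming each condition has already been evaluated to a boolean (the function name rule_fires is hypothetical):

```python
def rule_fires(condition_results, join_mode="all"):
    """Combine per-condition booleans according to join_mode."""
    if join_mode == "all":
        return all(condition_results)
    if join_mode == "any":
        return any(condition_results)
    raise ValueError(f"unknown join_mode: {join_mode!r}")


# One of two conditions matched.
results = [True, False]
fires_all = rule_fires(results)          # uses the default, "all"
fires_any = rule_fires(results, "any")
```

With one of two conditions matching, the rule stays quiet under all and fires under any.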

K

Kind (rule kind)

The discriminator that drives which sub-evaluator runs over a rule's conditions. Phase-A: metric_raw, metric_anomaly, metric_forecast, metric_burn_rate. Phase-B: log_match, log_volume, trace_latency, trace_error_rate. Plus the UI-only input kind metric_threshold. See Alert rule schema.

L

Loki

Grafana's log store (ADR-012). Bundled in the compose stack as loki:3.4.0. Edges push via the data plane; the manager queries via /v1/logs/query_range.

M

Marketplace

The skill-pack distribution system (ADR-017). A pack is a directory of skills + agents + a manifest. Install via POST /v1/marketplace/install. Registries point at HTTPS-hosted indexes.

Mention

The @edge, @device, @dashboard, @incident autocomplete syntax in the chat box. The manager resolves them into structured context before sending the message to the LLM.

N

NotifyWindowSeconds / NotifyMinFires

Per-rule notification dampening pair. A rule firing fewer than NotifyMinFires times inside the trailing NotifyWindowSeconds writes a repeat_suppressed event but does not send a notification. See Alert rule schema.
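
The dampening check could look roughly like the sketch below. The function name, the use of plain numeric timestamps, and the treatment of the window boundary (start exclusive, end inclusive) are assumptions; only the "fewer than NotifyMinFires inside the trailing window means suppressed" rule comes from the definition above.

```python
def notification_outcome(fire_times, now, window_seconds, min_fires):
    """Return "notify" or "repeat_suppressed" for the firing at `now`."""
    window_start = now - window_seconds
    # Count firings inside the trailing window (boundary handling is assumed).
    recent = [t for t in fire_times if window_start < t <= now]
    if len(recent) < min_fires:
        return "repeat_suppressed"
    return "notify"


# Firings at t=100, 130, 170 seconds; evaluate at t=170 with a 60 s window.
fires = [100, 130, 170]
strict = notification_outcome(fires, now=170, window_seconds=60, min_fires=3)
lenient = notification_outcome(fires, now=170, window_seconds=60, min_fires=2)
```

Only two of the three firings fall inside the trailing 60 seconds, so a minimum of three suppresses the notification while a minimum of two lets it through.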

O

OTLP

OpenTelemetry Protocol — the wire format edges use to ship traces (via otelcol-contrib) to the manager's /v1/traces endpoint.

otelcol-contrib

The OpenTelemetry Collector contrib distribution. Bundled in the release tarball as the traces plugin. Subprocess managed by the agent's plugin supervisor.

P

Persona

An agent's behavior definition — a markdown file with YAML frontmatter (name, description, when_to_use, tools, model, permission_mode, system prompt). See Agent persona format.

Pluggable embedding

The RAG pipeline supports three embedding providers: zhipu (default, GLM embedding API), openai, and local (on-disk bge model). Switch via ONGRID_EMBEDDING_PROVIDER.

Plugin (edge plugin)

A subprocess managed by the edge agent's supervisor: promtail (logs), node_exporter (host metrics), process_exporter (proc metrics), otelcol-contrib (traces). Configured via PUT /v1/edges/{id}/plugins/{name}.

promtail

Grafana's log shipper. Bundled in the release tarball as the logs plugin. Subprocess of the edge agent.

push_prom_samples

The tunnel-side metric-push RPC. Carries edge metric samples to the manager's cloud Prometheus. It runs on the control plane today and may move to the data plane; see the migration triggers in Telemetry data plane.

Q

query_promql / query_traceql / search_logs

Three of the core observability tools the agent can call. They proxy through the manager to Prom / Tempo / Loki respectively, returning structured results the LLM can reason over.

R

RAG

Retrieval-Augmented Generation. Ongrid's knowledge base (vault + repos + uploaded docs) is indexed in Qdrant; queries to the agent automatically retrieve top-k chunks. See Capabilities → Knowledge base.

RCA

Root Cause Analysis. The agent's investigation pipeline that walks from an alert symptom through topology + metrics + logs + traces + source to a verifiable cause statement.

ReAct

The "Reason + Act" loop: the agent thinks (assembles a tool plan), acts (calls a tool), observes (reads the result), then loops. Bounded by max_turns on the persona.
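
A toy version of the loop, to make the max_turns bound concrete. Everything here is illustrative: the step dictionaries, the scripted stand-in for the model, and the add tool are hypothetical and do not reflect the agent kernel's real interfaces.

```python
def react_loop(model, tools, question, max_turns):
    """Run think/act/observe until a final answer or the turn budget runs out."""
    transcript = [("user", question)]
    for _ in range(max_turns):
        step = model(transcript)                        # think
        if step["type"] == "final":
            return step["content"]
        result = tools[step["tool"]](**step["args"])    # act
        transcript.append(("observation", result))      # observe
    return None  # turn budget exhausted without a final answer


def scripted_model(transcript):
    """Stand-in for an LLM: request one tool call, then answer."""
    observations = [content for role, content in transcript if role == "observation"]
    if not observations:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "content": f"sum is {observations[-1]}"}


tools = {"add": lambda a, b: a + b}
answer = react_loop(scripted_model, tools, "what is 2 + 3?", max_turns=4)
cut_short = react_loop(scripted_model, tools, "what is 2 + 3?", max_turns=1)
```

With a budget of one turn the loop spends its only turn on the tool call and never reaches the final answer, which is the situation max_turns is meant to bound.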

Rule key

Stable lower_snake identifier for an alert rule. Used in dedupe keys and incident.rule. Unique across non-soft-deleted rows.

S

Scope type

Rule field: host, global, or monitoring_pipeline. Determines the evaluator's grouping dimension. host produces one incident per device_id; global produces one incident system-wide; monitoring_pipeline is for internal pipeline-health rules.

Severity

Severity levels for alerts and channels, in ascending order: info, warning, critical. A channel's match_severity_min sets a floor: warning accepts warning and critical; critical accepts only critical.
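
The floor comparison can be sketched as an ordering check. The function name channel_accepts and the list constant are hypothetical; the ordering and the accept/reject behavior follow the definition above.

```python
SEVERITY_ORDER = ["info", "warning", "critical"]  # lowest to highest


def channel_accepts(alert_severity, match_severity_min):
    """True when the alert is at or above the channel's severity floor."""
    return SEVERITY_ORDER.index(alert_severity) >= SEVERITY_ORDER.index(match_severity_min)


# Which severities pass each possible floor.
accepted = {
    floor: [s for s in SEVERITY_ORDER if channel_accepts(s, floor)]
    for floor in SEVERITY_ORDER
}
```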

Skill

A tool the agent can call. Either built-in (compiled into the binary) or external (subprocess with a skill.json manifest). Both live in the same registry; the LLM does not distinguish. See Skill manifest.

SOP (dual-sign)

The two-step confirmation flow for dangerous tool calls. Persona must have permission_mode: dual-sign-required; the runtime presents the planned call for review, then executes only after explicit confirmation.

Specialist

A non-coordinator agent persona — incident investigator, network, compute, disk, SRE. The coordinator picks one by matching the user's query against each persona's when_to_use field. See Specialists.

T

Tempo

Grafana's trace store (ADR-013). Bundled as tempo:2.5.0. Edges push via OTLP; manager queries via TraceQL.

Tenant

A logical isolation boundary (org + members). Single-tenant in the open-source MVP — tenant logic exists in the schema but evaluates to "everyone in the same tenant" until multi-tenant features ship.

Tool

The LLM-facing handle to a skill. A skill may declare multiple tools (via the SKILL.md tools: list). At runtime, each tool has a name, description, JSON Schema, class, and when_to_use hint.

ToolBag deferral

Optimization for large skill registries. When the tool count exceeds ONGRID_TOOLBAG_DEFERRAL_THRESHOLD (default 30), specialty-tier tools get redacted schemas in the prompt. The LLM must call ToolSearch to expand a redacted tool before using it. Saves prompt tokens.
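
A sketch of the deferral decision under stated assumptions: the tool dictionaries, the "core" tier name, and the choice to keep name and description while dropping the schema are all hypothetical. Only the threshold default of 30 and the "specialty-tier tools get redacted schemas" rule come from the definition above.

```python
DEFAULT_THRESHOLD = 30  # documented default of ONGRID_TOOLBAG_DEFERRAL_THRESHOLD


def render_for_prompt(tools, threshold=DEFAULT_THRESHOLD):
    """Redact specialty-tier schemas once the registry exceeds the threshold."""
    defer = len(tools) > threshold
    rendered = []
    for tool in tools:
        entry = {"name": tool["name"], "description": tool["description"]}
        if defer and tool["tier"] == "specialty":
            entry["schema"] = None  # redacted; would need expanding before use
        else:
            entry["schema"] = tool["schema"]
        rendered.append(entry)
    return rendered


def make_tools(n_core, n_specialty):
    """Build a hypothetical registry with the given tier counts."""
    tiers = ["core"] * n_core + ["specialty"] * n_specialty
    return [
        {"name": f"tool_{i}", "description": "demo", "tier": tier,
         "schema": {"type": "object"}}
        for i, tier in enumerate(tiers)
    ]


def redacted_count(rendered):
    return sum(1 for entry in rendered if entry["schema"] is None)


over = redacted_count(render_for_prompt(make_tools(10, 21)))   # 31 tools
at = redacted_count(render_for_prompt(make_tools(10, 20)))     # 30 tools
```

At exactly 30 tools nothing is redacted, because the definition says the count must exceed the threshold; at 31, all 21 specialty-tier schemas are withheld.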

Topology

The typed CMDB (nodes + relations) under internal/manager/server/topology. Nodes have a type (service, host, database, queue, ...) with a schema; relations have a type (depends_on, runs_on, served_by). The agent walks topology to compute blast radius.

Tunnel

The geminio TLS connection from edge to broker (frontier). See Control plane and geminio.

V

Vault

The built-in knowledge-base repository (github.com/ongridio/vault). Synced to the manager's RAG store on first boot. See Built-in vault.

W

WebSSH

Browser-based shell over the tunnel. The edge port-forwards bytes to the local sshd; the SSH client lives entirely in the manager. See Capabilities → WebShell.

when_to_use

A persona / skill / tool frontmatter field that gives the coordinator (or the LLM) a one-line "when should this be picked" decision hint. Distinct from description, which answers "what is it".

Z

Zhipu / GLM

Default LLM provider in Chinese-network deployments. Models include glm-4.7, glm-5, glm-5.1. Configured via ONGRID_ZHIPU_* env vars.