Skills (tools)

A skill is one self-contained capability the LLM can invoke. From a single metadata struct, the skill framework derives the LLM tool registration, the HTTP API, the UI form, the permission gate, and the audit log. Adding a new skill means writing one file and calling Register in init().

Skills are L2 (device-direct) when they run on an edge agent and L3 (intelligence) when they run on the manager. Both are first-class.

Anatomy

Every skill ships:

// internal/skill/types.go:99
type Metadata struct {
    Key           string // lower_snake; ID used in dedupe keys, audit logs, LLM tool names
    Name          string // human label
    Description   string // shown to humans AND to the LLM
    Class         Class  // safe | mutating | dangerous
    Scope         Scope  // host (default) | manager
    Category      string // free-form group label
    Params        ParamSchema
    ResultPreview string
}

Plus an Executor:

type Executor interface {
    Metadata() Metadata
    Execute(ctx context.Context, params json.RawMessage) (json.RawMessage, error)
}
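As an illustration of the "one file plus Register in init()" flow, here is a minimal sketch of a complete skill. The Metadata struct is trimmed to string fields, and the registry map, the Register signature, the call helper, and the echo_text skill are all assumptions made for this example, not the framework's actual code.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Trimmed stand-ins for the framework types (the real ones live in internal/skill).
type Metadata struct {
	Key, Name, Description string
	Class, Scope           string // simplified to strings for this sketch
}

type Executor interface {
	Metadata() Metadata
	Execute(ctx context.Context, params json.RawMessage) (json.RawMessage, error)
}

// registry and Register are hypothetical; the real Register signature may differ.
var registry = map[string]Executor{}

func Register(e Executor) {
	key := e.Metadata().Key
	if _, dup := registry[key]; dup {
		panic("duplicate skill key: " + key)
	}
	registry[key] = e
}

// echoSkill is an illustrative skill that returns its "text" parameter.
type echoSkill struct{}

func (echoSkill) Metadata() Metadata {
	return Metadata{
		Key:         "echo_text",
		Name:        "Echo text",
		Description: "Returns the text it was given.",
		Class:       "safe",
		Scope:       "host",
	}
}

func (echoSkill) Execute(_ context.Context, params json.RawMessage) (json.RawMessage, error) {
	var in struct {
		Text string `json:"text"`
	}
	if err := json.Unmarshal(params, &in); err != nil {
		return nil, fmt.Errorf("echo_text: bad params: %w", err)
	}
	return json.Marshal(map[string]string{"echo": in.Text})
}

// The single registration call, run when the package is loaded.
func init() { Register(echoSkill{}) }

// call looks a skill up by key and runs it with string params.
func call(key, params string) (string, error) {
	e, ok := registry[key]
	if !ok {
		return "", fmt.Errorf("unknown skill %q", key)
	}
	out, err := e.Execute(context.Background(), json.RawMessage(params))
	return string(out), err
}

func main() {
	out, err := call("echo_text", `{"text":"hi"}`)
	fmt.Println(out, err)
}
```

Keeping Metadata() and Execute on the same type is what lets one file carry everything the framework needs to derive the tool, form, and gate.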

The framework dispatches based on Scope:

ScopeHost: the manager wraps the call in a tunnel RPC (Caller.Call(ctx, edgeID, "execute_skill", body)), and the edge agent's single execute_skill handler dispatches by key. The LLM tool wrapper injects a required edge_id integer property into the schema.
ScopeManager: runs in-process on the manager. No edge_id is needed; useful for public-internet calls (web_search), external APIs, and subprocess skill packs.
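The two dispatch paths can be sketched roughly as follows. Only the Caller.Call argument order and the "execute_skill" method name come from the text above; the return types, the int64 edge ID, the envelope sent to the edge, the rule that 0 means "no edge", and the fakeCaller and tryDispatch helpers are assumptions for illustration.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

type Scope int

const (
	ScopeHost    Scope = iota // zero value, matching "host (default)"
	ScopeManager
)

// Caller mirrors the tunnel call quoted above; the return types are assumed.
type Caller interface {
	Call(ctx context.Context, edgeID int64, method string, body json.RawMessage) (json.RawMessage, error)
}

// localRun stands in for a skill's in-process Execute method.
type localRun func(ctx context.Context, params json.RawMessage) (json.RawMessage, error)

// dispatch routes one invocation. The envelope shape is hypothetical.
func dispatch(ctx context.Context, c Caller, scope Scope, key string, edgeID int64,
	params json.RawMessage, run localRun) (json.RawMessage, error) {
	switch scope {
	case ScopeManager:
		return run(ctx, params) // in-process, no edge_id involved
	case ScopeHost:
		if edgeID == 0 { // assumption: 0 is never a valid edge ID
			return nil, errors.New("edge_id is required for host-scoped skills")
		}
		body, err := json.Marshal(map[string]any{"key": key, "params": params})
		if err != nil {
			return nil, err
		}
		return c.Call(ctx, edgeID, "execute_skill", body)
	default:
		return nil, fmt.Errorf("unknown scope %d", scope)
	}
}

// fakeCaller records the last call instead of opening a tunnel.
type fakeCaller struct {
	lastEdge   int64
	lastMethod string
}

func (f *fakeCaller) Call(_ context.Context, edgeID int64, method string, body json.RawMessage) (json.RawMessage, error) {
	f.lastEdge, f.lastMethod = edgeID, method
	return body, nil // echo the envelope back
}

// tryDispatch runs dispatch against a fakeCaller and reports which tunnel method, if any, was used.
func tryDispatch(scope Scope, edgeID int64) (method string, err error) {
	fc := &fakeCaller{}
	local := func(context.Context, json.RawMessage) (json.RawMessage, error) {
		return json.RawMessage(`"ran locally"`), nil
	}
	_, err = dispatch(context.Background(), fc, scope, "probe_http", edgeID, json.RawMessage(`{}`), local)
	return fc.lastMethod, err
}

func main() {
	for _, c := range []struct {
		scope  Scope
		edgeID int64
	}{{ScopeHost, 7}, {ScopeManager, 0}, {ScopeHost, 0}} {
		method, err := tryDispatch(c.scope, c.edgeID)
		fmt.Printf("scope=%d edge=%d tunnel_method=%q err=%v\n", c.scope, c.edgeID, method, err)
	}
}
```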

Permission classes

The class is built into the metadata, so the gate runs on every invocation rather than as a bolt-on:

Class	Examples	Who can invoke
`safe`	probe_*, read_file, tail_file, query_promql	LLM, any role
`mutating`	restart_service, kill_process	Requires human-in-the-loop approval (parked, but the gate exists)
`dangerous`	rm, reboot, drop_table	Requires RSA-signed SOP + dual approval (parked)

The default class is safe, but the framework logs a warning at registration time when an author forgets to set the field, so the omission can't slip through silently. See types.go:205.

skill_bridge.go in the aiops tools registry currently exposes only ClassSafe skills to the LLM. Mutating and dangerous classes wait for the PR-G4 approval workflow.
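A minimal sketch of the two behaviours described in this section: defaulting an unset class to safe with a warning at registration, and exposing only safe skills to the LLM. ClassSafe is named in the text; the string-based Class type, ClassMutating, ClassDangerous, and the Registry type with its methods are assumptions for illustration.

```go
package main

import (
	"fmt"
	"log"
)

// Class is modelled as a string here; the real type may differ.
type Class string

const (
	ClassSafe      Class = "safe"
	ClassMutating  Class = "mutating"  // assumed name
	ClassDangerous Class = "dangerous" // assumed name
)

type Metadata struct {
	Key   string
	Class Class
}

// Registry is a hypothetical stand-in for the skill registry.
type Registry struct {
	skills []Metadata
}

// Register applies the documented default: an empty class becomes safe, with a warning.
func (r *Registry) Register(m Metadata) {
	if m.Class == "" {
		log.Printf("skill %q registered without a Class; defaulting to %q", m.Key, ClassSafe)
		m.Class = ClassSafe
	}
	r.skills = append(r.skills, m)
}

// ExposedToLLM returns the keys of safe skills only; other classes stay hidden.
func (r *Registry) ExposedToLLM() []string {
	var keys []string
	for _, m := range r.skills {
		if m.Class == ClassSafe {
			keys = append(keys, m.Key)
		}
	}
	return keys
}

func main() {
	var r Registry
	r.Register(Metadata{Key: "probe_http", Class: ClassSafe})
	r.Register(Metadata{Key: "restart_service", Class: ClassMutating})
	r.Register(Metadata{Key: "tail_file"}) // Class forgotten: warned, then treated as safe
	fmt.Println(r.ExposedToLLM())
}
```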

The three skill populations

Go builtins (edge)

Hand-written in internal/skill/builtin/:

Key	What
`probe_http`, `probe_dns`, `probe_tcp`	Read-only network reachability
`tail_file`, `read_journal`	Log surface
`host_netns_inspect`	Network namespace inventory
`web_search`	Manager-side; SearXNG default, configurable
`restart_service`	Mutating; gated

Each is a Go file with init() calling skill.Register. The edge binary ships every builtin baked in — no plugin install, no remote code execution surface.

Subprocess skill packs

For capabilities you want to drop in without rebuilding the edge agent, ship a directory containing a skill.json manifest and an executable. The loader in internal/skill/subprocess.go reads the manifest, registers the skill, and dispatches Execute to the binary, passing the params on stdin.

Used for: network research tools (ovs-vsctl, nft, bpftool, ethtool, ip netns), Kubernetes inspect helpers — anything where the binary already exists and shelling out beats rewriting in Go.
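The stdin hand-off can be sketched with os/exec as below. The Manifest fields, the expectation that the binary prints JSON on stdout, and the runOnce helper are assumptions; the real skill.json layout is not shown on this page. The example uses cat as a stand-in binary, so it assumes a Unix-like system.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"os/exec"
	"time"
)

// Manifest is a hypothetical shape for skill.json.
type Manifest struct {
	Key        string   `json:"key"`
	Executable string   `json:"executable"`
	Args       []string `json:"args"`
}

// runPack starts the pack's executable, writes params to its stdin, and returns its stdout.
func runPack(ctx context.Context, m Manifest, params json.RawMessage) (json.RawMessage, error) {
	cmd := exec.CommandContext(ctx, m.Executable, m.Args...)
	cmd.Stdin = bytes.NewReader(params)
	var stdout, stderr bytes.Buffer
	cmd.Stdout, cmd.Stderr = &stdout, &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("skill %s: %w (stderr: %s)", m.Key, err, stderr.String())
	}
	if !json.Valid(stdout.Bytes()) { // assumption: packs answer with JSON
		return nil, fmt.Errorf("skill %s: output is not valid JSON", m.Key)
	}
	return stdout.Bytes(), nil
}

// runOnce parses a manifest from JSON text and runs it once with a 5-second timeout.
func runOnce(manifestJSON, params string) (string, error) {
	var m Manifest
	if err := json.Unmarshal([]byte(manifestJSON), &m); err != nil {
		return "", err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	out, err := runPack(ctx, m, json.RawMessage(params))
	return string(out), err
}

func main() {
	// "cat" copies stdin to stdout, so it behaves like a trivial echo skill.
	out, err := runOnce(`{"key":"echo_params","executable":"cat"}`, `{"iface":"eth0"}`)
	fmt.Println(out, err)
}
```

Passing params on stdin rather than as arguments keeps them out of the process list and avoids shell quoting, which is one plausible reason for the design.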

Manager-side BaseTools

This is the largest population. The tools live under internal/manager/biz/aiops/tools/ as *_basetool.go files. Each carries its own hand-written JSON Schema, which is needed for shapes the declarative ParamSchema can't express: arrays, nested objects, oneOf.

Selected BaseTools:

Tool	What it does
`bash`	Shell on a target edge (gated; recorded)
`query_promql`, `query_logql`, `query_traceql`	The three observability backends
`query_incidents`, `get_incident_detail`, `query_alert_rules`	Alerting surface
`query_edges`, `query_change_events`	Inventory + audit
`correlate_incident`	Composite fan-out to prom + log + trace
`expand_topology`, `find_topology_node`, `get_topology`	Graph
`query_knowledge`	RAG
`find_outlier_edges`, `rank_edges`	Multi-host comparison
`host_load`, `host_processes`, `host_files`	Host-state batch tools
`get_edge_summary`	One-shot edge health snapshot
`restart_service`	Manager-side wrapper around the edge restart skill
`send_message`	Coordinator → specialist agent comms
`task_stop`, `agent_tool`	Worker lifecycle

More than 30 in total. The full list is whatever is registered in BuildBaseTools in cmd/ongrid/main.go.

The inventory bridge

Two parallel registries existed before the bridge:

The skill registry — ScopeHost Go builtins + subprocess packs. Surfaced on the SPA's /skills page with audit + class gate.
The BaseTool bag — hand-written manager-side tools. Surfaced to the LLM but NOT to /skills.

Operators couldn't see what cloud-side capabilities the agent actually had without reading source. The inventory_bridge.go walks the BaseTool bag and registers every tool as a skill with Scope=ScopeManager. The opt-in RawSchemaProvider interface is used to preserve each BaseTool's hand-written JSON Schema verbatim.

Per the 2026-05-08 memo, 18 BaseTools are bridged through this path — the /skills page now lists every cloud-side capability with a scope chip indicating edge vs manager.
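The walk over the BaseTool bag might look like the sketch below. RawSchemaProvider is named in the text, but its method, the BaseTool method set, the BridgedSkill struct, and the two sample tools are assumptions made so the example compiles on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BaseTool is a simplified guess at the manager-side tool interface.
type BaseTool interface {
	Name() string
	Description() string
}

// RawSchemaProvider is the opt-in interface; the method name is assumed.
type RawSchemaProvider interface {
	RawSchema() json.RawMessage
}

// BridgedSkill is a hypothetical record of what lands in the skill registry.
type BridgedSkill struct {
	Key, Description, Scope string
	RawSchema               json.RawMessage // nil when the tool did not opt in
}

// bridge registers every BaseTool as a manager-scoped skill.
func bridge(tools []BaseTool) []BridgedSkill {
	out := make([]BridgedSkill, 0, len(tools))
	for _, t := range tools {
		s := BridgedSkill{Key: t.Name(), Description: t.Description(), Scope: "manager"}
		if p, ok := t.(RawSchemaProvider); ok {
			s.RawSchema = p.RawSchema() // carried over byte for byte
		}
		out = append(out, s)
	}
	return out
}

// promqlTool opts in with a schema that uses an array.
type promqlTool struct{}

func (promqlTool) Name() string        { return "query_promql" }
func (promqlTool) Description() string { return "Run PromQL queries." }
func (promqlTool) RawSchema() json.RawMessage {
	return json.RawMessage(`{"type":"object","properties":{"queries":{"type":"array","items":{"type":"string"}}}}`)
}

// taskStopTool does not implement RawSchemaProvider.
type taskStopTool struct{}

func (taskStopTool) Name() string        { return "task_stop" }
func (taskStopTool) Description() string { return "Stop a running worker task." }

func main() {
	for _, s := range bridge([]BaseTool{promqlTool{}, taskStopTool{}}) {
		fmt.Printf("%s scope=%s has_raw_schema=%t\n", s.Key, s.Scope, s.RawSchema != nil)
	}
}
```

A type assertion keeps the schema hand-off optional: tools that never implement the interface are still listed, just without a raw schema.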

The reverse bridge is also wired: skill_bridge.go takes every safe skill and registers it as an LLM-facing Tool, so the LLM sees skills as function-calling tools with an edge_id parameter auto-prepended for ScopeHost ones.
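For host-scoped skills, the edge_id injection could be done roughly like this. The function name and the description text are hypothetical, and the sketch assumes the tool schema is a JSON Schema object with optional properties and required members.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withEdgeID adds a required integer edge_id property to a JSON Schema object.
// It does not check whether edge_id is already present.
func withEdgeID(schema []byte) ([]byte, error) {
	var s map[string]any
	if err := json.Unmarshal(schema, &s); err != nil {
		return nil, err
	}
	if s == nil { // input was JSON null
		s = map[string]any{"type": "object"}
	}
	props, _ := s["properties"].(map[string]any)
	if props == nil {
		props = map[string]any{}
	}
	props["edge_id"] = map[string]any{
		"type":        "integer",
		"description": "Target edge agent ID.",
	}
	s["properties"] = props

	// Put edge_id first in the required list, then keep the skill's own entries.
	required := []any{"edge_id"}
	if old, ok := s["required"].([]any); ok {
		required = append(required, old...)
	}
	s["required"] = required
	return json.Marshal(s)
}

func main() {
	in := []byte(`{"type":"object","properties":{"url":{"type":"string"}},"required":["url"]}`)
	out, err := withEdgeID(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because encoding/json writes map keys in sorted order, the output is deterministic even though JSON object members carry no ordering of their own; only the required array preserves the "edge_id first" position.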

Audit

Every tool invocation writes a row to chat_tool_calls:

session_id — the chatruntime session that called.
tool_name, args_json, result_json, error.
device_id — when the tool targeted a specific host (the EdgeID field on ExecuteResult).
started_at, finished_at, duration_ms.
caller_user_id + caller_role — for LLM-originated calls, the framework uses UserID=0 / Role="system".
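The columns listed above map naturally onto a row struct. The sketch below wraps a tool call to fill one in; the Go field types, the record helper, and the two sample tools are assumptions, and nothing here writes to a database.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ToolCall mirrors the chat_tool_calls columns; the Go types are assumed.
type ToolCall struct {
	SessionID    string
	ToolName     string
	ArgsJSON     string
	ResultJSON   string
	Error        string
	DeviceID     int64 // 0 when no specific host was targeted (assumption)
	StartedAt    time.Time
	FinishedAt   time.Time
	DurationMs   int64
	CallerUserID int64
	CallerRole   string
}

// record runs one tool call and returns the audit row describing it.
// It fills in the LLM-originated identity described above: UserID=0, Role="system".
func record(sessionID, tool, args string, deviceID int64, run func() ([]byte, error)) ToolCall {
	row := ToolCall{
		SessionID:    sessionID,
		ToolName:     tool,
		ArgsJSON:     args,
		DeviceID:     deviceID,
		StartedAt:    time.Now(),
		CallerUserID: 0,
		CallerRole:   "system",
	}
	res, err := run()
	row.FinishedAt = time.Now()
	row.DurationMs = row.FinishedAt.Sub(row.StartedAt).Milliseconds()
	row.ResultJSON = string(res)
	if err != nil {
		row.Error = err.Error()
	}
	return row
}

// okTool and failingTool are sample tools for the demonstration.
func okTool() ([]byte, error)      { return []byte(`{"status":200}`), nil }
func failingTool() ([]byte, error) { return nil, errors.New("connection refused") }

func main() {
	for _, run := range []func() ([]byte, error){okTool, failingTool} {
		row := record("sess-1", "probe_http", `{"url":"http://example.test"}`, 42, run)
		fmt.Printf("tool=%s device=%d role=%s result=%q error=%q\n",
			row.ToolName, row.DeviceID, row.CallerRole, row.ResultJSON, row.Error)
	}
}
```

Recording the row in a wrapper, rather than inside each tool, is what makes it plausible for every invocation to be audited without per-tool code.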

The HLD-010 audit log piggybacks on the same table; the /admin/audit page renders it with timeline and per-tool filters.
