Skills (tools)

A skill is one self-contained capability the LLM can invoke. The skill framework auto-derives the LLM tool registration, the HTTP API, the UI form, the permission gate, and the audit log from a single metadata struct — adding a new skill means writing one file and calling Register in init().

Skills are L2 (device-direct) when they run on an edge agent and L3 (intelligence) when they run on the manager. Both are first-class.

Anatomy

Every skill ships:

```go
// internal/skill/types.go:99
type Metadata struct {
    Key           string // lower_snake; id in dedupe keys, audit logs, LLM tool names
    Name          string // human label
    Description   string // shown to humans AND to the LLM
    Class         Class  // safe | mutating | dangerous
    Scope         Scope  // host (default) | manager
    Category      string // free-form group label
    Params        ParamSchema
    ResultPreview string
}
```

Plus an Executor:

```go
// Requires the standard-library packages "context" and "encoding/json".
type Executor interface {
    Metadata() Metadata
    Execute(ctx context.Context, params json.RawMessage) (json.RawMessage, error)
}
```

The framework dispatches based on Scope:

  • ScopeHost: the manager wraps the call in a tunnel RPC (Caller.Call(ctx, edgeID, "execute_skill", body)), and the edge agent's single execute_skill handler dispatches by key. The LLM tool wrapper injects a required integer property, edge_id, into the schema.
  • ScopeManager: the skill runs in-process and takes no edge_id. Useful for public-internet calls (web_search), external APIs, and subprocess skill packs.

Permission classes

The class is built into the metadata, so the gate runs on every invocation rather than as a bolt-on:

| Class | Examples | Who can invoke |
| --- | --- | --- |
| safe | probe_*, read_file, tail_file, query_promql | LLM, any role |
| mutating | restart_service, kill_process | Requires human-in-the-loop approval (parked, but the gate exists) |
| dangerous | rm, reboot, drop_table | Requires RSA-signed SOP + dual approval (parked) |

The default class is safe — but the framework logs a warning at registration time when an author forgets to set the field, so this can't slip silently. See types.go:205.

skill_bridge.go in the aiops tools registry currently exposes only ClassSafe skills to the LLM. Mutating and dangerous classes wait for the PR-G4 approval workflow.
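A rough sketch of the two behaviours described in this section: the registration-time warning for a missing class, and the safe-only filter applied before skills reach the LLM. It assumes an unset Class is the empty string; the register and llmVisible helpers are hypothetical names, not the functions in types.go or skill_bridge.go.

```go
package main

import (
	"fmt"
	"log"
)

type Class string

const (
	ClassSafe      Class = "safe"
	ClassMutating  Class = "mutating"
	ClassDangerous Class = "dangerous"
)

type Metadata struct {
	Key   string
	Class Class
}

var registered []Metadata

// register applies the safe default and warns when the author left Class unset.
func register(m Metadata) {
	if m.Class == "" {
		log.Printf("skill %q registered without a Class; defaulting to %q", m.Key, ClassSafe)
		m.Class = ClassSafe
	}
	registered = append(registered, m)
}

// llmVisible returns the keys of skills the LLM may call: ClassSafe only.
func llmVisible() []string {
	var keys []string
	for _, m := range registered {
		if m.Class == ClassSafe {
			keys = append(keys, m.Key)
		}
	}
	return keys
}

func main() {
	register(Metadata{Key: "probe_http", Class: ClassSafe})
	register(Metadata{Key: "restart_service", Class: ClassMutating})
	register(Metadata{Key: "tail_file"}) // Class forgotten: warned, then treated as safe
	fmt.Println(llmVisible())
}
```

Filtering at the bridge rather than at call time means the LLM never sees a tool definition for a skill it is not allowed to invoke.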

The three skill populations

Go builtins (edge)

Hand-written in internal/skill/builtin/:

| Key | What |
| --- | --- |
| probe_http, probe_dns, probe_tcp | Read-only network reachability |
| tail_file, read_journal | Log surface |
| host_netns_inspect | Network namespace inventory |
| web_search | Manager-side; SearXNG default, configurable |
| restart_service | Mutating; gated |

Each is a Go file with init() calling skill.Register. The edge binary ships every builtin baked in — no plugin install, no remote code execution surface.

Subprocess skill packs

For capabilities you want to drop in without rebuilding the edge agent, ship a directory with a skill.json manifest and an executable. The loader in internal/skill/subprocess.go reads the manifest, registers the skill, and dispatches Execute to the binary, passing the params on stdin.

Used for: network research tools (ovs-vsctl, nft, bpftool, ethtool, ip netns), Kubernetes inspect helpers — anything where the binary already exists and shelling out beats rewriting in Go.

Manager-side BaseTools

This is the largest population. The tools live under internal/manager/biz/aiops/tools/ as *_basetool.go files. Each carries its own hand-written JSON Schema, which is needed for shapes the declarative ParamSchema can't express: arrays, nested objects, oneOf.

Selected BaseTools:

| Tool | What it does |
| --- | --- |
| bash | Shell on a target edge (gated; recorded) |
| query_promql, query_logql, query_traceql | The three observability backends |
| query_incidents, get_incident_detail, query_alert_rules | Alerting surface |
| query_edges, query_change_events | Inventory + audit |
| correlate_incident | Composite fan-out to prom + log + trace |
| expand_topology, find_topology_node, get_topology | Graph |
| query_knowledge | RAG |
| find_outlier_edges, rank_edges | Multi-host comparison |
| host_load, host_processes, host_files | Host-state batch tools |
| get_edge_summary | One-shot edge health snapshot |
| restart_service | Manager-side wrapper around the edge restart skill |
| send_message | Coordinator → specialist agent comms |
| task_stop, agent_tool | Worker lifecycle |

More than 30 in total. The full list is whatever is registered in BuildBaseTools in cmd/ongrid/main.go.

The inventory bridge

Two parallel registries existed before the bridge:

  • The skill registry — ScopeHost Go builtins + subprocess packs. Surfaced on the SPA's /skills page with audit + class gate.
  • The BaseTool bag — hand-written manager-side tools. Surfaced to the LLM but NOT to /skills.

Operators couldn't see what cloud-side capabilities the agent actually had without reading source. inventory_bridge.go walks the BaseTool bag and registers every tool as a skill with Scope=ScopeManager. Tools that implement the opt-in RawSchemaProvider interface keep their hand-written JSON Schema verbatim.
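The bridge walk can be sketched as below. The BaseTool method set, the RawSchema method name, the Skill struct, and the example schema attached to query_promql are all invented for illustration; only the pattern (type-assert to an opt-in interface, fall back when it is absent) reflects the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BaseTool is a hypothetical minimal shape for a manager-side tool.
type BaseTool interface {
	Name() string
	Description() string
}

// RawSchemaProvider is the opt-in interface; the method name is an assumption.
type RawSchemaProvider interface {
	RawSchema() json.RawMessage
}

// Skill is a trimmed stand-in for the registry entry shown on /skills.
type Skill struct {
	Key         string
	Description string
	Scope       string
	RawSchema   json.RawMessage // nil when the tool does not provide one
}

// bridge registers every BaseTool as a manager-scoped skill.
func bridge(tools []BaseTool) []Skill {
	out := make([]Skill, 0, len(tools))
	for _, t := range tools {
		s := Skill{Key: t.Name(), Description: t.Description(), Scope: "manager"}
		if p, ok := t.(RawSchemaProvider); ok {
			s.RawSchema = p.RawSchema() // copied verbatim, never re-derived
		}
		out = append(out, s)
	}
	return out
}

type promTool struct{}

func (promTool) Name() string        { return "query_promql" }
func (promTool) Description() string { return "Query the metrics backend." }
func (promTool) RawSchema() json.RawMessage {
	// Made-up schema, used only to show an array-typed parameter.
	return json.RawMessage(`{"type":"object","properties":{"queries":{"type":"array","items":{"type":"string"}}}}`)
}

type stopTool struct{}

func (stopTool) Name() string        { return "task_stop" }
func (stopTool) Description() string { return "Stop a worker task." }

func main() {
	for _, s := range bridge([]BaseTool{promTool{}, stopTool{}}) {
		fmt.Printf("%s scope=%s hasRawSchema=%t\n", s.Key, s.Scope, s.RawSchema != nil)
	}
}
```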

Per the 2026-05-08 memo, 18 BaseTools are bridged through this path — the /skills page now lists every cloud-side capability with a scope chip indicating edge vs manager.

The reverse bridge is also wired: skill_bridge.go takes every safe skill and registers it as an LLM-facing Tool, so the LLM sees skills as function-calling tools with an edge_id parameter auto-prepended for ScopeHost ones.

Audit

Every tool invocation writes a row to chat_tool_calls:

  • session_id — the chatruntime session that called.
  • tool_name, args_json, result_json, error.
  • device_id — when the tool targeted a specific host (the EdgeID field on ExecuteResult).
  • started_at, finished_at, duration_ms.
  • caller_user_id + caller_role — for LLM-originated calls, the framework uses UserID=0 / Role="system".

The HLD-010 audit log piggybacks on the same table; the admin /admin/audit page renders it with timeline + per-tool filters.

See also

  • WebShell: the bash skill expressed as an interactive terminal rather than a one-shot tool call.
  • Knowledge: query_knowledge deep dive.
  • Topology: expand_topology / find_topology_node.
  • Skill manifest: wire format for subprocess skill packs.