Skills (tools)
A skill is one self-contained capability the LLM can invoke. The skill framework auto-derives the LLM tool registration, the HTTP API, the UI form, the permission gate, and the audit log from a single metadata struct, so adding a new skill means writing one file and calling Register from its init().
Skills are L2 (device-direct) when they run on an edge agent, L3 (intelligence) when they run on the manager. Both are first-class.
Anatomy
Every skill ships a Metadata struct:

```go
// internal/skill/types.go:99
type Metadata struct {
	Key           string // lower_snake; id in dedupe keys, audit logs, LLM tool names
	Name          string // human label
	Description   string // shown to humans AND to the LLM
	Class         Class  // safe | mutating | dangerous
	Scope         Scope  // host (default) | manager
	Category      string // free-form group label
	Params        ParamSchema
	ResultPreview string
}
```

Plus an Executor:

```go
type Executor interface {
	Metadata() Metadata
	Execute(ctx context.Context, params json.RawMessage) (json.RawMessage, error)
}
```

The framework dispatches based on Scope:
- ScopeHost: the manager wraps the call in a tunnel RPC (Caller.Call(ctx, edgeID, "execute_skill", body)), and the edge agent's single execute_skill handler dispatches by key. The LLM tool wrapper injects a required edge_id integer property into the schema.
- ScopeManager: runs in-process, with no edge_id. Useful for public-internet calls (web_search), external APIs, and subprocess skill packs.
Permission classes
The class is built into the metadata, so the gate runs on every invocation rather than as a bolt-on:
| Class | Examples | Who can invoke |
|---|---|---|
| safe | probe_*, read_file, tail_file, query_promql | LLM, any role |
| mutating | restart_service, kill_process | Requires human-in-the-loop approval (workflow parked, but the gate exists) |
| dangerous | rm, reboot, drop_table | Requires an RSA-signed SOP + dual approval (parked) |
The default class is safe, but the framework logs a warning at registration time when an author leaves the field unset, so the omission can't slip through silently. See types.go:205.
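Registration-time defaulting can be sketched as follows. This assumes Class is a string type whose zero value means "unset". The registry map, the Register signature, and the log wording are illustrative; the real check lives at types.go:205.

```go
package main

import (
	"fmt"
	"log"
)

type Class string

const (
	ClassSafe     Class = "safe"
	ClassMutating Class = "mutating"
)

type Metadata struct {
	Key   string
	Class Class
}

// registry is written only from init() functions, which Go runs one at a time,
// so this sketch does no locking.
var registry = map[string]Metadata{}

// Register keeps the safe default for an unset Class but warns, so the omission shows up in logs.
func Register(md Metadata) {
	if md.Class == "" {
		log.Printf("skill %q: Class not set; defaulting to %q", md.Key, ClassSafe)
		md.Class = ClassSafe
	}
	registry[md.Key] = md
}

func init() {
	Register(Metadata{Key: "tail_file"})                            // forgot Class: warns, registers as safe
	Register(Metadata{Key: "restart_service", Class: ClassMutating}) // explicit: no warning
}

func main() {
	for _, k := range []string{"tail_file", "restart_service"} {
		fmt.Println(k, registry[k].Class)
	}
}
```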
The bridge in skill_bridge.go (aiops tools registry) currently exposes only ClassSafe skills to the LLM. Mutating and dangerous skills wait for the PR-G4 approval workflow.
The three skill populations
Go builtins (edge)
Hand-written in internal/skill/builtin/:
| Key | What |
|---|---|
| probe_http, probe_dns, probe_tcp | Read-only network reachability |
| tail_file, read_journal | Log surface |
| host_netns_inspect | Network namespace inventory |
| web_search | Manager-side; SearXNG by default, configurable |
| restart_service | Mutating; gated |
Each is a Go file whose init() calls skill.Register. The edge binary ships with every builtin compiled in: no plugin install, no remote-code-execution surface.
Subprocess skill packs
For capabilities you want to drop in without rebuilding the edge agent, ship a directory with a skill.json manifest and an executable. The loader in internal/skill/subprocess.go reads the manifest, registers the skill, and dispatches Execute to the binary, passing params on stdin.
Typical uses are network research tools (ovs-vsctl, nft, bpftool, ethtool, ip netns) and Kubernetes inspect helpers. More generally, packs suit anything where the binary already exists and shelling out beats rewriting it in Go.
Manager-side BaseTools
This is the largest population. BaseTools live under internal/manager/biz/aiops/tools/ as *_basetool.go files. Each carries its own hand-written JSON Schema, which is needed for shapes the declarative ParamSchema can't express: arrays, nested objects, oneOf.
Selected BaseTools:
| Tool | What it does |
|---|---|
| bash | Shell on a target edge (gated; recorded) |
| query_promql, query_logql, query_traceql | The three observability backends |
| query_incidents, get_incident_detail, query_alert_rules | Alerting surface |
| query_edges, query_change_events | Inventory + audit |
| correlate_incident | Composite fan-out to prom + log + trace |
| expand_topology, find_topology_node, get_topology | Graph |
| query_knowledge | RAG |
| find_outlier_edges, rank_edges | Multi-host comparison |
| host_load, host_processes, host_files | Host-state batch tools |
| get_edge_summary | One-shot edge health snapshot |
| restart_service | Manager-side wrapper around the edge restart skill |
| send_message | Coordinator → specialist agent comms |
| task_stop, agent_tool | Worker lifecycle |
There are 30+ in total. The authoritative list is whatever BuildBaseTools in cmd/ongrid/main.go registers.
The inventory bridge
Two parallel registries existed before the bridge:
- The skill registry: ScopeHost Go builtins + subprocess packs. Surfaced on the SPA's /skills page with audit + class gate.
- The BaseTool bag: hand-written manager-side tools. Surfaced to the LLM but NOT on /skills.
Operators couldn't see which cloud-side capabilities the agent actually had without reading the source. The bridge in inventory_bridge.go walks the BaseTool bag and registers every tool as a skill with Scope=ScopeManager. It uses the opt-in RawSchemaProvider interface to preserve each BaseTool's hand-written JSON Schema verbatim.
Per the 2026-05-08 memo, 18 BaseTools are bridged through this path — the /skills page now lists every cloud-side capability with a scope chip indicating edge vs manager.
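The bridge walk can be sketched with a Go type assertion on the opt-in interface. RawSchemaProvider is named in the text, but its method name RawSchema() is an assumption. BaseTool, SkillEntry, and the two sample tools are hypothetical, and the query_knowledge schema is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Scope string

const ScopeManager Scope = "manager"

// BaseTool is a hypothetical minimal shape for a manager-side tool.
type BaseTool interface {
	Name() string
	Description() string
}

// RawSchemaProvider is the opt-in interface; the method name is assumed.
type RawSchemaProvider interface {
	RawSchema() json.RawMessage
}

// SkillEntry is an illustrative inventory row for the /skills page.
type SkillEntry struct {
	Key         string
	Description string
	Scope       Scope
	RawSchema   json.RawMessage // nil when the tool doesn't opt in
}

// bridgeInventory registers every BaseTool as a manager-scoped skill entry,
// copying the hand-written schema verbatim when the tool provides one.
func bridgeInventory(tools []BaseTool) []SkillEntry {
	out := make([]SkillEntry, 0, len(tools))
	for _, t := range tools {
		e := SkillEntry{Key: t.Name(), Description: t.Description(), Scope: ScopeManager}
		if p, ok := t.(RawSchemaProvider); ok {
			e.RawSchema = p.RawSchema() // no round-trip through ParamSchema
		}
		out = append(out, e)
	}
	return out
}

type queryKnowledge struct{}

func (queryKnowledge) Name() string        { return "query_knowledge" }
func (queryKnowledge) Description() string { return "RAG lookup" }
func (queryKnowledge) RawSchema() json.RawMessage {
	return json.RawMessage(`{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]}`)
}

type taskStop struct{} // does not implement RawSchemaProvider

func (taskStop) Name() string        { return "task_stop" }
func (taskStop) Description() string { return "Stop a worker" }

func main() {
	for _, e := range bridgeInventory([]BaseTool{queryKnowledge{}, taskStop{}}) {
		fmt.Printf("%s scope=%s schema=%s\n", e.Key, e.Scope, e.RawSchema)
	}
}
```

Making the interface opt-in lets tools without a custom schema bridge unchanged. Tools that need arrays, nesting, or oneOf keep their schema byte-for-byte.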
The reverse bridge is also wired: skill_bridge.go takes every safe skill and registers it as an LLM-facing Tool, so the LLM sees skills as function-calling tools with an edge_id parameter auto-prepended for ScopeHost ones.
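The reverse direction can be sketched as a filter followed by a schema rewrite. Param and ParamSchema here are a hypothetical flat rendering of the declarative schema, and ToolDef is a generic function-calling definition, not the project's type. The sketch drops every class except safe. It then places edge_id first in both the properties and the required list for any skill that isn't manager-scoped, since host is the default scope.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Class string
type Scope string

const (
	ClassSafe     Class = "safe"
	ClassMutating Class = "mutating"

	ScopeHost    Scope = "host"
	ScopeManager Scope = "manager"
)

// Param and ParamSchema are a hypothetical flat rendering of the declarative schema.
type Param struct {
	Name        string
	Type        string // JSON Schema type: string, integer, ...
	Description string
	Required    bool
}

type ParamSchema []Param

type Metadata struct {
	Key         string
	Description string
	Class       Class
	Scope       Scope
	Params      ParamSchema
}

// ToolDef is a generic function-calling tool definition (name + JSON Schema parameters).
type ToolDef struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"`
}

var edgeIDParam = Param{Name: "edge_id", Type: "integer", Description: "ID of the edge agent to run on", Required: true}

// bridgeSkills exposes only safe skills and prepends edge_id for host-scoped ones.
func bridgeSkills(skills []Metadata) []ToolDef {
	var tools []ToolDef
	for _, md := range skills {
		if md.Class != ClassSafe {
			continue // mutating and dangerous skills wait for the approval workflow
		}
		params := md.Params
		if md.Scope != ScopeManager { // host is the default scope
			params = append(ParamSchema{edgeIDParam}, params...)
		}
		props := map[string]any{}
		required := []string{}
		for _, p := range params {
			props[p.Name] = map[string]any{"type": p.Type, "description": p.Description}
			if p.Required {
				required = append(required, p.Name)
			}
		}
		tools = append(tools, ToolDef{
			Name:        md.Key,
			Description: md.Description,
			Parameters:  map[string]any{"type": "object", "properties": props, "required": required},
		})
	}
	return tools
}

func main() {
	tools := bridgeSkills([]Metadata{
		{Key: "probe_tcp", Description: "TCP reachability", Class: ClassSafe, Scope: ScopeHost,
			Params: ParamSchema{{Name: "port", Type: "integer", Required: true}}},
		{Key: "restart_service", Class: ClassMutating, Scope: ScopeHost},
	})
	b, _ := json.MarshalIndent(tools, "", "  ")
	fmt.Println(string(b))
}
```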
Audit
Every tool invocation writes a row to chat_tool_calls:
- session_id: the chatruntime session that made the call.
- tool_name, args_json, result_json, error.
- device_id: set when the tool targeted a specific host (the EdgeID field on ExecuteResult).
- started_at, finished_at, duration_ms.
- caller_user_id + caller_role: for LLM-originated calls, the framework uses UserID=0 / Role="system".
The HLD-010 audit log piggybacks on the same table; the /admin/audit page renders it with a timeline and per-tool filters.
See also
- WebShell: the bash skill expressed as an interactive terminal rather than a one-shot tool call.
- Knowledge: query_knowledge deep dive.
- Topology: expand_topology / find_topology_node.
- Skill manifest: wire format for subprocess skill packs.