Agent persona format
An agent persona is a markdown file describing how one agent — a coordinator, an incident investigator, a specialist — behaves: which tools it can call, which model it runs on, how many ReAct turns it gets, and what system prompt it carries.
Source of truth: Agent in internal/manager/biz/aiops/chatruntime/types.go.
On-disk shape
---
name: incident_investigator
description: Walk an incident from symptom to root cause, calling host + observability tools.
when_to_use: When the user asks why an alert is firing, or shares an incident link.
tools:
- expand_topology
- find_topology_node
- query_promql
- search_logs
- query_traceql
- host_probe
- bash
disallowed_tools: []
permission_mode: read-only
max_turns: 24
model: anthropic/claude-sonnet-4-6
critical_reminder: |
  Always show your evidence. Cite the PromQL / LogQL / file path. Never speculate
  beyond what the tools returned.
initial_prompt: |
  You are investigating incident {{.incident_id}} on device {{.device_id}}.
  Start by reading the incident summary.
background: false
omit_claude_md: false
metadata:
  os: [linux, darwin]
  requires:
    bins: []
    config: []
  ongrid:
    scope: manager
---
# Incident investigator
You are an SRE-grade incident investigator. Given an incident, your job is to:
1. Pull the alert detail and any attached evidence (alert summary, snapshot).
2. Expand the device's topology to understand the blast radius.
3. Query the relevant signal (metric / log / trace) to confirm the symptom.
4. Walk upstream services / underlying resources until you find the root cause.
5. Return an evidence-backed answer in plain language.
When the user asks a follow-up, stay grounded in tool output. If you cannot
verify a claim with a tool call, say so explicitly and stop.
The frontmatter is YAML. The body (after the closing ---) is the system prompt the worker LLM sees. Whitespace and markdown formatting in the body are preserved verbatim.
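The frontmatter/body split described above can be sketched as follows. This is a minimal illustration, not the project's actual parser: `splitPersona` is a hypothetical helper, and it assumes the file starts with a `---` line and that the frontmatter is non-empty. Decoding the YAML itself is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPersona separates a persona file into its YAML frontmatter and its
// markdown body. Hypothetical helper for illustration only.
func splitPersona(src string) (frontmatter, body string, err error) {
	const delim = "---\n"
	if !strings.HasPrefix(src, delim) {
		return "", "", errors.New("persona must start with a --- line")
	}
	rest := src[len(delim):]
	// The closing delimiter must sit on its own line.
	end := strings.Index(rest, "\n"+delim)
	if end < 0 {
		return "", "", errors.New("frontmatter is not closed by a --- line")
	}
	// Everything after the closing delimiter is returned untouched, so
	// whitespace and markdown in the body survive verbatim.
	return rest[:end+1], rest[end+1+len(delim):], nil
}

func main() {
	src := "---\nname: disk_specialist\n---\n# Disk specialist\n"
	fm, body, err := splitPersona(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("frontmatter: %q\nbody: %q\n", fm, body)
}
```

The body is returned as a plain slice of the input rather than being trimmed or re-rendered, which is one simple way to honor the "preserved verbatim" guarantee.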
Frontmatter fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Agent identifier used at spawn time (/v1/agents/{name}). |
| description | string | yes | Human-readable listing string shown in the agent picker. |
| when_to_use | string | yes | Coordinator's spawn-decision hint. The coordinator reads this when deciding which specialist to invoke. |
| tools | string[] | no | Explicit tool whitelist. Empty = inherit from policy (every tool the user's role can see). |
| disallowed_tools | string[] | no | Blacklist applied after the whitelist. A tool on both lists is denied. |
| permission_mode | enum | no (default read-only) | read-only, mutating-with-confirm, dual-sign-required. Gates which tool classes can run without confirmation. |
| max_turns | int | no | Caps the worker's internal ReAct loop. Coordinator default applies if zero. |
| model | string | no | LLM identifier (anthropic/claude-sonnet-4-6, openai/gpt-5.4, zhipu/glm-4.7, etc.). Empty = inherit coordinator default. |
| critical_reminder | string | no | System-reminder block injected on every turn. Anti-drift mechanism. |
| initial_prompt | string | no | Prepended to the first user message at spawn. Supports Go template syntax over the spawn context ({{.incident_id}}, {{.device_id}}). |
| background | bool | no | true forces async execution (long-running workers). |
| omit_claude_md | bool | no | Skip inheriting the global system context. Used for tightly-scoped reviewer agents. |
| metadata | object | no | OS gate, required binaries / config keys, ongrid extensions (scope, edge_runtime, edge_capabilities). |
Unknown frontmatter keys are preserved (the parser stores them under UnknownFields) so future fields from openclaw / claude-code do not break the loader.
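As a rough sketch of how unknown keys could be kept rather than rejected, the snippet below partitions an already-decoded frontmatter map into known and unknown entries. The `partitionKeys` helper and the `color` key are hypothetical; the real parser's `UnknownFields` handling may differ.

```go
package main

import "fmt"

// knownKeys lists the frontmatter fields documented in the table above.
var knownKeys = map[string]bool{
	"name": true, "description": true, "when_to_use": true,
	"tools": true, "disallowed_tools": true, "permission_mode": true,
	"max_turns": true, "model": true, "critical_reminder": true,
	"initial_prompt": true, "background": true, "omit_claude_md": true,
	"metadata": true,
}

// partitionKeys splits decoded frontmatter into documented fields and
// everything else. Hypothetical helper; unknown keys are kept, not dropped.
func partitionKeys(raw map[string]any) (known, unknown map[string]any) {
	known = make(map[string]any)
	unknown = make(map[string]any)
	for k, v := range raw {
		if knownKeys[k] {
			known[k] = v
		} else {
			unknown[k] = v
		}
	}
	return known, unknown
}

func main() {
	// "color" stands in for a future field this loader does not know yet.
	raw := map[string]any{"name": "disk_specialist", "max_turns": 8, "color": "blue"}
	known, unknown := partitionKeys(raw)
	fmt.Println(len(known), "known,", len(unknown), "unknown:", unknown)
}
```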
Source field
When the SPA reads back an agent, the API also includes a source field that is not part of the on-disk frontmatter:
| Value | Meaning |
|---|---|
| builtin | shipped in the binary (programmatic Add). Read-only in the UI. |
| disk | loaded from agents/*.md next to the binary or under an external dir. Read-only in the UI. |
| user | created by the user via POST /v1/agents/custom. Editable and deletable from the UI. |
Permission modes
The permission_mode field gates which tool classes the persona can run.
| Mode | Allowed classes | Confirmation required |
|---|---|---|
| read-only | read (alias safe) | never |
| mutating-with-confirm | read + write | once per write call |
| dual-sign-required | read + write + destructive | two-step SOP for destructive; once for write |
A persona can further constrain via tools (whitelist) and disallowed_tools (blacklist). The runtime applies them in this order:
1. Take the global tool set the user's role can see.
2. Intersect with tools if non-empty.
3. Remove anything in disallowed_tools.
4. For each remaining tool, check permission_mode against its class.
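The four steps above can be sketched as a single filter. This is an illustration under stated assumptions, not the runtime's code: the `Tool` struct, `resolveTools`, and the `restart_service` / `wipe_disk` tool names are hypothetical, and the sketch assumes a tool whose class falls outside the permission mode is simply dropped.

```go
package main

import "fmt"

// Tool is a hypothetical registry entry; Class is "read", "write",
// or "destructive".
type Tool struct {
	Name  string
	Class string
}

// allowedClasses mirrors the permission-mode table.
var allowedClasses = map[string]map[string]bool{
	"read-only":             {"read": true},
	"mutating-with-confirm": {"read": true, "write": true},
	"dual-sign-required":    {"read": true, "write": true, "destructive": true},
}

func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

// resolveTools applies whitelist, blacklist, then the class check to the
// tools the user's role can see (step 1 is the visible argument).
func resolveTools(visible []Tool, tools, disallowed []string, mode string) []string {
	if mode == "" {
		mode = "read-only" // documented default
	}
	white, black := toSet(tools), toSet(disallowed)
	var out []string
	for _, t := range visible {
		if len(white) > 0 && !white[t.Name] {
			continue // step 2: not in a non-empty whitelist
		}
		if black[t.Name] {
			continue // step 3: the blacklist wins
		}
		if !allowedClasses[mode][t.Class] {
			continue // step 4: class not permitted in this mode
		}
		out = append(out, t.Name)
	}
	return out
}

func main() {
	visible := []Tool{
		{"query_promql", "read"}, {"search_logs", "read"},
		{"restart_service", "write"}, {"wipe_disk", "destructive"},
	}
	fmt.Println(resolveTools(visible, nil, []string{"wipe_disk"}, "mutating-with-confirm"))
}
```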
Registration flow
Built-in personas
↳ programmatic Add() in cmd/ongrid/main.go at startup. Cannot be deleted.
On-disk personas
↳ ./agents/*.md (relative to manager working dir) scanned at boot.
↳ ONGRID_AGENTS_EXTERNAL_DIRS adds more.
↳ The loader walks every .md, parses frontmatter via skill_parser.go / agent_parser.go.
↳ Each agent is registered with Source="disk".
↳ Cannot be edited or deleted via the UI; remove the file and restart.
User personas
↳ POST /v1/agents/custom with the frontmatter as a JSON body.
↳ Stored in agents table (DB), not on disk.
↳ Source="user"; fully editable via PATCH /v1/agents/custom/{name}.
The merge order at startup is builtin → disk → user. A user persona with the same name as a built-in or disk persona shadows it.
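A minimal sketch of that merge, assuming a last-writer-wins map keyed by name. The `Persona` struct and `mergePersonas` are hypothetical, and the sketch further assumes that a disk persona overrides a built-in of the same name, which the text above does not state explicitly.

```go
package main

import "fmt"

// Persona is a hypothetical, trimmed-down registry entry.
type Persona struct {
	Name   string
	Source string // "builtin", "disk", or "user"
}

// mergePersonas layers the three sources in startup order. Later layers
// overwrite earlier ones, so a user persona shadows a same-named entry.
func mergePersonas(builtin, disk, user []Persona) map[string]Persona {
	registry := make(map[string]Persona)
	for _, layer := range [][]Persona{builtin, disk, user} {
		for _, p := range layer {
			registry[p.Name] = p
		}
	}
	return registry
}

func main() {
	registry := mergePersonas(
		[]Persona{{"incident_investigator", "builtin"}},
		[]Persona{{"disk_specialist", "disk"}},
		[]Persona{{"incident_investigator", "user"}},
	)
	fmt.Println(registry["incident_investigator"].Source, registry["disk_specialist"].Source)
}
```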
Spawning an agent
The coordinator picks a specialist by matching the user's query against every persona's when_to_use. To spawn programmatically (chat API):
POST /api/v1/chat/sessions/{id}/messages
Content-Type: application/json
Authorization: Bearer ...
{
  "content": "Investigate incident 4217.",
  "agent": "incident_investigator",
  "context": { "incident_id": 4217, "device_id": 102 }
}
If agent is omitted, the coordinator chooses. context is templated into the persona's initial_prompt.
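To illustrate how context could be templated into initial_prompt, the sketch below renders a template with Go's standard text/template package over a map. The `renderInitialPrompt` helper is hypothetical, and the `missingkey=error` option is an assumption: the actual runtime may render missing keys differently.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderInitialPrompt expands a persona's initial_prompt against the spawn
// context. Hypothetical helper; failing on a missing key is an assumption.
func renderInitialPrompt(tmpl string, ctx map[string]any) (string, error) {
	t, err := template.New("initial_prompt").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	prompt := "You are investigating incident {{.incident_id}} on device {{.device_id}}."
	ctx := map[string]any{"incident_id": 4217, "device_id": 102}
	out, err := renderInitialPrompt(prompt, ctx)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Failing loudly on a missing key keeps a half-filled prompt from reaching the model, at the cost of rejecting spawns whose context is incomplete.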
Critical reminders
The critical_reminder block is injected as a system-reminder message at the top of every turn, not just the first. This is the standard claude-code anti-drift mechanism — when the model wanders mid-conversation (e.g. stops citing evidence after turn 8), the reminder pulls it back.
Use it sparingly. One short paragraph per persona is plenty. The agent kernel already injects framework-level reminders (locale, model name, available tools); your critical_reminder should add only persona-specific behavior.
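The per-turn injection can be pictured with the sketch below. The `Message` struct, `buildTurn`, and the `<system-reminder>` wrapper text are all assumptions made for illustration; the kernel's actual message layout is not specified here.

```go
package main

import "fmt"

// Message is a hypothetical chat message.
type Message struct {
	Role    string
	Content string
}

// buildTurn assembles the messages sent to the model for one turn. The
// reminder is re-added on every call, not only on the first turn.
func buildTurn(systemPrompt, reminder string, history []Message) []Message {
	msgs := []Message{{Role: "system", Content: systemPrompt}}
	if reminder != "" {
		// The wrapper tag is an assumed format.
		msgs = append(msgs, Message{
			Role:    "system",
			Content: "<system-reminder>\n" + reminder + "\n</system-reminder>",
		})
	}
	return append(msgs, history...)
}

func main() {
	history := []Message{{Role: "user", Content: "Why is the alert firing?"}}
	for _, m := range buildTurn("You are an incident investigator.", "Always show your evidence.", history) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```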
Examples
Minimal specialist
---
name: disk_specialist
description: Diagnose disk pressure issues — usage, IO, mount points.
when_to_use: When the user asks about disk-full, slow IO, or mount errors.
tools: [host_probe, bash, query_promql]
permission_mode: read-only
model: zhipu/glm-4.7
---
# Disk specialist
Focus exclusively on disk-related questions. ...
Reviewer (omits global context)
---
name: change_reviewer
description: Review a proposed config change for blast radius.
when_to_use: When the user wants a second opinion on a destructive action.
tools: [expand_topology, read_repo, search_knowledge]
permission_mode: read-only
omit_claude_md: true
max_turns: 8
model: anthropic/claude-opus-4-7
critical_reminder: |
  Be a skeptic. Default to "do not proceed" unless evidence is overwhelming.
---
# Change reviewer
You are reviewing a proposed change. Your job is to find reasons the change
should NOT proceed. Approve only when you cannot find a reason to block.
See also
- Agents → Overview — operator-facing tour of the built-in personas.
- Capabilities → Skills — the tool catalogue a persona picks from.
- Skill manifest — defining new tools a persona can call.
- REST API → aiops — endpoints for managing personas at runtime.