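Agent persona format

An agent persona is a markdown file that describes how an agent (coordinator, incident investigator, specialist) behaves: which tools it can call, which model it runs, how many ReAct turns it may take, and which system prompt it carries.

Source of truth: the Agent definition in internal/manager/biz/aiops/chatruntime/types.go.

On-disk shape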

```markdown
---
name: incident_investigator
description: Walk an incident from symptom to root cause, calling host + observability tools.
when_to_use: When the user asks why an alert is firing, or shares an incident link.
tools:
  - expand_topology
  - find_topology_node
  - query_promql
  - search_logs
  - query_traceql
  - host_probe
  - bash
disallowed_tools: []
permission_mode: read-only
max_turns: 24
model: anthropic/claude-sonnet-4-6
critical_reminder: |
  Always show your evidence. Cite the PromQL / LogQL / file path. Never speculate
  beyond what the tools returned.
initial_prompt: |
  You are investigating incident {{.incident_id}} on device {{.device_id}}.
  Start by reading the incident summary.
background: false
omit_claude_md: false
metadata:
  os: [linux, darwin]
  requires:
    bins: []
    config: []
  ongrid:
    scope: manager
---

# Incident investigator

You are an SRE-grade incident investigator. Given an incident, your job is to:

1. Pull the alert detail and any attached evidence (alert summary, snapshot).
2. Expand the device's topology to understand the blast radius.
3. Query the relevant signal (metric / log / trace) to confirm the symptom.
4. Walk upstream services / underlying resources until you find the root cause.
5. Return an evidence-backed answer in plain language.

When the user asks a follow-up, stay grounded in tool output. If you cannot
verify a claim with a tool call, say so explicitly and stop.
```

The frontmatter is YAML. The body (everything after the closing ---) is the system prompt the worker LLM sees. Whitespace and markdown formatting in the body are preserved verbatim.
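As a rough illustration of that split, the sketch below separates the frontmatter from the body using only the standard library. The function name splitPersona is hypothetical and this is not the project's actual parser; it assumes LF line endings and a non-empty frontmatter, and it leaves the YAML undecoded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPersona (hypothetical helper) separates the YAML frontmatter from the
// markdown body. The body is returned untouched, because it is used verbatim
// as the system prompt.
func splitPersona(src string) (frontmatter, body string, err error) {
	const delim = "---\n"
	if !strings.HasPrefix(src, delim) {
		return "", "", errors.New("persona must start with a --- line")
	}
	rest := src[len(delim):]
	// The closing delimiter is the first later line that is exactly "---".
	end := strings.Index(rest, "\n"+delim)
	if end < 0 {
		return "", "", errors.New("frontmatter is not closed by a --- line")
	}
	return rest[:end+1], rest[end+1+len(delim):], nil
}

func main() {
	src := "---\nname: disk_specialist\n---\n\n# Disk specialist\n\nFocus on disk.\n"
	fm, body, err := splitPersona(src)
	if err != nil {
		panic(err)
	}
	fmt.Printf("frontmatter=%q\nbody=%q\n", fm, body)
}
```

The body keeps its leading blank line and trailing newline, which matches the "preserved verbatim" rule stated above.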

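Frontmatter fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | | Agent identifier used at dispatch (/v1/agents/{name}). |
| description | string | | Human-readable listing string shown in the agent picker. |
| when_to_use | string | | Dispatch hint for the coordinator, which reads it when deciding which specialist to call. |
| tools | string[] | | Explicit tool allowlist. Empty = inherit from policy (every tool the user's role can see). |
| disallowed_tools | string[] | | Denylist applied after the allowlist. Deny wins over allow. |
| permission_mode | enum | no (default read-only) | One of read-only, mutating-with-confirm, dual-sign-required. Gates which tool classes can run without confirmation. |
| max_turns | int | | Caps the worker's internal ReAct loop. Zero uses the coordinator default. |
| model | string | | LLM identifier (anthropic/claude-sonnet-4-6, openai/gpt-5.4, zhipu/glm-4.7, etc.). Empty = inherit the coordinator default. |
| critical_reminder | string | | system-reminder block injected on every turn. Anti-drift mechanism. |
| initial_prompt | string | | Prepended to the first user message at dispatch. Supports Go template syntax over the dispatch context ({{.incident_id}}, {{.device_id}}). |
| background | bool | | true forces asynchronous execution (long-running worker). |
| omit_claude_md | bool | | Skips inheriting the global system context. Intended for tightly scoped reviewer agents. |
| metadata | object | | OS gate, required binaries / config keys, and ongrid extensions (scope, edge_runtime, edge_capabilities). |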

Unknown frontmatter keys are preserved (the parser stores them in UnknownFields), so fields added later from openclaw / claude-code will not break the loader.
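One way to picture this behavior is a partition step over the decoded frontmatter. The sketch below assumes the YAML has already been decoded into a map by some decoder; the names knownFields and partition are illustrative and do not come from the real parser, which may populate UnknownFields differently.

```go
package main

import "fmt"

// knownFields lists the frontmatter keys documented in the field table.
var knownFields = map[string]bool{
	"name": true, "description": true, "when_to_use": true, "tools": true,
	"disallowed_tools": true, "permission_mode": true, "max_turns": true,
	"model": true, "critical_reminder": true, "initial_prompt": true,
	"background": true, "omit_claude_md": true, "metadata": true,
}

// partition (hypothetical) keeps unrecognized keys in a separate map
// instead of rejecting the file.
func partition(raw map[string]any) (known, unknown map[string]any) {
	known, unknown = map[string]any{}, map[string]any{}
	for k, v := range raw {
		if knownFields[k] {
			known[k] = v
		} else {
			unknown[k] = v
		}
	}
	return known, unknown
}

func main() {
	// "color" is a made-up key standing in for a field added in the future.
	raw := map[string]any{"name": "disk_specialist", "color": "blue"}
	known, unknown := partition(raw)
	fmt.Println("known:", known)
	fmt.Println("unknown:", unknown)
}
```

Keeping the unknown keys, rather than dropping them, means a persona written for a newer format can be loaded and written back without losing data.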

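The source field

When the SPA reads an agent back, the API also includes a source field that is not part of the on-disk frontmatter:

| Value | Meaning |
| --- | --- |
| builtin | Compiled into the binary (programmatic Add). Read-only in the UI. |
| disk | Loaded from agents/*.md next to the binary or from an external directory. Read-only in the UI. |
| user | Created by a user via POST /v1/agents/custom. Editable and deletable in the UI. |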

permission_mode

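The permission_mode field gates which tool classes a persona can run:

| Mode | Allowed classes | Confirmation required |
| --- | --- | --- |
| read-only | read (alias safe) | Never |
| mutating-with-confirm | read + write | Once per write call |
| dual-sign-required | read + write + destructive | destructive goes through a two-step SOP; write once |

A persona can be constrained further through tools (allowlist) and disallowed_tools (denylist). The runtime applies them in this order:

  1. Take the global tool set visible to the user's role.
  2. Intersect it with tools (when non-empty).
  3. Remove anything listed in disallowed_tools.
  4. For each remaining tool, check its class against permission_mode.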
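The four steps can be sketched as a single filter function. Everything here is illustrative: resolveTools, the numeric class ranks, and the restart_service tool are assumptions, and the sketch treats a failed class check as removing the tool, whereas the real runtime might instead reject the call at execution time.

```go
package main

import "fmt"

// Tool classes, ranked so each mode allows everything up to a ceiling.
const (
	classRead = iota
	classWrite
	classDestructive
)

// ceiling maps each permission_mode to the highest class it allows.
var ceiling = map[string]int{
	"read-only":             classRead,
	"mutating-with-confirm": classWrite,
	"dual-sign-required":    classDestructive,
}

// resolveTools (hypothetical) applies the four steps in order.
// visible maps tool name to class for the tools the user's role can see.
func resolveTools(visible map[string]int, allow, deny []string, mode string) map[string]bool {
	out := map[string]bool{}
	// Steps 1-2: start from the visible set; intersect with allow when non-empty.
	if len(allow) == 0 {
		for name := range visible {
			out[name] = true
		}
	} else {
		for _, name := range allow {
			if _, ok := visible[name]; ok {
				out[name] = true
			}
		}
	}
	// Step 3: the denylist wins over the allowlist.
	for _, name := range deny {
		delete(out, name)
	}
	// Step 4: drop tools whose class exceeds what the mode allows.
	limit, ok := ceiling[mode]
	if !ok {
		limit = classRead // an empty or unknown mode falls back to read-only
	}
	for name := range out {
		if visible[name] > limit {
			delete(out, name)
		}
	}
	return out
}

func main() {
	visible := map[string]int{
		"query_promql":    classRead,
		"search_logs":     classRead,
		"restart_service": classWrite, // hypothetical tool and class
	}
	got := resolveTools(visible, []string{"query_promql", "restart_service"}, nil, "read-only")
	fmt.Println(got)
}
```

Because the denylist runs after the intersection, a tool named in both tools and disallowed_tools is always excluded.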

Registration flow

```text
Built-in personas
  ↳ programmatic Add() in cmd/ongrid/main.go at startup. Cannot be deleted.

On-disk personas
  ↳ ./agents/*.md (relative to manager working dir) scanned at boot.
  ↳ ONGRID_AGENTS_EXTERNAL_DIRS adds more.
  ↳ The loader walks every .md, parses frontmatter via skill_parser.go / agent_parser.go.
  ↳ Each agent is registered with Source="disk".
  ↳ Cannot be edited or deleted via the UI; remove the file and restart.

User personas
  ↳ POST /v1/agents/custom with the frontmatter as a JSON body.
  ↳ Stored in agents table (DB), not on disk.
  ↳ Source="user"; fully editable via PATCH /v1/agents/custom/{name}.
```

Merge order at startup: builtin → disk → user. A user persona whose name matches a built-in or on-disk persona shadows it.
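The shadowing rule amounts to a last-writer-wins merge keyed by name. The sketch below uses a simplified Persona struct and a hypothetical mergePersonas function; it also assumes a disk persona would replace a built-in with the same name, which the text above does not state.

```go
package main

import "fmt"

// Persona is a simplified stand-in for the real Agent type.
type Persona struct {
	Name   string
	Source string // "builtin", "disk", or "user"
}

// mergePersonas (hypothetical) registers layers in order; a later layer
// replaces an earlier entry that has the same name.
func mergePersonas(layers ...[]Persona) map[string]Persona {
	registry := map[string]Persona{}
	for _, layer := range layers {
		for _, p := range layer {
			registry[p.Name] = p
		}
	}
	return registry
}

func main() {
	builtin := []Persona{{Name: "coordinator", Source: "builtin"}}
	disk := []Persona{{Name: "disk_specialist", Source: "disk"}}
	user := []Persona{{Name: "disk_specialist", Source: "user"}}

	reg := mergePersonas(builtin, disk, user)
	fmt.Println(reg["disk_specialist"].Source) // prints "user"
}
```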

Dispatching an agent

The coordinator picks a specialist by matching the user query against each persona's when_to_use. Programmatic dispatch (chat API):

```http
POST /api/v1/chat/sessions/{id}/messages
Content-Type: application/json
Authorization: Bearer ...

{
  "content": "Investigate incident 4217.",
  "agent": "incident_investigator",
  "context": { "incident_id": 4217, "device_id": 102 }
}
```

When agent is omitted, the coordinator chooses on its own. The context object is templated into the persona's initial_prompt.
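To make the templating step concrete, the sketch below renders an initial_prompt with Go's standard text/template package. The function renderInitialPrompt is hypothetical, and the missingkey=error option is a choice made for this example; the actual runtime may handle absent context keys differently.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderInitialPrompt (hypothetical) expands Go template placeholders
// using the dispatch context.
func renderInitialPrompt(prompt string, ctx map[string]any) (string, error) {
	// missingkey=error makes a placeholder with no matching context key
	// fail, instead of rendering "<no value>".
	tmpl, err := template.New("initial_prompt").Option("missingkey=error").Parse(prompt)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	prompt := "You are investigating incident {{.incident_id}} on device {{.device_id}}."
	ctx := map[string]any{"incident_id": 4217, "device_id": 102}

	out, err := renderInitialPrompt(prompt, ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // You are investigating incident 4217 on device 102.
}
```

Failing on a missing key surfaces a dispatch that forgot to pass incident_id before the worker starts with a half-filled prompt.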

critical reminder

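The critical_reminder block is injected as a system-reminder message at the top of every turn, not only the first. This is the standard claude-code anti-drift mechanism: when the model drifts mid-conversation (for example, it stops citing evidence after turn 8), the reminder pulls it back.

Use it sparingly. One short paragraph per persona is enough. The agent kernel already injects framework-level reminders (locale, model name, available tools); your critical_reminder should add only persona-specific behavior.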
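A minimal sketch of per-turn injection follows. The Message type, the buildTurn function, and the choice to prepend the reminder without storing it in the history are all assumptions made for illustration; the kernel's real message layout is not described here.

```go
package main

import "fmt"

// Message is a simplified stand-in for a chat message.
type Message struct {
	Role    string
	Content string
}

// buildTurn (hypothetical) assembles the messages sent for one turn. The
// reminder is prepended each time and is never written into the history,
// so it does not accumulate across turns.
func buildTurn(history []Message, reminder string) []Message {
	if reminder == "" {
		return history
	}
	out := make([]Message, 0, len(history)+1)
	out = append(out, Message{Role: "system-reminder", Content: reminder})
	return append(out, history...)
}

func main() {
	history := []Message{{Role: "user", Content: "Why is the alert firing?"}}
	for turn := 1; turn <= 2; turn++ {
		msgs := buildTurn(history, "Always show your evidence.")
		fmt.Println(turn, msgs[0].Role, len(msgs))
		history = append(history, Message{Role: "assistant", Content: "..."})
	}
}
```

Since the reminder is resent on every turn, its length is paid for in tokens each time, which is one reason to keep it to a short paragraph.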

Examples

Minimal specialist

```markdown
---
name: disk_specialist
description: Diagnose disk pressure issues (usage, IO, mount points).
when_to_use: When the user asks about disk-full, slow IO, or mount errors.
tools: [host_probe, bash, query_promql]
permission_mode: read-only
model: zhipu/glm-4.7
---

# Disk specialist

Focus exclusively on disk-related questions. ...
```

Reviewer (omits global context)

```markdown
---
name: change_reviewer
description: Review a proposed config change for blast radius.
when_to_use: When the user wants a second opinion on a destructive action.
tools: [expand_topology, read_repo, search_knowledge]
permission_mode: read-only
omit_claude_md: true
max_turns: 8
model: anthropic/claude-opus-4-7
critical_reminder: |
  Be a skeptic. Default to "do not proceed" unless evidence is overwhelming.
---

# Change reviewer

You are reviewing a proposed change. Your job is to find reasons the change
should NOT proceed. Approve only when you cannot find a reason to block.
```

See also