WebShell
WebShell is a browser-facing terminal that reaches every registered edge through the same geminio tunnel the rest of the platform uses. There is no separate SSH bastion, no jumpbox, no inbound port. The edge keeps dialing out; the manager opens a multiplexed stream class for shell I/O.
Use cases:
- The agent suggests a fix; you click "Open shell on edge-prod-04" and confirm the change without leaving the SPA.
- Vendor / contractor needs a one-off look at one host without VPN enrolment.
- Incident-response: every command is recorded with the audit row of the originating session.
Architecture
browser ──WebSocket──> manager:/v1/webshell/ws
                          │
                          ├─ Router.Register(sessionID, sink, ActiveSession)
                          │
                          └─ geminio Stream (shell class)
                                │
                                ▼
                            edge agent
                              └─ pty.Start("/bin/bash")
The manager-side router is in internal/manager/biz/webshell/router.go. It maintains a sessionID → Sink directory: WebSocket handlers register on connect, and the tunnel-incoming dispatcher routes the edge's output and exit pushes to the right browser.
// internal/manager/biz/webshell/router.go:57
type Router struct {
	mu          sync.RWMutex
	sinks       map[string]Sink
	meta        map[string]*ActiveSession // sessionID → metadata
	stdoutBytes sync.Map                  // sessionID → *uint64
}
The HTTP / WebSocket handler lives next door in internal/manager/server/webshell so the router stays HTTP-agnostic and unit-testable.
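The register-then-dispatch flow described above can be sketched as follows. This is a minimal illustration, not the code in router.go: the Sink method WriteOutput, the Unregister and Dispatch methods, and the bufSink test double are assumptions made for the example, and Register is simplified to omit the ActiveSession argument.

```go
package main

import (
	"fmt"
	"sync"
)

// Sink is a hypothetical one-method stand-in for the browser side of a session.
type Sink interface {
	WriteOutput(p []byte) error
}

// Router keeps the sessionID → Sink directory (metadata and counters omitted).
type Router struct {
	mu    sync.RWMutex
	sinks map[string]Sink
}

func NewRouter() *Router {
	return &Router{sinks: make(map[string]Sink)}
}

// Register is what a WebSocket handler would call on connect.
func (r *Router) Register(sessionID string, s Sink) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.sinks[sessionID] = s
}

// Unregister removes the session, for example on disconnect.
func (r *Router) Unregister(sessionID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.sinks, sessionID)
}

// Dispatch routes one output push from the edge to the matching browser sink.
func (r *Router) Dispatch(sessionID string, p []byte) error {
	r.mu.RLock()
	s, ok := r.sinks[sessionID]
	r.mu.RUnlock()
	if !ok {
		return fmt.Errorf("webshell: no sink for session %q", sessionID)
	}
	return s.WriteOutput(p)
}

// bufSink collects output in memory; a test double, not a real WebSocket.
type bufSink struct{ data []byte }

func (b *bufSink) WriteOutput(p []byte) error {
	b.data = append(b.data, p...)
	return nil
}

func main() {
	r := NewRouter()
	sink := &bufSink{}
	r.Register("sess-1", sink)
	if err := r.Dispatch("sess-1", []byte("hello\n")); err != nil {
		fmt.Println("dispatch error:", err)
	}
	fmt.Printf("%q\n", sink.data)
}
```

Because the router only sees the Sink interface, a unit test can drive it with an in-memory sink like bufSink and never touch HTTP.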
The two stream classes
The geminio tunnel multiplexes:
- Control class — JSON RPCs (skill execution, plugin signalling, alert evaluator probes).
- Shell class — raw byte streams (one per WebShell session, one per tail -f follower, etc.).
Splitting at the tunnel level matters because shell I/O is bursty and unframed; mixing it with the control RPCs starves the latter. Each class has its own backpressure budget.
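To make the idea of a separate backpressure budget per class concrete, here is a small sketch that gives each class its own bounded queue. It does not describe how geminio implements this: the Mux type, the Enqueue method, and counting the budget in frames rather than bytes are all assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Class identifies a stream class on the tunnel.
type Class int

const (
	Control Class = iota // JSON RPCs
	Shell                // raw byte streams
)

// ErrBudgetExceeded reports that a class has used up its queue budget.
var ErrBudgetExceeded = errors.New("class budget exceeded")

// Mux is a hypothetical multiplexer with one bounded queue per class.
type Mux struct {
	queues map[Class]chan []byte
}

// NewMux sizes each queue independently (budget counted in frames here).
func NewMux(controlBudget, shellBudget int) *Mux {
	return &Mux{queues: map[Class]chan []byte{
		Control: make(chan []byte, controlBudget),
		Shell:   make(chan []byte, shellBudget),
	}}
}

// Enqueue never blocks: a full queue only rejects frames of its own class.
func (m *Mux) Enqueue(c Class, frame []byte) error {
	select {
	case m.queues[c] <- frame:
		return nil
	default:
		return ErrBudgetExceeded
	}
}

func main() {
	m := NewMux(2, 1)
	fmt.Println(m.Enqueue(Shell, []byte("ls\n")))    // fits in the shell budget
	fmt.Println(m.Enqueue(Shell, []byte("more")))    // shell budget is now full
	fmt.Println(m.Enqueue(Control, []byte(`{"rpc":1}`))) // control is unaffected
}
```

The point of the sketch is the last call: a saturated shell queue does not consume the control class's capacity.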
Session metadata
Each live session has an ActiveSession:
// router.go:37
type ActiveSession struct {
	SessionID    string
	OngridUserID uint64
	SSHUser      string
	DeviceID     uint64
	EdgeID       uint64
	StartedAt    time.Time
	LastInputAt  time.Time // updated on every browser → edge frame
}
LastInputAt is updated on every keystroke (Router.TouchInput). An idle-timeout watchdog evicts sessions that have received no input for longer than the configured limit, which defends against the "I closed the browser tab with a running command" leak.
Audit recording
Two layers:
- Header row — webshell_sessions table: who, when, which edge, exit code, total bytes in/out.
- Stream recording — the manager-side Recorder interface takes every byte that crosses the wire (both directions, timestamped) and appends to an asciinema-compatible cast file under /var/lib/ongrid/webshell-recordings/<session_id>.cast. The admin /admin/webshell page plays them back.
The Recorder interface is narrow on purpose — production uses a file sink; tests inject a fake; future cloud-blob backends drop in without touching the rest of the stack.
Concurrency limits
Per-user cap: Router.CountByUser is called from the WebSocket open handler; over-cap connections are rejected with HTTP 429. The default cap is 5 (configurable). A separate per-edge cap defends against a runaway agent opening 100 concurrent shells on one edge.
Killing sessions
Three paths kill a session:
- Browser close — WebSocket disconnect propagates to the edge, which kill -HUPs the pty.
- Admin kill — the admin SPA calls Killer.Kill(reason="admin terminated") on the Sink, which tunnels a close down to the edge. The reason is recorded in the session's exit row.
- Idle eviction — the watchdog fires Kill("idle timeout") on sessions whose LastInputAt exceeded the cap.
// router.go:50
type Killer interface {
	Kill(reason string)
}
The manager-side handler installs a Killer when it registers the Sink. Any Sink that opts in becomes admin-killable; the rest are only browser-close-killable.
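One common Go way to express this kind of opt-in is an optional interface checked with a type assertion. The sketch below assumes that approach; the Sink method WriteOutput, the KillIfSupported helper, and both sink types are hypothetical, and the real router may wire the Killer differently.

```go
package main

import "fmt"

// Sink is a hypothetical minimal sink interface.
type Sink interface {
	WriteOutput(p []byte) error
}

// Killer matches the interface shown above.
type Killer interface {
	Kill(reason string)
}

// plainSink does not implement Killer, so only a browser close can end it.
type plainSink struct{}

func (plainSink) WriteOutput(p []byte) error { return nil }

// killableSink opts in by also implementing Killer.
type killableSink struct {
	killedWith string
}

func (k *killableSink) WriteOutput(p []byte) error { return nil }
func (k *killableSink) Kill(reason string)         { k.killedWith = reason }

// KillIfSupported calls Kill when the sink opted in and reports whether it did.
func KillIfSupported(s Sink, reason string) bool {
	k, ok := s.(Killer)
	if !ok {
		return false
	}
	k.Kill(reason)
	return true
}

func main() {
	ks := &killableSink{}
	fmt.Println(KillIfSupported(ks, "admin terminated"), ks.killedWith)
	fmt.Println(KillIfSupported(plainSink{}, "admin terminated"))
}
```

With this pattern an admin-kill or idle-eviction path can treat every Sink uniformly and simply skip the ones that did not opt in.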
Role gating
WebShell is gated on the admin role (ADR-022 RBAC). The user role can chat with the agent but cannot open shells; the viewer role can read recordings of past sessions but cannot open new ones. The gate runs at the HTTP handler entry, before the WebSocket upgrade.
See also
- Skills — the bash skill is the one-shot equivalent of WebShell (single command, no pty). Same audit substrate.
- Edge install — getting a host's edge agent up so WebShell can reach it.
- Architecture — where the geminio tunnel sits.