Linux (server)
The manager is shipped as a Docker image (`ongrid:<version>`) and is operated through `docker compose`. The image and the compose file both come from the release tarball produced by `make package`; there is no `apt install ongrid` and no host-side daemon.
Supported distributions
We run the manager day-to-day on the following hosts. Anything that runs Docker Engine 24+ should work; the list below is what gets explicit testing per release.
| Distribution | Tested on |
|---|---|
| Ubuntu | 22.04 LTS, 24.04 LTS |
| Debian | 12 (bookworm) |
| RHEL | 9 |
| Rocky Linux | 9 |
| AlmaLinux | 9 |
CentOS 7 is not supported: its glibc and systemd are too old for the bundled Compose stack, and the distribution itself has reached upstream EOL.
Required software
- Docker Engine 24.0 or later. Older releases lack `--platform` plumbing that `make docker-build` relies on and break `extra_hosts: host-gateway`.
- Docker Compose v2 (the `docker compose` CLI plugin, not the legacy `docker-compose` binary). The compose file at `deploy/docker-compose.yml` deliberately omits the top-level `version:` key, which v1 rejects.
- systemd for the host. The manager itself does not register a systemd unit (the install script `deploy/install/server-install.sh` only relies on systemd to start Docker), but auto-restart on reboot needs `docker.service` enabled.
- iptables or nftables: both firewall backends work.
No required kernel modules beyond what Docker itself needs (overlayfs, namespaces, cgroups v1 or v2).
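The requirements above can be checked before installing. The following is a minimal sketch, not part of `server-install.sh`; the function names `version_ge` and `check_requirements` are made up for illustration, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/bin/sh
# Preflight sketch for the software requirements listed above.
# Function names are illustrative and are not part of the install script.

# version_ge A B: succeeds when dotted version A is >= B (needs sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_requirements: prints one line per finding, returns non-zero on failure.
check_requirements() {
    rc=0

    engine="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -n "$engine" ] && version_ge "$engine" 24.0; then
        echo "docker engine $engine: ok"
    else
        echo "docker engine ${engine:-not reachable}: need 24.0 or later"
        rc=1
    fi

    # Only the v2 CLI plugin answers `docker compose version`.
    compose="$(docker compose version --short 2>/dev/null)"
    if [ -n "$compose" ]; then
        echo "docker compose plugin $compose: ok"
    else
        echo "docker compose plugin: not found"
        rc=1
    fi

    # Needed only so the stack comes back after a reboot.
    if systemctl is-enabled --quiet docker.service 2>/dev/null; then
        echo "docker.service enabled: ok"
    else
        echo "docker.service is not enabled at boot"
        rc=1
    fi

    return "$rc"
}
```

Source the file and call `check_requirements` on the target host; a non-zero return means at least one requirement was not met.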
Resource floors
| Resource | Floor | Comfortable | Why |
|---|---|---|---|
| CPU | 2 vCPU | 4 vCPU | The manager Go process, Prometheus, Loki, Tempo, Grafana, MySQL all live on the same host. |
| Memory | 4 GB | 8 GB | Prometheus + Loki together can hit 1.5 GB at idle. Headroom keeps the OOM killer off MySQL. |
| Disk | 20 GB | 100 GB | Prometheus retention is 90 days / 20 GB by default (see --storage.tsdb.retention.* in the compose file). Loki + Tempo share the rest. |
| Network | 10 Mbps | 100 Mbps | Telemetry is the data plane — log shipping from many edges can saturate a thin pipe long before the manager itself does. |
The 20 GB floor assumes you only test with a handful of edges. Once you cross ~25 hosts shipping metrics + logs + traces, plan on 200 GB just to keep 30 days of Loki around.
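As a rough way to compare a host against the floors in the table, the sketch below reads the CPU count, total memory, and free disk space. The helper names and the 3700 MB memory threshold are assumptions (a nominal 4 GB machine usually reports slightly less than 4096 MB), and the path argument should point at whichever filesystem will hold the Docker volumes.

```shell
#!/bin/sh
# Compare this host against the documented floors: 2 vCPU, 4 GB RAM, 20 GB disk.
# Helper names and the 3700 MB slack value are illustrative assumptions.

# at_least NAME HAVE NEED: prints a verdict, returns non-zero below the floor.
at_least() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: ok ($2 >= $3)"
    else
        echo "$1: below floor ($2 < $3)"
        return 1
    fi
}

# check_floors [DIR]: DIR is the filesystem that will hold the Docker volumes.
check_floors() {
    data_dir="${1:-/}"
    cpus="$(nproc)"
    mem_mb="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)"
    disk_gb="$(df -Pk "$data_dir" | awk 'NR == 2 {print int($4 / 1024 / 1024)}')"

    rc=0
    at_least "cpu" "$cpus" 2 || rc=1
    at_least "memory_mb" "$mem_mb" 3700 || rc=1
    at_least "free_disk_gb" "$disk_gb" 20 || rc=1
    return "$rc"
}
```

For example, `check_floors /var/lib/docker` checks the default Docker data root, if that is where your volumes live.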
Ports
| Port | Direction | Protocol | What it is |
|---|---|---|---|
| 443 | inbound | HTTPS | nginx — SPA, REST API, log/trace ingest |
| 80 | inbound | HTTP | nginx redirect to 443 |
| 40012 | inbound | TCP (geminio) | Frontier broker — every edge dials this |
| 3306 | exposed locally | TCP | MySQL — bind to 127.0.0.1 in production |
| 9090 | exposed locally | HTTP | Prometheus — bind to 127.0.0.1 in production |
| 9100 | inbound (optional) | HTTP | Manager /metrics — for an external Prometheus to scrape |
Only 443, 80, and 40012 need to be reachable from the public internet (or your edge fleet's network); open 9100 as well only if an external Prometheus scrapes the manager. Everything else stays bound to localhost in production: nginx fronts the API, and Grafana / Prometheus / Loki / Tempo are reached over the docker network.
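One way to confirm that the local-only ports really are bound to loopback is to filter the listening sockets reported by `ss`. This is a sketch under assumptions: `public_binds` is a hypothetical helper, and it expects the column layout of `ss -H -ltn` from iproute2, where the fourth field is the local address and port.

```shell
#!/bin/sh
# public_binds PORT...: reads `ss -H -ltn` output on stdin and prints every
# listener on one of the given ports that is NOT bound to loopback.
# The helper name is illustrative; it is not shipped with the manager.
public_binds() {
    awk -v ports="$*" '
        BEGIN {
            n = split(ports, p, " ")
            for (i = 1; i <= n; i++) want[p[i]] = 1
        }
        {
            local_addr = $4
            port = local_addr; sub(/.*:/, "", port)      # text after the last colon
            host = local_addr; sub(/:[^:]*$/, "", host)  # text before the last colon
            if ((port in want) && host != "127.0.0.1" && host != "[::1]")
                print local_addr
        }'
}
```

Running `ss -H -ltn | public_binds 3306 9090` on the manager host should print nothing in production; any line it prints is a MySQL or Prometheus listener exposed beyond localhost.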
What the compose stack runs
From deploy/docker-compose.yml:
| Service | Image | Role |
|---|---|---|
| mysql | mysql:8.0 | Manager's primary store (users, edges, devices, alert rules, sessions). |
| ongrid | ongrid:<version> | Manager (Go binary built from cmd/ongrid). |
| nginx | ongrid-web:<version> | TLS termination, SPA, /api/* reverse proxy, /edge/* static for the install bundle. |
| frontier | singchia/frontier:1.2.5 | Geminio broker; edge tunnel endpoint (ADR-007). |
| prometheus | prom/prometheus:v2.54.0 | Metric store + remote_write receiver (ADR-009). |
| loki | grafana/loki:3.4.0 | Log store (ADR-012). |
| tempo | grafana/tempo:2.5.0 | Trace store (ADR-013). |
| searxng | searxng/searxng:latest | Self-hosted search backend for the web_search skill. |
| grafana | grafana/grafana-oss:11.1.4 | Dashboard embed for the Monitor page. |
The compose file is the production shape — there is no second "prod" file. Override knobs via the env block (ONGRID_* — see Environment variables) instead of forking the YAML.
TLS
nginx terminates TLS using certificates bind-mounted from deploy/certs/ (cert.pem, key.pem). For first boot the install script generates a self-signed cert; replace it with a real cert at any time and docker compose restart nginx. Let's Encrypt is supported via a sidecar pattern documented in deploy/README.md.
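Before swapping in a real certificate, it can help to confirm that the new `cert.pem` and `key.pem` belong together, since nginx will fail to start on a mismatched pair. A minimal sketch, assuming the `openssl` CLI is installed and the key is not passphrase-protected; `cert_matches_key` is a hypothetical helper name.

```shell
#!/bin/sh
# cert_matches_key CERT KEY: succeeds when the certificate and the private key
# carry the same public key. Works for RSA and EC keys.
# The helper name is illustrative; it is not part of the install script.
cert_matches_key() {
    cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
    key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

A possible use is `cert_matches_key deploy/certs/cert.pem deploy/certs/key.pem && docker compose restart nginx`, so the restart only happens when the pair is consistent.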
SELinux + AppArmor
Both work out of the box. The compose file does not mount sensitive host paths read-write; the bind mounts (`./edge`, `./nginx`, `./certs`, `./grafana/provisioning`) are all `:ro`. If your SELinux policy enforces strict labels on bind mounts, add the `Z` option to the mount strings inside `docker-compose.override.yml`, comma-separated after `ro` (that is, `:ro,Z`).
Upgrading
    make package    # produces dist/out/ongrid-vX.Y.Z-linux-amd64.tar.xz
    scp dist/out/ongrid-vX.Y.Z-linux-amd64.tar.xz host:/tmp/
    ssh host /tmp/upgrade.sh

`upgrade.sh` lives next to the tarball and does the idempotent dance: load the new image, write a temporary `docker-compose.override.yml` pinning the new tag, `docker compose up -d`, prune the previous image after one healthy minute. See Upgrade.
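The tag-pinning step can be pictured with the sketch below. It is not the contents of `upgrade.sh`, which is not reproduced here; `write_override` is a hypothetical helper, and it assumes that only the two project-built images from the service table (`ongrid` and `ongrid-web`) follow the release tag.

```shell
#!/bin/sh
# write_override TAG [FILE]: write a compose override that pins the two
# project-built images to TAG. Illustrative only; the real upgrade.sh may
# generate a different file.
write_override() {
    tag="$1"
    out="${2:-docker-compose.override.yml}"
    cat > "$out" <<EOF
services:
  ongrid:
    image: ongrid:${tag}
  nginx:
    image: ongrid-web:${tag}
EOF
}
```

With the override written next to the compose file, something like `docker compose -f deploy/docker-compose.yml -f deploy/docker-compose.override.yml up -d` would recreate only the services whose image changed. Both `-f` flags are needed because Compose picks up an override file automatically only when no `-f` is given.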
What does NOT work
- Podman + podman-compose: the compose file uses Docker-specific shorthand (`extra_hosts: host-gateway`, `:Z` SELinux semantics) that podman either parses incompatibly or ignores. We do not test it.
- K3s host-mode networking: works for the manager, but the edge installer is built around bare systemd; see Kubernetes for the daemonset pattern.
- OpenRC / runit / s6: the manager container does not care, but the host's Docker daemon needs to come up on boot. If your init system can do that, you are fine.