Meshanics Docs
Health probes & rollback

Defining health probes

A health probe is the contract that decides whether an update is good. The agent runs it on the device after swapping in the new version, once any configured initial delay has elapsed. If the probe passes, the update is marked healthy; if every attempt fails, the agent automatically rolls back to the previous version. Define the probe to answer one question: is the new artifact actually serving correctly on this device?
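The decision described above can be sketched in a few lines. This is an illustration only, not the agent's actual code: `apply_update`, `Outcome`, and the three callables are hypothetical names introduced here.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    HEALTHY = "healthy"
    ROLLED_BACK = "rolled_back"


def apply_update(
    swap_in: Callable[[], None],
    probe: Callable[[], bool],
    roll_back: Callable[[], None],
) -> Outcome:
    """Swap in the new version, probe it, and roll back if the probe fails."""
    swap_in()
    if probe():
        # The probe passed, so the update is marked healthy.
        return Outcome.HEALTHY
    # The probe failed, so the previous version is restored.
    roll_back()
    return Outcome.ROLLED_BACK
```

The point of the sketch is that the probe is the only input to the healthy-or-rollback decision, which is why its definition matters so much.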

Probe types

There are three kinds of probe. You attach one to an artifact when you roll it out.

Type | What it checks | Healthy when
HTTP | A GET against a URL your workload exposes (typically a local readiness endpoint). | The response status is 2xx.
File | A file path on the device, for example a readiness file your app writes once it is up. | The file exists and is non-empty.
Exec | An allowlisted health-check program on the device. | The program exits 0.

The file probe pairs well with applications that already write a readiness or heartbeat file. The HTTP probe suits services that expose a health endpoint. The exec probe is the most flexible and the most tightly controlled; see "Exec probes are allowlisted, never a shell" below.
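To make the "healthy when" conditions for the HTTP and file probes concrete, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the agent's real implementation may differ (for instance, `urlopen` follows redirects by default, which the platform may or may not do).

```python
import http.client
import os
import urllib.error
import urllib.request


def http_probe_healthy(url: str, timeout_s: float) -> bool:
    """Healthy when a GET against the URL returns a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Non-2xx responses, connection errors, timeouts, and malformed
        # URLs all count as a failed attempt.
        return False


def file_probe_healthy(path: str) -> bool:
    """Healthy when the file exists and is non-empty."""
    try:
        return os.path.isfile(path) and os.path.getsize(path) > 0
    except OSError:
        return False
```

Note that an empty readiness file counts as unhealthy, so an application that creates the file before writing to it is not reported healthy too early.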

Timing and retries

Each probe carries a few timing controls so you can match it to how long your workload takes to come up:

  • Initial delay — wait this long after the swap before probing the first time, giving the workload room to start.
  • Timeout — how long a single attempt may take before it counts as a failure.
  • Attempts — how many times to try before declaring the update unhealthy (always at least one). Attempts are spaced out, so a slow start can still recover.

Rollback triggers only after every attempt has failed; a pass on any attempt marks the update healthy.
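The three controls combine into a simple retry loop, sketched below. The names `ProbeTiming` and `run_probe` are hypothetical, and `interval_s` is an assumption: the text says only that attempts are spaced out, not how, so a fixed interval stands in here.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeTiming:
    initial_delay_s: float  # wait after the swap before the first attempt
    timeout_s: float        # budget for a single attempt
    attempts: int           # total tries; treated as at least one
    interval_s: float       # ASSUMPTION: fixed spacing between attempts


def run_probe(
    check: Callable[[float], bool],
    timing: ProbeTiming,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True on the first passing attempt, False once all have failed."""
    attempts = max(1, timing.attempts)
    sleep(timing.initial_delay_s)
    for attempt in range(attempts):
        # The check receives the per-attempt timeout and must honor it.
        if check(timing.timeout_s):
            return True
        if attempt < attempts - 1:
            sleep(timing.interval_s)
    return False
```

A `False` result is the only condition that would lead to rollback. With an initial delay of 5 seconds, 3 attempts, and a 1 second interval, a workload that first passes on the third attempt is still marked healthy.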

Exec probes are allowlisted, never a shell

The exec probe is deliberately constrained. The platform is the supply chain for these devices, so an exec probe must never become a way to run arbitrary code as root.

  • The target must be an absolute path to a binary that the device operator has put on that device's allowlist. If the path is not absolute or not on the allowlist, the probe is refused.
  • The binary is executed directly, with no shell — there is no sh -c, no argument string, and no interpolation. A check that needs arguments is wrapped in an allowlisted script on the device.
  • By default the allowlist is empty, so exec probes are refused until an operator explicitly permits specific binaries on that device.
  • The active artifact's path is passed to the binary through an environment variable, so a single health-check program can inspect whatever version is currently live.

This means the probe definition that travels with a rollout cannot, by itself, cause a device to run anything new: it can only invoke a binary the device's own operator already trusts.
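The rules above can be sketched as a single guard followed by a direct, shell-free execution. This is an illustration under stated assumptions: `run_exec_probe` is a hypothetical name, and `PROBE_ARTIFACT_PATH` is a placeholder because the source does not name the environment variable.

```python
import os
import subprocess
from typing import FrozenSet

# ASSUMPTION: placeholder name; the real variable name is not given above.
ARTIFACT_ENV_VAR = "PROBE_ARTIFACT_PATH"


def run_exec_probe(
    target: str,
    allowlist: FrozenSet[str],
    artifact_path: str,
    timeout_s: float,
) -> bool:
    """Run an allowlisted binary directly; healthy when it exits 0."""
    # Refuse anything that is not an absolute path.
    if not os.path.isabs(target):
        return False
    # Refuse anything the operator has not allowlisted. Exact string
    # matching means an empty allowlist refuses every target.
    if target not in allowlist:
        return False
    env = {**os.environ, ARTIFACT_ENV_VAR: artifact_path}
    try:
        # A one-element argv with shell=False: no shell, no arguments,
        # and no interpolation of the probe definition.
        done = subprocess.run(
            [target],
            env=env,
            timeout=timeout_s,
            shell=False,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return done.returncode == 0
```

Because the allowlist is checked on the device before anything is executed, a rollout that names an unlisted path results in a refused probe rather than a new program running.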

Choosing a probe

A good probe fails fast and unambiguously on a bad version, and only reports healthy once the workload is genuinely serving. Prefer checks that exercise the real path — a readiness endpoint or file the app controls — over checks that merely confirm a process started. The stricter the probe, the safer the automatic rollback that backs it.