Meshanics Docs
AI/ML model OTA

Experiments & promotion

Promoting a model should never be a leap of faith. The platform's job is to let you expose a candidate to a controlled slice of the fleet, watch how it behaves, and widen only when the evidence supports it — with rollback always one step away.

What you can do today

You can run a model as a canary before it reaches production:

  • Keep candidate and production versions side by side using release channels: for example, a canary channel for the model under evaluation and a stable channel for what the fleet runs.
  • Roll the candidate to a small cohort using a wave strategy that starts at a low percentage. The first wave is your canary group.
  • Gate the next wave behind a manual approval, so widening is a deliberate decision rather than a timer.
  • Lean on the halt rule: if the candidate causes device failures beyond your threshold, the rollout pauses itself before it spreads.

This gives you the core promotion loop: candidate to a canary cohort, observe, then promote by widening the remaining waves to 100%.
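The loop above can be sketched as a small state machine. This is an illustrative model only, not the platform's API: the names `Wave`, `Rollout`, `halt_failure_rate`, and `needs_approval` are hypothetical, and the halt rule is assumed to compare a simple failure ratio against a threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Wave:
    percent: int          # cumulative share of the target group, 1-100
    needs_approval: bool  # True = an operator must approve before this wave starts


@dataclass
class Rollout:
    # All field names here are hypothetical, chosen for illustration.
    waves: list[Wave]
    halt_failure_rate: float  # assumed halt-rule threshold, 0.0-1.0
    current: int = 0          # index of the wave in progress
    halted: bool = False

    def record_results(self, updated: int, failed: int) -> None:
        """Apply the halt rule to the devices updated so far."""
        if updated and failed / updated > self.halt_failure_rate:
            self.halted = True

    def advance(self, approved: bool = False) -> bool:
        """Try to start the next wave; return True if the rollout widened."""
        if self.halted or self.current + 1 >= len(self.waves):
            return False
        next_wave = self.waves[self.current + 1]
        if next_wave.needs_approval and not approved:
            return False  # widening stays a deliberate decision
        self.current += 1
        return True

    @property
    def coverage(self) -> int:
        """Percentage of the target group covered by the current wave."""
        return self.waves[self.current].percent


# A 5% canary wave, then two approval-gated waves up to the full group.
rollout = Rollout(
    waves=[Wave(5, False), Wave(25, True), Wave(100, True)],
    halt_failure_rate=0.02,
)
```

In this sketch, calling `rollout.advance()` without approval leaves coverage at the canary wave, and once `record_results` trips the threshold, no further wave can start even with approval.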

How devices report

Each device reports the state of its assignment (verifying, applying, healthy, rolled back, or failed), and these reports stream to the dashboard in real time. You can watch the canary verify and swap, see whether its health probe passes, and catch a regression on a handful of devices instead of the whole fleet.
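As a rough illustration of how those reported states might be tallied for a canary cohort, the sketch below models the five states as an enum. The string values, the `summarize` and `regressed` helpers, and the shape of `reports` are assumptions made for this example, not the platform's wire format.

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class AssignmentState(Enum):
    # The five states named above; the string values are assumed.
    VERIFYING = "verifying"
    APPLYING = "applying"
    HEALTHY = "healthy"
    ROLLED_BACK = "rolled_back"
    FAILED = "failed"


def summarize(reports: dict[str, AssignmentState]) -> dict[str, int]:
    """Count devices per state, given the latest report for each device ID."""
    counts = Counter(state for state in reports.values())
    return {state.value: counts.get(state, 0) for state in AssignmentState}


def regressed(reports: dict[str, AssignmentState]) -> list[str]:
    """Return the IDs of devices whose assignment failed or was rolled back."""
    bad = {AssignmentState.FAILED, AssignmentState.ROLLED_BACK}
    return sorted(device for device, state in reports.items() if state in bad)
```

A summary like this is what lets you judge the canary at a glance: a cohort that is mostly `healthy` with an empty `regressed` list is a candidate for widening.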

Promoting

When the canary looks good, you promote by advancing the remaining waves until the model reaches the full target group. Every step is recorded in the append-only audit trail, so "we promoted version X to fleet Y on date Z, after the canary stayed healthy" is evidence, not recollection.
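To make "append-only" concrete, here is a minimal in-memory sketch. The `AuditEvent` fields and the `AuditTrail` class are hypothetical; the real trail's schema and storage are not described here. The point is only that events are immutable once written and the trail exposes no update or delete operation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    # Hypothetical fields; frozen=True makes each event immutable.
    timestamp: str      # ISO 8601 string supplied by the caller
    actor: str
    action: str         # e.g. "wave_advanced" or "rollout_aborted"
    model_version: str
    target_group: str
    note: str = ""


class AuditTrail:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def append(self, event: AuditEvent) -> int:
        """Store the event and return its position in the trail."""
        self._events.append(event)
        return len(self._events) - 1

    def events(self) -> tuple[AuditEvent, ...]:
        """Return a read-only snapshot of the trail."""
        return tuple(self._events)
```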

If the candidate looks worse, you abort instead of promoting. You choose whether to keep the candidate on devices that already have it or revert them to the previous version, and unreachable devices never block the decision.
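The abort decision can be sketched as a per-device plan. Everything named here is hypothetical (`OnAbort`, `plan_abort`, the action strings), and the sketch assumes that a revert for an unreachable device is queued and applied when the device reconnects, which is one way the decision could avoid being blocked.

```python
from __future__ import annotations

from enum import Enum


class OnAbort(Enum):
    KEEP = "keep"      # leave the candidate on devices that already have it
    REVERT = "revert"  # return those devices to the previous version


def plan_abort(
    devices: dict[str, tuple[str, bool]],
    candidate: str,
    on_abort: OnAbort,
) -> dict[str, str]:
    """Decide an action per device.

    `devices` maps a device ID to (installed_version, reachable).
    The plan is computed immediately, whether or not devices are online.
    """
    plan: dict[str, str] = {}
    for device_id, (version, reachable) in devices.items():
        if version != candidate:
            plan[device_id] = "none"  # never received the candidate
        elif on_abort is OnAbort.KEEP:
            plan[device_id] = "keep"
        elif reachable:
            plan[device_id] = "revert_now"
        else:
            # Assumption: the revert is queued until the device reconnects.
            plan[device_id] = "revert_when_online"
    return plan
```

The design choice worth noting is that the function never waits on a device: reachability changes only which action is recorded, not whether the abort completes.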

Metric-driven experiments

A dedicated experiment workflow is a planned extension. It would add explicit A/B assignment across a cohort, ingestion of application-reported metrics such as latency and confidence distribution, a side-by-side comparison, and metric-driven promotion, while holding to the same metadata-only line. Today, the canary-and-channels approach above is how you compare and promote.