Mesh Troubleshooting
This page collects the failure modes we have observed during mesh and pairing development. Each entry has a symptom, the root cause, and the recovery path.
Mesh services do not start after role transition
Symptom: ados gs role show reports the configured role correctly, but ados gs mesh health returns 404 or shows up: false. systemctl status ados-batman shows inactive (dead).
Likely cause: the second USB WiFi dongle is not detected, or mesh_capable is false in /etc/ados/profile.conf.
Recovery:
Check dmesg for driver bind messages, and confirm that the dongle’s chipset has Linux mesh-mode (802.11s or IBSS) driver support.
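The driver-support check can be partly automated. The sketch below parses the text printed by iw list and reports, per radio, whether it advertises mesh point (802.11s) or IBSS. The helper name mesh_modes_by_phy is hypothetical and not part of the ados tooling, and the parser assumes the usual iw list layout, where each radio starts with a "Wiphy" line and modes are listed as "* name" under "Supported interface modes:".

```python
# Interface modes that count as mesh-capable for this check.
MESH_MODES = {"mesh point", "IBSS"}


def mesh_modes_by_phy(iw_list_output):
    """Map each phy name to the mesh-capable modes it advertises.

    Expects the plain-text output of `iw list`. A phy with an empty set
    advertises neither 802.11s mesh point nor IBSS.
    """
    result = {}
    phy = None
    in_modes = False
    for line in iw_list_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Wiphy "):
            # Start of a new radio section, e.g. "Wiphy phy0".
            phy = stripped.split()[1]
            result[phy] = set()
            in_modes = False
        elif stripped.startswith("Supported interface modes:"):
            in_modes = True
        elif in_modes and stripped.startswith("* "):
            mode = stripped[2:].strip()
            if phy is not None and mode in MESH_MODES:
                result[phy].add(mode)
        else:
            # Any other line ends the mode list.
            in_modes = False
    return result


# Usage on a live system (requires the `iw` tool to be installed):
#   import subprocess
#   text = subprocess.run(["iw", "list"], capture_output=True, text=True).stdout
#   print(mesh_modes_by_phy(text))
```

If the second dongle's phy is missing from the result, the driver did not bind. If it is present with an empty set, the driver bound but does not advertise a mesh-capable mode.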
PUT /role returns 409 E_NOT_PAIRED
Symptom: trying to transition a node to relay returns 409 E_NOT_PAIRED.
Cause: the node has no mesh_id or psk.key on disk. The agent guards relay mode behind a successful pair so mesh_manager does not enter a restart loop.
Recovery:
- Bring the receiver up first if it is not already running.
- On the receiver: open the Accept window (ados gs mesh accept --window 60).
- On this node (still in direct mode): use the OLED Mesh -> Join mesh flow, or call POST /api/v1/ground-station/pair/join from the local CLI.
- After the pair completes, retry the role transition.
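A script that drives role transitions can turn the two 409 codes documented on this page into operator hints. The following is a minimal sketch: recovery_hint and RECOVERY_HINTS are hypothetical names, and the sketch takes the status and error code as plain arguments because the response body format is not shown here.

```python
# Hints taken from the recovery steps on this page, keyed by error code.
RECOVERY_HINTS = {
    "E_NOT_PAIRED": (
        "Pair this node with the receiver first, then retry the role transition."
    ),
    "E_MESH_NOT_CAPABLE": (
        "Plug in the second USB WiFi adapter and rerun profile_detect."
    ),
}


def recovery_hint(status, error_code):
    """Return a short operator hint for a failed PUT /role call."""
    if status != 409:
        return "No mesh-specific hint for this status."
    return RECOVERY_HINTS.get(error_code, "Unrecognized 409 error code.")


print(recovery_hint(409, "E_NOT_PAIRED"))
```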
PUT /role returns 409 E_MESH_NOT_CAPABLE
Symptom: any non-direct role target returns 409 E_MESH_NOT_CAPABLE.
Cause: mesh_capable is false in /etc/ados/profile.conf. The second USB WiFi adapter was not present when profile_detect ran last.
Recovery: plug in the second USB WiFi adapter and rerun profile_detect by restarting the bootstrap service (sudo systemctl restart ados-bootstrap), or manually flip the flag if the dongle is already up:
OLED Accept window shows no pending relay
Symptom: the receiver’s OLED Accept window is open and counting down, and the relay’s OLED says it sent the request, but the receiver shows no pending entries.
Likely cause one: the join request never reached the receiver because the mesh interface is not actually connected. Check that /etc/ados/mesh/id matches on both nodes (relays only have it after a pair; on a first-time pair the relay would take the receiver’s mesh_id from the invite, which is exactly what it is trying to receive). A node that has not yet received the bundle does not know the mesh_id and broadcasts the request on the local subnet instead.
Likely cause two: the relay sent the request before the Accept window was open. Open the window first, then send the request.
Likely cause three: the receiver bound the UDP listener but it failed silently. Check:
A pairing_bind_failed log entry means port 5801 is in use. Investigate with lsof -i :5801.
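Where lsof is not installed, a trial bind answers the same question. The sketch below attempts to bind a UDP socket and reports whether the address is already taken. udp_port_in_use is a hypothetical helper, and the default wildcard host is an assumption, since the address the pairing listener actually binds to is not stated here.

```python
import errno
import socket


def udp_port_in_use(port, host="0.0.0.0"):
    """Return True if binding a UDP socket to (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            # Permission errors and the like are a different problem; surface them.
            raise
    return False


# Usage: udp_port_in_use(5801)
```

A healthy pairing listener also holds the port, so a True result only points to a conflict when the listener itself failed to bind.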
Relay says “received invite” but stays in direct mode
Symptom: the relay’s OLED briefly shows the Joined Status screen but then reverts to direct mode, or ados gs role show still says direct.
Cause: decrypting the invite and writing files to disk does not automatically transition the role. The relay operator must explicitly switch the role to relay via the OLED Mesh menu, the CLI (ados gs role set relay), or the REST API.
Recovery: transition the role. The pair files are already on disk so the next role transition succeeds.
Receiver picks the wrong cloud gateway
Symptom: the receiver routes cloud traffic over a slow gateway when a faster one is available.
Cause: batman-adv ranks gateways by Transmit Quality (TQ) over the mesh, not by uplink speed. A nearby node with a 3G modem and high TQ can outrank a distant node with a 4G LTE modem if the LTE node’s mesh signal is poor.
Recovery: pin the gateway you want. To return to automatic selection, set mode: auto.
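A toy model makes the ranking behaviour concrete. The sketch below assumes selection purely by TQ, as the cause above describes; the Gateway class, the node names, and all numbers are hypothetical. TQ is expressed on batman-adv's 0-255 scale.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    tq: int             # mesh link quality toward the gateway, 0-255
    uplink_mbps: float  # uplink speed; deliberately ignored by the ranking


def pick_by_tq(gateways):
    """Pick the gateway with the highest TQ, ignoring uplink speed."""
    return max(gateways, key=lambda g: g.tq)


candidates = [
    Gateway("near-3g", tq=230, uplink_mbps=2.0),
    Gateway("far-lte", tq=120, uplink_mbps=40.0),
]
print(pick_by_tq(candidates).name)  # prints "near-3g"
```

The LTE node loses despite its faster uplink, which is why pinning is the fix rather than waiting for the ranking to change.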
Receiver shows partition warning
Symptom: the GCS Mesh tab shows the partition indicator. Some neighbors disappear but the local mesh interface is up.
Cause: the mesh has split. Some nodes can no longer reach the others. Common reasons: a node running on battery died, RF interference dropped a link past the recovery threshold, or someone walked between nodes while carrying the mesh dongle.
Recovery: wait. batman-adv re-merges partitions automatically as soon as a path is restored. The partition_healed event fires on the GCS when this happens. If the partition lasts longer than the operator expected, walk between nodes and check antenna orientation.
Stale state files after a role transition
Symptom: the GCS Distributed RX page shows a relay or stream that is no longer on the mesh. The OLED status header is correct but the GCS lags.
Cause: /run/ados/mesh-state.json, /run/ados/wfb-relay.json, or /run/ados/wfb-receiver.json have stale snapshots. role_manager wipes these on every transition, but if the transition was interrupted they might persist.
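One way to confirm this cause is to compare the state files' modification times with the time of the last role transition. Because role_manager wipes the files on every transition, a file older than that moment survived the wipe. The sketch below is illustrative only: older_than is a hypothetical helper, and the cutoff timestamp has to come from the operator or the logs.

```python
import os

# State file paths as listed in the cause above.
STATE_FILES = (
    "/run/ados/mesh-state.json",
    "/run/ados/wfb-relay.json",
    "/run/ados/wfb-receiver.json",
)


def older_than(cutoff, paths=STATE_FILES):
    """Return the existing paths whose mtime is earlier than `cutoff`.

    `cutoff` is a Unix timestamp, for example the moment of the last
    role transition. Missing files are skipped.
    """
    stale = []
    for path in paths:
        try:
            if os.stat(path).st_mtime < cutoff:
                stale.append(path)
        except FileNotFoundError:
            pass
    return stale
```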
Recovery: force a clean transition:
Pairing UDP listener bind fails
Symptom: POST /api/v1/ground-station/pair/accept returns 503 with:
Cause: bat0 is not up.
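To check the interface state from a script, the kernel exposes each interface's flags under /sys/class/net. The sketch below reads that file and tests the IFF_UP bit. interface_is_up is a hypothetical helper, and it reports only the administrative up flag, not whether the mesh has neighbors or an address assigned.

```python
import os

IFF_UP = 0x1  # administrative "up" flag, as defined in <linux/if.h>


def interface_is_up(name="bat0", sysfs_root="/sys/class/net"):
    """Return True if the interface exists and has the IFF_UP flag set."""
    flags_path = os.path.join(sysfs_root, name, "flags")
    try:
        with open(flags_path) as fh:
            # The file holds a hex string such as "0x1003".
            flags = int(fh.read().strip(), 16)
    except FileNotFoundError:
        # No such interface, for example batman-adv is not running.
        return False
    return bool(flags & IFF_UP)


# Usage: interface_is_up("bat0")
```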
Recovery:
Where to next
- Field Tap-to-Pair - the protocol.
- Local Mesh (batman-adv) - the carrier.
- Troubleshooting - for non-mesh issues.