The agent ships a vision engine: a host service that owns the camera(s), captures and normalizes frames, and publishes them on a shared frame bus that plugins read. A vision plugin does not open the camera itself. It subscribes to the normalized frame stream, runs (or asks the engine to run) inference, and publishes detections that overlays and mission logic pick up. This is a host-owned bus, not a per-plugin capture loop. The engine arbitrates the camera and the accelerator across every vision consumer, so two plugins can watch the same camera without fighting over the device or the NPU. The operator manages vision models and watches detections in the per-drone Vision tab under the Command area of Mission Control.
The frame contract
The engine downscales and converts each camera’s native format to a normalized frame, writes the pixels into a shared-memory ring (one ring per camera), and publishes a small frame descriptor on the vision.frame topic. The pixels
never ride the RPC envelope. Your SDK resolves a descriptor to pixels for you
by mapping the named ring read-only and reading the slot the descriptor points
at.
A resolved Frame carries the descriptor plus the pixel bytes. Descriptor
fields:
| Field | Meaning |
|---|---|
| frame_id | Monotonic frame counter for this camera, starting at 1. |
| ts_ms | Capture time in milliseconds. |
| width, height | Frame dimensions in pixels. |
| format | "rgb24", "nv12", or "yuv420p". |
| camera_id | Source camera id, for filtering. |
The resolved Frame also holds the pixel bytes: len(pixels) equals the
descriptor’s byte_len, which covers exactly the valid pixel data for a
width x height frame in the given format.
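As a rough sanity check on byte_len, the three formats have well-known sizes for tightly packed frames: rgb24 stores 3 bytes per pixel, while nv12 and yuv420p both store a full-resolution luma plane plus chroma subsampled 2x2, for 1.5 bytes per pixel. The helper below is a hypothetical sketch, not an SDK call, and it assumes even dimensions and no row padding; if it ever disagrees with the descriptor, trust the descriptor’s byte_len.

```python
def expected_byte_len(width: int, height: int, fmt: str) -> int:
    """Size of a tightly packed frame (hypothetical helper, not part of the SDK).

    Assumes no per-row stride padding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    luma = width * height
    if fmt == "rgb24":
        return luma * 3  # R, G and B: one byte each per pixel
    if fmt in ("nv12", "yuv420p"):
        # Full-resolution Y plane plus 2x2-subsampled chroma. nv12 interleaves
        # U/V in one plane and yuv420p keeps separate U and V planes, but both
        # carry width * height / 2 chroma bytes in total.
        if width % 2 or height % 2:
            raise ValueError("4:2:0 formats need even dimensions")
        return luma + luma // 2
    raise ValueError(f"unknown format: {fmt!r}")


print(expected_byte_len(640, 480, "rgb24"))  # 921600
print(expected_byte_len(640, 480, "nv12"))   # 460800
```

At the default 640x480 working resolution, a 4:2:0 frame is half the size of the same frame in rgb24.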
The default working resolution is 640x480; the engine downscales further to
match the active model’s input size. The drop policy is latest-wins: a slow
consumer misses frames instead of blocking the engine. A torn or stale read
(the writer recycled the slot before or during the copy) is dropped silently,
and your callback does not fire for that frame.
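To see why a recycled slot turns into a dropped frame rather than corrupt pixels, here is a toy, single-process model of a latest-wins ring. The class names, the slot layout, and the in-progress marker are assumptions for illustration, not the engine’s shared-memory format. The writer never waits and always overwrites the oldest slot; the reader checks the slot’s frame id before and after copying (a seqlock-style check) and discards the read if it no longer matches.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WRITING = -1  # hypothetical marker meaning "slot is being rewritten"


@dataclass
class Slot:
    frame_id: int = 0  # 0 means the slot has never been written
    pixels: bytes = b""


class ToyFrameRing:
    """Single-process toy model of one camera's latest-wins ring.

    The slot layout is an illustration, not the engine's shared-memory format.
    """

    def __init__(self, n_slots: int) -> None:
        self.slots = [Slot() for _ in range(n_slots)]
        self.next_frame_id = 1  # frame ids start at 1, as in the descriptor

    def write(self, pixels: bytes) -> Tuple[int, int]:
        """Writer side: never waits for readers, always reuses the oldest slot."""
        frame_id = self.next_frame_id
        self.next_frame_id += 1
        index = (frame_id - 1) % len(self.slots)
        slot = self.slots[index]
        slot.frame_id = WRITING    # readers now treat the slot as invalid
        slot.pixels = pixels
        slot.frame_id = frame_id   # slot is consistent again
        return frame_id, index     # roughly what a descriptor points at

    def read(self, frame_id: int, index: int) -> Optional[bytes]:
        """Reader side: returns None instead of a stale or torn frame."""
        slot = self.slots[index]
        if slot.frame_id != frame_id:
            return None            # stale: slot recycled (or mid-write) before the copy
        copy = bytes(slot.pixels)
        if slot.frame_id != frame_id:
            return None            # torn: slot recycled during the copy
        return copy


ring = ToyFrameRing(n_slots=3)
descriptors = [ring.write(bytes([i])) for i in range(5)]  # frames 1..5
print(ring.read(*descriptors[0]))  # None: frame 1's slot now holds frame 4
print(ring.read(*descriptors[4]))  # b'\x04'
```

A real cross-process ring also has to order these writes and checks across processes, which this single-threaded sketch does not model.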
Subscribing to frames
In Python, ctx.vision is a VisionClient. Subscribe with a callback that
runs once per resolved frame.
subscribe_frames is gated on vision.frame.read. The callback may be sync or
async. Keep it cheap: it runs on the delivery path, so offload heavy inference
to a task or queue rather than blocking in the callback.
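One way to keep the callback cheap, sketched under assumptions: the callback only drops the newest frame into a one-slot mailbox, and a separate asyncio task runs inference. Frame here is a minimal stand-in for the SDK’s Frame, and LatestFrameMailbox, inference_worker, and demo are hypothetical names; in a plugin, on_frame would be the callback you register through subscribe_frames rather than something you call by hand.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    """Stand-in for the SDK's resolved Frame; only the fields used here."""
    frame_id: int
    camera_id: str
    pixels: bytes


class LatestFrameMailbox:
    """Holds at most one pending frame; a newer frame replaces an unconsumed one."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[Frame] = asyncio.Queue(maxsize=1)
        self.replaced = 0  # frames overwritten before the worker got to them

    def put_latest(self, frame: Frame) -> None:
        if self._queue.full():
            self._queue.get_nowait()  # discard the stale pending frame
            self.replaced += 1
        self._queue.put_nowait(frame)

    async def get(self) -> Frame:
        return await self._queue.get()


async def inference_worker(mailbox: LatestFrameMailbox, results: list[int], n_frames: int) -> None:
    for _ in range(n_frames):
        frame = await mailbox.get()
        await asyncio.sleep(0.01)  # stand-in for slow model inference
        results.append(frame.frame_id)


async def demo() -> tuple[list[int], int]:
    mailbox = LatestFrameMailbox()
    results: list[int] = []

    def on_frame(frame: Frame) -> None:
        # Runs on the delivery path: hand the frame off and return immediately.
        mailbox.put_latest(frame)

    worker = asyncio.create_task(inference_worker(mailbox, results, n_frames=1))
    # A burst of five frames arrives before the worker gets a turn.
    for frame_id in range(1, 6):
        on_frame(Frame(frame_id=frame_id, camera_id="uvc-0", pixels=b""))
    await worker
    return results, mailbox.replaced


print(asyncio.run(demo()))  # ([5], 4)
```

The mailbox mirrors the engine’s own latest-wins policy, so a slow model always works on the freshest frame instead of draining a growing backlog.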
The Rust SDK exposes the same surface: ados_sdk::vision::VisionClient, reached via
ctx.vision, with subscribe_frames(camera_id, callback) and the same
Frame shape. Types come from ados_sdk re-exports backed by
ados_protocol::framebus, so a Python plugin and a Rust plugin read the same
wire format byte for byte. A Rust plugin’s entry point is
ados_sdk::run_plugin::<MyPlugin, S>(...).
Running inference
There are two paths, and a plugin can use either. Both share the same accelerator, which the engine arbitrates.
Engine-run. Register a model file plus its metadata with the engine. The engine loads it, runs it on the shared backend, and publishes detections itself. Your plugin only declares the model and then consumes detections like any other consumer. register_model is gated on vision.model.register.
Plugin-side. Run your own model in the plugin and publish the results. You
can also ask the engine to run a registered model against one frame on the
shared backend with ctx.vision.infer(model_id, frame), which returns the
detections without pixels crossing the RPC envelope.
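To make the plugin-side path concrete without a real network, the sketch below uses a brightness threshold as a stand-in for a model: it scans an rgb24 frame’s pixel bytes and returns one detection whose bbox is the tightest box around bright pixels, in the frame’s own pixel coordinates with the origin at the top left. The function name, the threshold, the class label, and the plain-dict detection shape are illustrative assumptions; a real plugin would run its own model here and hand the result to publish_detection or publish_one.

```python
from typing import Optional


def detect_bright_region(pixels: bytes, width: int, height: int,
                         threshold: int = 200) -> Optional[dict]:
    """Toy stand-in for a plugin-side model on an rgb24 frame (3 bytes per pixel)."""
    if len(pixels) != width * height * 3:
        raise ValueError("expected tightly packed rgb24 pixels")
    min_x, min_y, max_x, max_y = width, height, -1, -1
    for y in range(height):
        row = y * width * 3
        for x in range(width):
            r, g, b = pixels[row + 3 * x: row + 3 * x + 3]
            if (r + g + b) // 3 >= threshold:  # average brightness of the pixel
                min_x, min_y = min(min_x, x), min(min_y, y)
                max_x, max_y = max(max_x, x), max(max_y, y)
    if max_x < 0:
        return None  # nothing bright enough: no detection
    return {
        "bbox": {"x": min_x, "y": min_y,
                 "width": max_x - min_x + 1, "height": max_y - min_y + 1},
        "class_label": "bright_spot",  # hypothetical label
        "confidence": 1.0,  # a threshold has no real score; fixed for the sketch
        "track_id": None,
    }


# An 8x6 black frame with a white patch covering x = 5..6, y = 1..3.
width, height = 8, 6
frame = bytearray(width * height * 3)
for y in range(1, 4):
    for x in range(5, 7):
        i = (y * width + x) * 3
        frame[i:i + 3] = b"\xff\xff\xff"
print(detect_bright_region(bytes(frame), width, height)["bbox"])
# {'x': 5, 'y': 1, 'width': 2, 'height': 3}
```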
Detections
A Detection has:
- bbox: pixel-space box {x, y, width, height}, origin top-left, in the frame’s own resolution.
- class_label: the class string.
- confidence: the model’s score.
- track_id: optional stable id across frames, for tracking models.
Detections are published as a DetectionBatch on the vision.detection topic,
labelled by model_id plus the source camera_id, frame_id, and ts_ms, so a
consumer can align the boxes with the exact frame they came from.
publish_detection (and the publish_one convenience) are gated on
vision.detection.publish. Built-in consumers, the video overlay and mission
logic, pick the batches up. The Rust SDK exposes the same publish_detection /
publish_one on VisionClient.
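A consumer that draws boxes needs the frame each batch came from. The sketch below models a batch with plain dataclasses that mirror the fields listed above and keeps a small cache of recent frames keyed by (camera_id, frame_id). BBox, Detection, DetectionBatch, and FrameAligner are hypothetical stand-ins, not SDK types, and the cache size is arbitrary. The cache is bounded on purpose: under latest-wins delivery a batch can refer to a frame that has already been evicted, and such a batch is simply skipped.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BBox:
    x: int
    y: int
    width: int
    height: int


@dataclass
class Detection:
    bbox: BBox
    class_label: str
    confidence: float
    track_id: Optional[int] = None


@dataclass
class DetectionBatch:
    model_id: str
    camera_id: str
    frame_id: int
    ts_ms: int
    detections: List[Detection] = field(default_factory=list)


class FrameAligner:
    """Keeps the last `capacity` frames and matches batches to them."""

    def __init__(self, capacity: int = 8) -> None:
        self.capacity = capacity
        self._frames: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()

    def add_frame(self, camera_id: str, frame_id: int, pixels: bytes) -> None:
        self._frames[(camera_id, frame_id)] = pixels
        if len(self._frames) > self.capacity:
            self._frames.popitem(last=False)  # evict the oldest frame

    def frame_for(self, batch: DetectionBatch) -> Optional[bytes]:
        return self._frames.get((batch.camera_id, batch.frame_id))


aligner = FrameAligner(capacity=2)
for fid in (1, 2, 3):
    aligner.add_frame("uvc-0", fid, bytes([fid]))
batch = DetectionBatch("hypothetical-model", "uvc-0", frame_id=3, ts_ms=1000,
                       detections=[Detection(BBox(10, 20, 30, 40), "person", 0.9)])
print(aligner.frame_for(batch))  # b'\x03'
print(aligner.frame_for(DetectionBatch("hypothetical-model", "uvc-0", 1, 900)))  # None: evicted
```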
Capabilities
Declare the ones you use under agent.permissions in the manifest:
| Capability | Risk | What it grants |
|---|---|---|
| vision.frame.read | Medium | Subscribe to normalized engine frames. The plugin never opens the camera itself. |
| vision.model.register | Medium | Register an inference model (and run infer against shared frames). |
| vision.detection.publish | Low | Publish detections that overlays and mission logic consume. |
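Each vision call on this page is gated on one capability, so a plugin can check its manifest against the calls it actually makes before shipping. This is a hypothetical pre-flight helper, not part of the SDK. The call-to-capability mapping comes from this page; the sketch assumes the manifest has already been parsed into a dict with agent.permissions as a list of capability strings.

```python
from typing import Dict, Iterable, List, Set

# Capability each VisionClient call is gated on, as documented on this page.
REQUIRED_CAPABILITY: Dict[str, str] = {
    "subscribe_frames": "vision.frame.read",
    "register_model": "vision.model.register",
    "infer": "vision.model.register",
    "publish_detection": "vision.detection.publish",
    "publish_one": "vision.detection.publish",
}


def missing_capabilities(manifest: dict, calls: Iterable[str]) -> List[str]:
    """Capabilities the given calls need that agent.permissions does not declare.

    Assumes agent.permissions is a list of capability strings.
    """
    declared: Set[str] = set(manifest.get("agent", {}).get("permissions", []))
    needed = {REQUIRED_CAPABILITY[call] for call in calls}
    return sorted(needed - declared)


manifest = {"agent": {"permissions": ["vision.frame.read"]}}
print(missing_capabilities(manifest, ["subscribe_frames", "publish_one"]))
# ['vision.detection.publish']
```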
A worked extension: vision-nav
The first-party vision-nav extension is the reference for vision work on the
agent. It provides GPS-denied navigation: optical flow plus monocular visual-inertial
odometry, feeding position estimates back to the flight controller
through the visual-odometry pose helper on ctx.vision. It runs against
ArduPilot 4.5+, PX4 1.14+, and iNav 7.0+.
Look at that extension in ADOSExtensions/extensions/vision-nav/ for a real
manifest.yaml, config schema, and agent/GCS split before you build your own
vision plugin.
Multiple cameras
When more than one camera is attached, the engine publishes a separate ring and descriptor stream per camera. Filter to the one you want by passing its camera_id when you subscribe; in Rust, pass
Some("uvc-0") for the camera_id argument. A
None filter receives every camera’s frames; tag your detections with the
frame’s camera_id so consumers know which camera each box came from.
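The filter rule fits in a few lines. ToyFrameBus and FrameDescriptor are hypothetical stand-ins for the engine and its descriptors, not SDK types; the point is the matching rule (a specific camera_id, or None for every camera) and why tagging matters. Each camera keeps its own frame counter starting at 1, so under a None filter two cameras both deliver a frame_id of 1, and only camera_id tells them apart.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List, Optional, Tuple


@dataclass
class FrameDescriptor:
    camera_id: str
    frame_id: int


Callback = Callable[[FrameDescriptor], None]


class ToyFrameBus:
    """Illustrates per-camera filtering: camera_id=None means every camera."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Optional[str], Callback]] = []

    def subscribe(self, camera_id: Optional[str], callback: Callback) -> None:
        self._subs.append((camera_id, callback))

    def publish(self, desc: FrameDescriptor) -> None:
        for camera_id, callback in self._subs:
            if camera_id is None or camera_id == desc.camera_id:
                callback(desc)


bus = ToyFrameBus()
front_only: List[int] = []
per_camera: DefaultDict[str, List[int]] = defaultdict(list)
bus.subscribe("uvc-0", lambda d: front_only.append(d.frame_id))
# The unfiltered subscriber keys everything by the source camera.
bus.subscribe(None, lambda d: per_camera[d.camera_id].append(d.frame_id))
for cam, fid in [("uvc-0", 1), ("uvc-1", 1), ("uvc-0", 2)]:
    bus.publish(FrameDescriptor(cam, fid))
print(front_only)        # [1, 2]
print(dict(per_camera))  # {'uvc-0': [1, 2], 'uvc-1': [1]}
```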
Testing
Exercise the frame and detection path without a real host, engine, shared memory, or socket by using the SDK’s test harness. The fake builds a real frame ring through the same frame-transport contract and resolves each synthetic frame the way the production client does, so your frame-handling path runs end to end; only the host and the OS shared-memory object are faked. In Python, use ados.sdk.testing.FakeVisionEngine.
To drive a real VisionClient (the ring lives in a temp
directory the client maps read-only, and the descriptor rides a real delivery
event into the client’s resolver), build the engine with a file-backed ring and
attach your callback. The detection batches your plugin publishes are then
available for assertions from engine.captured_detections().
The Rust harness mirrors this: ados_sdk::testing::FakeVisionEngine emits
synthetic frames into the same FrameCallback your plugin registers and
captures the detection batches it publishes.
Bundling weights
You can ship small weights as assets in the archive and reference them through
model_path on the model metadata. Do not fetch weights
from a remote URL at runtime; the plugin’s network.outbound policy blocks
runtime fetches by default. Bundle the weights or load a file already on the
agent.
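A plugin can fail fast at load time if model_path points somewhere it cannot use. The helper below is a hypothetical sketch: it refuses URL-style paths (the network.outbound policy would block the fetch anyway), resolves relative paths against the directory the archive was unpacked into, and treats absolute paths as files already on the agent. The bundle_dir argument and the scheme list are assumptions; the SDK may hand you paths differently.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse

# Schemes treated as remote; a hypothetical list for this sketch.
REMOTE_SCHEMES = {"http", "https", "ftp", "s3"}


def resolve_model_path(bundle_dir: str, model_path: str) -> Path:
    """Resolve model_path to a local file, refusing remote URLs.

    Relative paths are taken relative to the unpacked bundle; absolute
    paths are treated as files already on the agent.
    """
    if urlparse(model_path).scheme.lower() in REMOTE_SCHEMES:
        raise ValueError(f"remote weights are not allowed: {model_path}")
    path = Path(model_path)
    if not path.is_absolute():
        path = Path(bundle_dir) / path
    if not path.is_file():
        raise FileNotFoundError(f"model weights not found: {path}")
    return path


with tempfile.TemporaryDirectory() as bundle:
    weights = Path(bundle) / "weights" / "detector.onnx"  # hypothetical file name
    weights.parent.mkdir()
    weights.write_bytes(b"\x00")
    print(resolve_model_path(bundle, "weights/detector.onnx").name)  # detector.onnx
```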
Vendor binary models
Closed-source vendor binaries (for example, proprietary thermal AGC processing as a .so) ship under vendor/ per the
vendor binaries doc. The plugin loads them with
ctypes or cffi and calls them from its own inference path, then publishes
the results as a DetectionBatch like any other plugin-side model.
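Whatever the vendor function is, the ctypes side follows the same pattern: load the shared object once, declare its argument and return types, then call it from the inference path. No vendor library is available here, so this runnable sketch uses the C math library’s cos as a stand-in, and the "libm.so.6" fallback assumes a glibc-based Linux system. A real AGC call would take pixel buffers, typically passed as ctypes arrays, with whatever signature the vendor documents.

```python
import ctypes
import ctypes.util

# Stand-in for a vendor library. For a real vendor binary you would load
# its path under vendor/ instead of the C math library.
libm_name = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_name)

# Declare the C signature before calling: double cos(double).
# Without restype, ctypes assumes the function returns a C int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(value: float) -> float:
    """Thin Python wrapper so the rest of the plugin never touches ctypes."""
    return libm.cos(value)


print(c_cos(0.0))  # 1.0
```

Keeping the ctypes declarations in one module means a signature mismatch shows up in one place instead of as silent garbage in detections.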