Vision plugins - ADOS Documentation

The agent ships a vision engine: a host service that owns the camera(s), captures and normalizes frames, and publishes them on a shared frame bus that plugins read. A vision plugin does not open the camera itself. It subscribes to the normalized frame stream, runs (or asks the engine to run) inference, and publishes detections that overlays and mission logic pick up. This is a host-owned bus, not a per-plugin capture loop. The engine arbitrates the camera and the accelerator across every vision consumer, so two plugins can watch the same camera without fighting over the device or the NPU. The operator manages vision models and watches detections in the per-drone Vision tab under the Command area of Mission Control.

The frame contract

The engine downscales and converts each camera’s native format to a normalized frame, writes the pixels into a shared-memory ring (one ring per camera), and publishes a small frame descriptor on the vision.frame topic. The pixels never ride the RPC envelope. Your SDK resolves a descriptor to pixels for you by mapping the named ring read-only and reading the slot the descriptor points at. A resolved Frame carries the descriptor plus the pixel bytes. Descriptor fields:

Field	Meaning
`frame_id`	Monotonic frame counter for this camera, starting at 1.
`ts_ms`	Capture time in milliseconds.
`width`, `height`	Frame dimensions.
`format`	`"rgb24"`, `"nv12"`, or `"yuv420p"`.
`camera_id`	Source camera id, for filtering.
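The three formats imply different buffer sizes for the same dimensions: rgb24 stores three bytes per pixel, while nv12 and yuv420p are 4:2:0 layouts that average 12 bits per pixel. The sketch below can be used to sanity-check a buffer length. `expected_byte_len` is a hypothetical helper, not an SDK function, and it assumes tightly packed rows with no stride padding, which this page does not confirm.

```python
def expected_byte_len(width: int, height: int, fmt: str) -> int:
    """Size in bytes of one tightly packed frame (no row padding assumed)."""
    if fmt == "rgb24":
        # Three interleaved 8-bit channels per pixel.
        return width * height * 3
    if fmt in ("nv12", "yuv420p"):
        # Full-resolution luma plane plus chroma subsampled 2x2.
        if width % 2 or height % 2:
            raise ValueError("4:2:0 formats need even width and height")
        return width * height * 3 // 2
    raise ValueError(f"unknown format: {fmt!r}")
```

At the default 640x480 working resolution this gives 921600 bytes for rgb24 and 460800 bytes for either 4:2:0 format.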

The Frame also gives you the pixel bytes: len(pixels) equals the descriptor’s byte_len, which is the size of the valid pixel data for a width x height frame in format. Default working resolution is 640x480; the engine downscales further per the active model’s input size. Drop policy is latest-wins: a slow consumer drops frames and never blocks the engine. A torn or stale read (the writer recycled the slot mid-copy) is dropped silently, and your callback does not fire for it.
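To make the torn or stale read concrete, here is a single-process toy of the check, copy, recheck pattern a reader can use against a recycled slot. `ToyFrameRing` and its layout are illustrative assumptions; the real ring format is not described on this page, and a cross-process ring also needs memory-ordering guarantees that this sketch leaves out.

```python
class ToyFrameRing:
    """Fixed-size ring of pixel slots; each slot remembers which frame it holds."""

    def __init__(self, slots: int, slot_bytes: int):
        self.slot_bytes = slot_bytes
        self.buf = bytearray(slots * slot_bytes)
        self.slot_frame_id = [0] * slots  # 0 means "never written"

    def write(self, frame_id: int, pixels: bytes) -> int:
        """Writer side: recycle the slot for this frame and return its index."""
        if len(pixels) != self.slot_bytes:
            raise ValueError("pixels must fill exactly one slot")
        slot = frame_id % len(self.slot_frame_id)
        start = slot * self.slot_bytes
        self.slot_frame_id[slot] = frame_id
        self.buf[start:start + self.slot_bytes] = pixels
        return slot

    def read(self, frame_id: int, slot: int):
        """Reader side: return the pixels, or None if the slot was recycled."""
        start = slot * self.slot_bytes
        if self.slot_frame_id[slot] != frame_id:
            return None  # stale: the slot already holds a different frame
        pixels = bytes(self.buf[start:start + self.slot_bytes])
        if self.slot_frame_id[slot] != frame_id:
            return None  # torn: the writer recycled the slot during the copy
        return pixels
```

With a two-slot ring, writing frames 1, 2, and 3 recycles frame 1's slot for frame 3, so a late read of frame 1 returns None while frames 2 and 3 still resolve.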

Subscribing to frames

In Python, ctx.vision is a VisionClient. Subscribe with a callback that runs once per resolved frame:

class MyPlugin:
    async def on_start(self, ctx) -> None:
        async def on_frame(frame):
            # frame.descriptor.width, .height, .format, .camera_id, .ts_ms
            # frame.pixels is the resolved bytes for this frame
            detections = self.run(frame)
            for d in detections:
                ...
        await ctx.vision.subscribe_frames(on_frame)

subscribe_frames is gated on vision.frame.read. The callback may be sync or async. Keep it cheap: it runs on the delivery path, so offload heavy inference to a task or queue rather than blocking in the callback.

The Rust SDK is the same surface: ados_sdk::vision::VisionClient, reached via ctx.vision, with subscribe_frames(camera_id, callback) and the same Frame shape. Types come from ados_sdk re-exports backed by ados_protocol::framebus, so a Python plugin and a Rust plugin read the same wire byte for byte. A Rust plugin’s entry point is ados_sdk::run_plugin::<MyPlugin, S>(...).
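One way to keep the callback cheap is to hand each frame to a single-slot mailbox and let a separate task run inference. The sketch below mirrors the engine's latest-wins policy inside the plugin. `LatestFrameSlot` is a hypothetical helper built on asyncio, not part of the SDK.

```python
import asyncio


class LatestFrameSlot:
    """Single-slot mailbox: a new frame overwrites one that was never read."""

    def __init__(self):
        self._frame = None
        self._ready = asyncio.Event()
        self.dropped = 0  # frames overwritten before the worker took them

    def offer(self, frame) -> None:
        """Called from the frame callback; never blocks."""
        if self._frame is not None:
            self.dropped += 1
        self._frame = frame
        self._ready.set()

    async def take(self):
        """Called from the worker task; waits until a frame is available."""
        await self._ready.wait()
        frame, self._frame = self._frame, None
        self._ready.clear()
        return frame
```

The callback would then call slot.offer(frame) and return immediately, while a worker task loops on await slot.take() and runs the model. Create the slot inside a running coroutine (for example in on_start) so the event is tied to the plugin's loop.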

Running inference

There are two honest paths, and a plugin can use either. Both share the same accelerator, which the engine arbitrates.

Engine-run. Register a model file plus its metadata with the engine. The engine loads it, runs it on the shared backend, and publishes detections itself. Your plugin only declares the model and then consumes detections like any other consumer.

from ados.sdk.vision import (
    ModelMetadata, ModelKind, ModelExecution, FrameFormat,
)

await ctx.vision.register_model(ModelMetadata(
    id="com.example.weeds",
    kind=ModelKind.DETECTION,
    execution=ModelExecution.ENGINE_RUN,
    input_width=640,
    input_height=480,
    input_format=FrameFormat.RGB24,
    output_classes=["weed", "crop"],
    model_path="/path/on/agent/weeds.onnx",
))

register_model is gated on vision.model.register.

Plugin-side. Run your own model in the plugin and publish the results. You can also ask the engine to run a registered model against one frame on the shared backend with ctx.vision.infer(model_id, frame), which returns the detections without pixels crossing the RPC envelope.

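async def on_frame(frame):
    # Defined inside on_start (as in the subscribe example), so ctx and self are in scope.
    boxes = self.session.run(frame)        # your own runtime
    # Map each entry of boxes to a Detection; a single hard-coded box is shown here.
    await ctx.vision.publish_one(
        "com.example.weeds", frame,
        Detection(
            bbox=BoundingBox(x=100, y=80, width=40, height=40),
            class_label="weed",
            confidence=0.91,
        ),
    )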

The accelerator is shared. A heavy or runaway model starves other vision consumers, so register honest input sizes and keep your inference cadence below your CPU and NPU budget. For NPU-based inference on supported boards, load the board’s accelerator runtime (TFLite delegates, RKNN, and so on); on most vision plugins that is the difference between 50% CPU and 5% CPU.
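A simple way to bound inference cadence is to gate on the frame's capture timestamp and skip frames that arrive too soon after the last one processed. `CadenceGate` below is a hypothetical helper, not an SDK class; it assumes ts_ms increases monotonically for a given camera.

```python
class CadenceGate:
    """Lets a frame through only if enough capture time has passed."""

    def __init__(self, min_interval_ms: int):
        self.min_interval_ms = min_interval_ms
        self._last_ts_ms = None  # ts_ms of the last frame that was let through

    def should_run(self, ts_ms: int) -> bool:
        if (
            self._last_ts_ms is not None
            and ts_ms - self._last_ts_ms < self.min_interval_ms
        ):
            return False  # too soon: skip inference for this frame
        self._last_ts_ms = ts_ms
        return True
```

In a frame callback this would read as: return early unless gate.should_run(frame.descriptor.ts_ms). A 100 ms interval caps the plugin at roughly ten inferences per second regardless of the camera's frame rate.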

Detections

A Detection has:

bbox: pixel-space box {x, y, width, height}, origin top-left, in the frame’s own resolution.
class_label: the class string.
confidence: the model’s score.
track_id: optional stable id across frames, for tracking models.
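Because bbox is expressed in the frame's own resolution, a consumer drawing on a surface of a different size has to rescale it. A minimal sketch of that mapping follows; `Box` and `scale_box` are stand-ins defined here for illustration, not the SDK's BoundingBox type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Pixel-space box with a top-left origin (stand-in for the SDK type)."""
    x: float
    y: float
    width: float
    height: float


def scale_box(box: Box, src_size: tuple, dst_size: tuple) -> Box:
    """Map a box from a (width, height) source frame to a destination surface."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return Box(box.x * sx, box.y * sy, box.width * sx, box.height * sy)
```

For example, a 40x40 box at (100, 80) in a 640x480 frame becomes an 80x80 box at (200, 160) on a 1280x960 display.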

Detections are published as a DetectionBatch on the vision.detection topic, labelled by model_id plus the source camera_id, frame_id, and ts_ms so a consumer can align the boxes to the exact frame they came from.

await ctx.vision.publish_detection(DetectionBatch(
    model_id="com.example.weeds",
    camera_id=frame.descriptor.camera_id,
    frame_id=frame.descriptor.frame_id,
    ts_ms=frame.descriptor.ts_ms,
    detections=[...],
))

publish_detection (and the publish_one convenience) are gated on vision.detection.publish. Built-in consumers, the video overlay and mission logic, pick the batches up. The Rust SDK exposes the same publish_detection / publish_one on VisionClient.
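On the consuming side, the camera_id and frame_id labels are what let a batch be matched to the exact frame it was computed from. The sketch below keeps a small bounded cache of recent frames for that lookup. `RecentFrames` is a hypothetical helper, and the capacity is an arbitrary illustrative choice.

```python
from collections import OrderedDict


class RecentFrames:
    """Bounded cache of recent frames keyed by (camera_id, frame_id)."""

    def __init__(self, capacity: int = 8):
        self._capacity = capacity
        self._frames = OrderedDict()

    def remember(self, camera_id: str, frame_id: int, pixels: bytes) -> None:
        self._frames[(camera_id, frame_id)] = pixels
        while len(self._frames) > self._capacity:
            self._frames.popitem(last=False)  # evict the oldest entry

    def match(self, camera_id: str, frame_id: int):
        """Return the pixels a batch refers to, or None if already evicted."""
        return self._frames.get((camera_id, frame_id))
```

A consumer would call remember from its frame callback and match when a batch arrives, dropping the batch if the frame has already aged out of the cache.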

Capabilities

Declare the ones you use under agent.permissions in the manifest:

Capability	Risk	What it grants
`vision.frame.read`	Medium	Subscribe to normalized engine frames. The plugin never opens the camera itself.
`vision.model.register`	Medium	Register an inference model (and run `infer` against shared frames).
`vision.detection.publish`	Low	Publish detections that overlays and mission logic consume.

A worked extension: vision-nav

The first-party vision-nav extension is the reference for vision work on the agent. It provides GPS-denied navigation: optical flow plus monocular visual-inertial odometry, feeding position estimates back to the flight controller through the visual-odometry pose helper on ctx.vision. It runs against ArduPilot 4.5+, PX4 1.14+, and iNav 7.0+. See that extension in ADOSExtensions/extensions/vision-nav/ for a real manifest.yaml, config schema, and agent/GCS split before you build your own vision plugin.

Multiple cameras

When more than one camera is attached, the engine publishes a separate ring and descriptor stream per camera. Filter to the one you want by passing camera_id:

await ctx.vision.subscribe_frames(on_frame, camera_id="uvc-0")

The Rust equivalent passes Some("uvc-0") for the camera_id argument. A None filter receives every camera’s frames; tag your detections with the frame’s camera_id so consumers know which camera each box came from.
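When a plugin subscribes without a filter, it can dispatch on the descriptor's camera_id itself. The sketch below shows one way to do that. `make_router` and `fake_frame` are hypothetical helpers; `fake_frame` only imitates the frame.descriptor.camera_id attribute path used elsewhere on this page.

```python
from types import SimpleNamespace


def make_router(handlers: dict):
    """Build a frame callback that dispatches to a per-camera handler."""
    def on_frame(frame):
        handler = handlers.get(frame.descriptor.camera_id)
        if handler is not None:
            handler(frame)  # cameras without a handler are ignored
    return on_frame


def fake_frame(camera_id: str, frame_id: int):
    """Stand-in frame exposing only the descriptor fields used here."""
    descriptor = SimpleNamespace(camera_id=camera_id, frame_id=frame_id)
    return SimpleNamespace(descriptor=descriptor, pixels=b"")
```

The returned on_frame can be passed to subscribe_frames with no camera_id, and each handler can then tag its detections with the camera it serves.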

Testing

Exercise the frame and detection path without a real host, engine, shared memory, or socket by using the SDK's test harness. The fake engine builds a real frame ring through the same frame-transport contract and resolves each synthetic frame the way the production client does, so your frame-handling path runs end to end; only the host and the OS shared-memory object are faked. In Python, use ados.sdk.testing.FakeVisionEngine:

from ados.sdk.testing import FakeVisionEngine
from ados.sdk.vision import FrameFormat

async def test_my_plugin():
    engine = FakeVisionEngine("uvc-0", 64, 48, FrameFormat.RGB24)
    seen = []
    engine.on_frame(lambda frame: seen.append(frame))
    engine.push_solid(0x80)        # one flat-grey frame
    await engine.deliver_all()
    assert len(seen) == 1

For an end-to-end run through the real VisionClient (the ring lives in a temp directory the client maps read-only, and the descriptor rides a real delivery event into the client’s resolver), build the engine with a file-backed ring and attach your callback:

engine = FakeVisionEngine.with_shm_dir("uvc-0", 64, 48, FrameFormat.RGB24)
await engine.attach(on_frame)
await engine.deliver_all()

Detections your plugin publishes are captured via engine.captured_detections(). The Rust harness mirrors this: ados_sdk::testing::FakeVisionEngine emits synthetic frames into the same FrameCallback your plugin registers and captures the detection batches it publishes.

Bundling weights

You can ship small weights inside the archive by placing the file under the plugin tree (for example agent/models/weeds.onnx). The manifest has no top-level assets block and no per-file hash; the archive’s single SIGNATURE file covers every entry’s hash, so a bundled weights file is protected by the same signature as the rest of the archive. The plugin loads the file from its data dir at start and, for engine-run models, passes the path as the model_path on the model metadata. Do not fetch weights from a remote URL at runtime; the plugin’s network.outbound policy blocks runtime fetches by default. Bundle the weights or load a file already on the agent.

Vendor binary models

Closed-source vendor binaries (for example, proprietary thermal AGC processing as a .so) ship under vendor/ per the vendor binaries doc. The plugin loads them with ctypes or cffi and calls them from its own inference path, then publishes the results as a DetectionBatch like any other plugin-side model.
