vision_camera demo: semantic segmentation overlay misaligned in portrait #1158

@msluszniak

Description

In the computer-vision demo app's vision_camera screen, the semantic-segmentation overlay (FCN ResNet50 in the repro below, but the code path is shared across all variants) lands rotated and displaced relative to the actual subject in the camera preview. The mask shape covers roughly the right area but is offset and warped — e.g. a TV in the center of the frame produces a purple blob shifted up-and-left and stretched.

Where it happens

apps/computer-vision/components/vision_camera/tasks/SegmentationTask.tsx:156-165:

// Sensor frames are landscape-native, so width/height are swapped
// relative to portrait screen orientation.
const screenW = frame.height;
const screenH = frame.width;
const maskW =
  argmax.length === screenW * screenH
    ? screenW
    : Math.round(Math.sqrt(argmax.length));
const maskH =
  argmax.length === screenW * screenH
    ? screenH
    : Math.round(Math.sqrt(argmax.length));

Two issues:

  1. Square-mask fallback. When argmax.length !== screenW * screenH, the code assumes the mask is square: Math.round(Math.sqrt(argmax.length)) is used for both width and height. For a model that outputs at a non-square spatial resolution, maskW and maskH both get that single value, the mask is built with the wrong aspect, and Skia's fit="cover" then center-crops/stretches it onto the portrait canvas. Net result: rotated/displaced overlay.
  2. Hard-coded landscape→portrait swap. Lines 156–157 assume frame.width/frame.height always report landscape-native sensor dimensions. That isn't stable across vision-camera versions and rotation states. The swap cannot change which branch runs, because screenW * screenH is the same product either way; when the swap is wrong and the length check passes, maskW and maskH come out transposed, so the mask is decoded with the wrong row width.
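
The square fallback in item 1 can be reproduced in isolation. The sketch below copies the dimension guess into a standalone function (guessMaskDims and toXY are hypothetical names, not from the demo app) and feeds it a made-up 4 x 9 mask. It assumes the argmax buffer is flattened row-major, which the snippet above does not state.

```typescript
// Standalone copy of the dimension guess from the snippet above, with the
// frame reduced to two numbers so it runs outside the app.
function guessMaskDims(
  argmaxLength: number,
  frameWidth: number,
  frameHeight: number,
): { maskW: number; maskH: number } {
  const screenW = frameHeight;
  const screenH = frameWidth;
  if (argmaxLength === screenW * screenH) {
    return { maskW: screenW, maskH: screenH };
  }
  const side = Math.round(Math.sqrt(argmaxLength));
  return { maskW: side, maskH: side };
}

// Converts a flat index into (x, y) for a given row width.
// ASSUMPTION: the buffer is stored row-major.
function toXY(index: number, rowWidth: number): { x: number; y: number } {
  return { x: index % rowWidth, y: Math.floor(index / rowWidth) };
}

// Hypothetical model output: 4 columns x 9 rows = 36 values.
const trueW = 4;
const trueH = 9;
const guessed = guessMaskDims(trueW * trueH, 1920, 1080);

// Index 4 is the first value of the second row in the real layout...
const actual = toXY(4, trueW);
// ...but with the guessed row width it is still inside the first row.
const decoded = toXY(4, guessed.maskW);

console.log({ guessed, actual, decoded });
```

The guess comes out as 6 x 6 for a 4 x 9 mask. The total length still matches, so nothing fails loudly; every row after the first is simply read from the wrong offset, which shows up as a skewed and shifted overlay.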

Reproduction

  1. Open computer-vision demo → vision_camera screen.
  2. Select Segment → FCN ResNet50 (or any DeepLab variant).
  3. Aim the camera at a clear, asymmetric subject (e.g. a TV).
  4. The colored overlay appears in the right area but rotated/displaced relative to the actual subject.

Suggested fix

  • Get the true mask dimensions from the model output rather than guessing. The hook returns spatial dims for each task (or we can expose them) — use those directly.
  • Stop swapping frame.width/frame.height blindly. Use frame.orientation or a vision-camera helper to determine the actual on-screen aspect, then map model coords → display coords explicitly (a transform on the <SkiaImage>, not just fit="cover").
  • Once dims are correct, the fit mode should match what's used for the camera preview itself, so the mask aligns pixel-for-pixel.
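
One way the bullets above could fit together is sketched below. It is not taken from the demo app or from vision-camera: Size, Orientation, uprightSize, coverTransform, and maskToDisplay are hypothetical names. The sketch assumes that the mask dimensions are supplied by the model output, that the mask is already upright and spans the whole upright frame, and that a landscape-* orientation value means the pixel buffer is rotated 90 degrees relative to the upright image. That last assumption in particular would need checking against the vision-camera version in use.

```typescript
// Hypothetical types for illustration; not imported from any library.
type Orientation =
  | "portrait"
  | "portrait-upside-down"
  | "landscape-left"
  | "landscape-right";

interface Size {
  width: number;
  height: number;
}

// ASSUMPTION: landscape-* means the buffer is rotated 90 degrees relative to
// the upright image, so its width and height trade places once uprighted.
function uprightSize(buffer: Size, orientation: Orientation): Size {
  const rotated =
    orientation === "landscape-left" || orientation === "landscape-right";
  return rotated
    ? { width: buffer.height, height: buffer.width }
    : { ...buffer };
}

// Uniform scale plus centering offset that makes `src` fully cover `dst`.
function coverTransform(src: Size, dst: Size) {
  const scale = Math.max(dst.width / src.width, dst.height / src.height);
  return {
    scale,
    translateX: (dst.width - src.width * scale) / 2,
    translateY: (dst.height - src.height * scale) / 2,
  };
}

// Maps mask pixels to display pixels: first stretch the mask onto the upright
// frame, then apply the same cover placement the preview would use.
// ASSUMPTION: the mask spans the whole upright frame.
function maskToDisplay(mask: Size, upright: Size, display: Size) {
  const cover = coverTransform(upright, display);
  return {
    scaleX: (upright.width / mask.width) * cover.scale,
    scaleY: (upright.height / mask.height) * cover.scale,
    translateX: cover.translateX,
    translateY: cover.translateY,
  };
}

// Example values are made up: a 1920x1080 buffer, a 224x224 mask, and a
// 390x844 portrait canvas.
const upright = uprightSize({ width: 1920, height: 1080 }, "landscape-right");
const display: Size = { width: 390, height: 844 };
const placement = maskToDisplay({ width: 224, height: 224 }, upright, display);
console.log(upright, placement);
```

A mask pixel (x, y) would then land at (x * scaleX + translateX, y * scaleY + translateY) on the canvas. Because the placement is computed from the same upright size and fit rule as the preview, the two should line up as long as the assumptions above hold.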
