Description
In the computer-vision demo app's vision_camera screen, the semantic-segmentation overlay (FCN ResNet50 in the repro below, but the code path is shared across all variants) lands rotated and displaced relative to the actual subject in the camera preview. The mask shape covers roughly the right area but is offset and warped — e.g. a TV in the center of the frame produces a purple blob shifted up-and-left and stretched.
Where it happens
apps/computer-vision/components/vision_camera/tasks/SegmentationTask.tsx:156-165:
// Sensor frames are landscape-native, so width/height are swapped
// relative to portrait screen orientation.
const screenW = frame.height;
const screenH = frame.width;
const maskW =
  argmax.length === screenW * screenH
    ? screenW
    : Math.round(Math.sqrt(argmax.length));
const maskH =
  argmax.length === screenW * screenH
    ? screenH
    : Math.round(Math.sqrt(argmax.length));
Two issues:
- Square-mask fallback. When argmax.length !== screenW * screenH, the code assumes the mask is square: Math.round(Math.sqrt(argmax.length)) is used for both width and height. For models that output at a non-square spatial resolution, maskW and maskH get the same sqrt value, the mask is built with the wrong width and aspect (so its rows are read at the wrong stride), and Skia's fit="cover" then center-crops/stretches it onto the portrait canvas. Net result: rotated/displaced overlay.
- Hard-coded landscape→portrait swap. Lines 156–157 assume frame.width/frame.height always report landscape-native sensor dimensions. That isn't stable across vision-camera versions and rotation states. The length check cannot catch a wrong swap, because screenW * screenH equals frame.width * frame.height either way: for a frame-sized mask the first branch still matches, but maskW and maskH come out transposed, so the mask is again built with the wrong row stride.
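To make the two failure modes concrete, here is a minimal sketch that restates the quoted dimension logic as a pure function. The function name, the `Dims` type, and the frame/mask sizes are illustrative assumptions, not taken from the repo. Note that an already-portrait frame still passes the length check, since the product of the two sides is the same in either order; the dimensions just come out transposed.

```typescript
// Hypothetical restatement of the logic in SegmentationTask.tsx as a pure
// function. Names and sample sizes are illustrative only.
interface Dims {
  width: number;
  height: number;
}

function guessMaskDims(
  argmaxLength: number,
  frameWidth: number,
  frameHeight: number
): Dims {
  // Same unconditional swap as the quoted code.
  const screenW = frameHeight;
  const screenH = frameWidth;
  if (argmaxLength === screenW * screenH) {
    return { width: screenW, height: screenH };
  }
  // Square fallback: one rounded sqrt for both sides.
  const side = Math.round(Math.sqrt(argmaxLength));
  return { width: side, height: side };
}

// Case 1: a non-square 640x480 mask on a 1920x1080 frame.
// The length matches no frame size, so the fallback yields a square guess.
const nonSquare = guessMaskDims(640 * 480, 1920, 1080);
console.log(nonSquare); // { width: 554, height: 554 }

// Case 2: a frame-sized mask, but the frame is already reported as portrait
// (1080 wide, 1920 tall). The length check still passes; the swap transposes
// the result to 1920 wide, 1080 tall.
const transposed = guessMaskDims(1080 * 1920, 1080, 1920);
console.log(transposed); // { width: 1920, height: 1080 }
```

In both cases the returned width differs from the true row length of the mask, which is enough to misplace every row after the first.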
Reproduction
- Open computer-vision demo → vision_camera screen.
- Select Segment → FCN ResNet50 (or any DeepLab variant).
- Aim the camera at a clear, asymmetric subject (e.g. a TV).
- The colored overlay appears in the right area but rotated/displaced relative to the actual subject.
Suggested fix
- Get the true mask dimensions from the model output rather than guessing. The hook returns spatial dims for each task (or we can expose them) — use those directly.
- Stop swapping frame.width/frame.height blindly. Use frame.orientation or a vision-camera helper to determine the actual on-screen aspect, then map model coords → display coords explicitly (a transform on the <SkiaImage>, not just fit="cover").
- Once dims are correct, the fit mode should match what's used for the camera preview itself, so the mask aligns pixel-for-pixel.
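A rough sketch of how these three steps could fit together, under stated assumptions: every name below (`resolveMaskSize`, `uprightSize`, `coverTransform`, the `Size` type) is hypothetical, the local `Orientation` union is modeled on vision-camera's orientation strings but declared here so the snippet stands alone, and the meaning given to the "landscape-*" values is an assumption that would need checking against the vision-camera version in use.

```typescript
// Hypothetical helpers; none of these names come from the repo or from
// vision-camera.
type Orientation =
  | "portrait"
  | "portrait-upside-down"
  | "landscape-left"
  | "landscape-right";

interface Size {
  width: number;
  height: number;
}

// Step 1: trust dimensions reported with the model output, and fail loudly
// if they disagree with the buffer length instead of guessing.
function resolveMaskSize(argmaxLength: number, reported: Size): Size {
  if (reported.width * reported.height !== argmaxLength) {
    throw new Error(
      `mask is ${reported.width}x${reported.height} but argmax has ${argmaxLength} elements`
    );
  }
  return reported;
}

// Step 2: derive the upright size from the orientation.
// ASSUMPTION: a "landscape-*" value means the buffer needs a quarter turn
// to appear upright on screen.
function uprightSize(buffer: Size, orientation: Orientation): Size {
  const quarterTurn =
    orientation === "landscape-left" || orientation === "landscape-right";
  return quarterTurn
    ? { width: buffer.height, height: buffer.width }
    : buffer;
}

// Step 3: explicit "cover" mapping: uniform scale that fills dst, centered.
interface CoverTransform {
  scale: number;
  dx: number;
  dy: number;
}

function coverTransform(src: Size, dst: Size): CoverTransform {
  const scale = Math.max(dst.width / src.width, dst.height / src.height);
  return {
    scale,
    dx: (dst.width - src.width * scale) / 2,
    dy: (dst.height - src.height * scale) / 2,
  };
}

// Example with illustrative sizes: a 640x480 mask shown on a 390x844 canvas.
const mask = resolveMaskSize(640 * 480, { width: 640, height: 480 });
const upright = uprightSize(mask, "landscape-right"); // 480 wide, 640 tall
const transform = coverTransform(upright, { width: 390, height: 844 });
console.log(upright, transform);
```

`uprightSize` only swaps the dimensions; the mask data itself would still need the matching rotation, which could be folded into the same transform applied to the <SkiaImage>. Applying an identical `coverTransform` to the camera preview and to the mask is what would keep the two aligned.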
Notes
On main; unrelated to PR #1148 (feat(constants)!: switch URLs to v0.9.0 layout + add MODEL_REGISTRY), the model registry change.