SAM3 → TensorRT

Export Meta AI's Segment Anything 3 (SAM3) model to ONNX, then build a TensorRT engine for real-time segmentation. This repo includes a CUDA inference library and demo apps for semantic and instance segmentation.

Project Overview

Python tooling to export SAM3 to a clean ONNX graph.
TensorRT-ready workflows for building optimized engines.
A C++/CUDA library for high-performance inference with demo apps.
Support for Promptable concept segmentation (PCS), the latest feautre in SAM3.
Zero-copy support on unified-memory platforms (Jetson, DGX Spark). Great for robotics/real-time interaction.
Everything runs inside a reproducible docker environment (x86, Jetson, Spark).
MIT license for the love of everything nice :)

Demos

Video demo (click to play):

Semantic segmentation produced by the C++ demo app (prompt='dog')

Instance segmentation results (prompt='box')

Repo Layout

python/ - ONNX export and visualization scripts.
cpp/ - C++/CUDA library and apps (TensorRT inference).
docker/ - Container setup (Dockerfile.x86, with an aarch64 variant expected).
demo/ - Example outputs from the C++ demo app.

Quickstart

Request access to the gated model
- Visit https://huggingface.co/facebook/sam3 and request access.
- Ensure your HF_TOKEN has permission.
- Set HF_TOKEN as environment variable in the host. Docker will pick it up from there.
Build the Docker container for your platform (all commands below run inside it)

On x86

docker build -t sam3-trt -f docker/Dockerfile.x86 .

On Jetson/Spark

For aarch64 platforms with shared CPU/GPU memory, the C++ library in this repo supports zero-copy inference paths.

Build and run the aarch64 container:

docker build -t sam3-trt-aarch64 -f docker/Dockerfile.aarch64 .

Export HF_TOKEN and run the docker container

export HF_TOKEN=<YOUR TOKEN>
docker run -it --rm \
  --network=host \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --env HF_TOKEN \
  -v "$PWD":/workspace \
  -w /workspace \
  sam3-trt bash

Export to ONNX

python python/onnxexport.py

This produces onnx_weights/sam3_static.onnx plus external weight shards.

Build a TensorRT engine

trtexec --onnx=onnx_weights/sam3_static.onnx --saveEngine=sam3_fp16.plan --fp16 --verbose

Build the C++/CUDA library and sample app

mkdir cpp/build && cd cpp/build
cmake ..
make

Run the demo app

./sam3_pcs_app <image_dir> <engine_path.engine>

Results are written to a results/ folder.

Extensions

This is a very raw project and provides the crucial backend TensorRT/CUDA bits necessary for anything. From here, please feel free to fan out into any application you like. Pull requests are very welcome! Here are some ideas I can think of:

ROS2 wrapper for real-time robotics pipelines.
Interactive voice-based segmentation app. Have someone speak into a microphone, use a TTS model to transcribe it and feed into the engine, which then produces the segmentation mask live. I don't have the time to build it but I hope you can.
Live camera input and overlays. You will need a beefy GPU. SAM3 doesn't run realtime on a Jetson nano.

Troubleshooting

Access errors: Make sure your HF_TOKEN has access to facebook/sam3.
ONNX export fails: Install transformers from source if SAM3 is missing.
TensorRT parse errors: Ensure the full onnx_weights/ directory is copied (external data is required).
C++ build errors: Confirm CUDA, TensorRT, and OpenCV are installed and discoverable via pkg-config.

Development guide

CUDA Library Notes

The shared library target is sam3_trt.
Demo app: sam3_pcs_app (semantic/instance visualization modes).
Outputs include semantic segmentation and instance segmentation mask logits. If you choose SAM3_VISUALIZATION::VIS_NONE in your application, you need to apply sigmoid yourself.
The library does not support building engines. Use trtexec instead.

ONNX Export Details

Default export runs on CPU for compatibility (switch device to cuda if desired).
SAM3 is large and exports with external weight shards; keep the entire onnx_weights/ directory together.

TensorRT Notes

Use trtexec for quick engine builds and benchmarking.
FP16 is the usual starting point; INT8/FP8/INT4 require calibration or compatible tooling.

License

MIT (see LICENSE).

If this saved you time, drop a ⭐ so others can find it and ship SAM-3 faster.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SAM3 → TensorRT

Table of Contents

Project Overview

Demos

Repo Layout

Quickstart

On x86

On Jetson/Spark

Extensions

Troubleshooting

Development guide

CUDA Library Notes

ONNX Export Details

TensorRT Notes

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
cpp		cpp
demo		demo
docker		docker
python		python
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

License

dataplayer12/SAM3-TensorRT

Folders and files

Latest commit

History

Repository files navigation

SAM3 → TensorRT

Table of Contents

Project Overview

Demos

Repo Layout

Quickstart

On x86

On Jetson/Spark

Extensions

Troubleshooting

Development guide

CUDA Library Notes

ONNX Export Details

TensorRT Notes

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages