
[Bug] Segfault after last update. Before the update the gemma3 model works fine #3370

@delphiRo

Description


🐛 Bug

Segfault after the last update. Before the update, the gemma3 model worked fine.

To Reproduce

ROCR_VISIBLE_DEVICES=2 python -m mlc_llm serve HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC --port 8081 --overrides "tensor_parallel_shards=1;max_total_seq_length=2768;gpu_memory_utilization=0.92;" --mode server --device="rocm:0"

/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:559: UserWarning: Failed to load torch c dlpack extension: Error building extension 'c_dlpack': [1/2] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=c_dlpack -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1016" -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.4/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -fPIC -std=c++17 -O3 -DBUILD_WITH_CUDA -c /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp -o main.o -D__HIP_PLATFORM_AMD=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC
FAILED: main.o
c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=c_dlpack -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1016" -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.4/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -fPIC -std=c++17 -O3 -DBUILD_WITH_CUDA -c /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp -o main.o -D__HIP_PLATFORM_AMD=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC
In file included from /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp:8:
/home/rig/.local/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAStream.h:3:10: fatal error: cuda_runtime_api.h: No such file or directory
3 | #include <cuda_runtime_api.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
,EnvTensorAllocator will not be enabled.
warnings.warn(
[2025-10-31 09:33:55] INFO auto_device.py:82: Found device: rocm:0
[2025-10-31 09:33:55] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-10-31 09:33:55] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-10-31 09:33:55] INFO download_cache.py:166: Weights already downloaded: /home/rig/.cache/mlc_llm/model_weights/hf/mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-10-31 09:33:56] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-10-31 09:33:56] INFO jit.py:158: Using cached model lib: /home/rig/.cache/mlc_llm/model_lib/b7e96d134f84cd2d4cf435be3748adc1.so
[2025-10-31 09:33:56] INFO engine_base.py:192: The selected engine mode is server. We use as much GPU memory as possible (within the limit of gpu_memory_utilization).
[2025-10-31 09:33:56] INFO engine_base.py:200: If you have low concurrent requests and want to use less GPU memory, please select mode "local".
[2025-10-31 09:33:56] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
!!!!!!! Segfault encountered !!!!!!!
File "./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c", line 0, in 0x000079290d84532f
File "./malloc/malloc.c", line 3375, in __GI___libc_free
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::ffi::String const&)
File "", line 0, in mlc::llm::Tokenizer::FromPath(tvm::ffi::String const&, std::optional<mlc::llm::TokenizerInfo>)
File "", line 0, in PyObject_Vectorcall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in _PyObject_Call_Prepend
File "", line 0, in _PyObject_MakeTpCall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in _PyObject_Call_Prepend
File "", line 0, in _PyObject_MakeTpCall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in PyEval_EvalCode
File "", line 0, in PyObject_Vectorcall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in Py_RunMain
File "", line 0, in Py_BytesMain
File "", line 0, in _start
File "", line 0, in 0xffffffffffffffff

Segmentation fault (core dumped)

Expected behavior

The server starts and serves inference requests.

Environment

ROCm 6.3.4
AMD Instinct MI50 16 GB

Additional context

I recommend not deleting the previous version's wheel from the repo, so that users can roll back.
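As a stopgap, the previous nightly wheel can be pinned if it is still published on the wheels index. A minimal sketch — the package name `mlc-llm-nightly-rocm63` and the version number below are assumptions; substitute the package and last known-good version for your ROCm build:

```shell
# Roll back to the last known-good nightly wheel.
# The version "0.18.dev0" is a placeholder, not a verified release.
pip install --force-reinstall "mlc-llm-nightly-rocm63==0.18.dev0" \
    -f https://mlc.ai/wheels
```

`pip install ... -f <index>` only works for a rollback if the older wheel has not been deleted from the index, which is exactly why keeping previous wheels around matters.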


Labels

bug (Confirmed bugs)
