Description
🐛 Bug
Segfault after the latest update. Before the update, the gemma3 model worked fine.
To Reproduce
ROCR_VISIBLE_DEVICES=2 python -m mlc_llm serve HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC --port 8081 --overrides "tensor_parallel_shards=1;max_total_seq_length=2768;gpu_memory_utilization=0.92;" --mode server --device="rocm:0"
mlc_llm
/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:559: UserWarning: Failed to load torch c dlpack extension: Error building extension 'c_dlpack': [1/2] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=c_dlpack -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1016" -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.4/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -fPIC -std=c++17 -O3 -DBUILD_WITH_CUDA -c /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp -o main.o -D__HIP_PLATFORM_AMD=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC
FAILED: main.o
c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=c_dlpack -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1016" -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/tvm_ffi/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/home/rig/.local/lib/python3.12/site-packages/torch/include/THH -I/opt/rocm-6.3.4/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include -isystem /home/rig/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -fPIC -std=c++17 -O3 -DBUILD_WITH_CUDA -c /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp -o main.o -D__HIP_PLATFORM_AMD=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC
In file included from /home/rig/.cache/torch_extensions/py312_cpu/c_dlpack/main.cpp:8:
/home/rig/.local/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAStream.h:3:10: fatal error: cuda_runtime_api.h: No such file or directory
3 | #include <cuda_runtime_api.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
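The extension build appears to fail because the compile line passes `-DBUILD_WITH_CUDA`, so torch's `c10/cuda/CUDAStream.h` gets pulled in and in turn requires `cuda_runtime_api.h`, which does not exist on a ROCm-only machine. A minimal sketch that confirms this on the affected box (the helper is illustrative, not part of mlc_llm; the candidate paths are taken from the failing compile line):

```python
from pathlib import Path

def dirs_with_cuda_header(include_dirs):
    """Return the include dirs that actually contain cuda_runtime_api.h
    (the header the failing #include in CUDAStream.h needs)."""
    return [d for d in include_dirs if (Path(d) / "cuda_runtime_api.h").is_file()]

if __name__ == "__main__":
    # The -I / -isystem paths from the failing compile line, abbreviated.
    candidates = [
        "/opt/rocm-6.3.4/include",
        "/usr/include",
    ]
    # Expected to print [] on a ROCm-only box, matching the compile error.
    print(dirs_with_cuda_header(candidates))
```

If this prints an empty list, the warning is consistent with a missing CUDA toolkit rather than a broken install; note the log says only that `EnvTensorAllocator` will not be enabled, so this may be separate from the segfault itself.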
,EnvTensorAllocator will not be enabled.
warnings.warn(
[2025-10-31 09:33:55] INFO auto_device.py:82: Found device: rocm:0
[2025-10-31 09:33:55] INFO download_cache.py:227: Downloading model from HuggingFace: HF://mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-10-31 09:33:55] INFO download_cache.py:29: MLC_DOWNLOAD_CACHE_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-10-31 09:33:55] INFO download_cache.py:166: Weights already downloaded: /home/rig/.cache/mlc_llm/model_weights/hf/mlc-ai/gemma-3-27b-it-q4f16_1-MLC
[2025-10-31 09:33:56] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2025-10-31 09:33:56] INFO jit.py:158: Using cached model lib: /home/rig/.cache/mlc_llm/model_lib/b7e96d134f84cd2d4cf435be3748adc1.so
[2025-10-31 09:33:56] INFO engine_base.py:192: The selected engine mode is server. We use as much GPU memory as possible (within the limit of gpu_memory_utilization).
[2025-10-31 09:33:56] INFO engine_base.py:200: If you have low concurrent requests and want to use less GPU memory, please select mode "local".
[2025-10-31 09:33:56] INFO engine_base.py:205: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
!!!!!!! Segfault encountered !!!!!!!
File "./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c", line 0, in 0x000079290d84532f
File "./malloc/malloc.c", line 3375, in __GI___libc_free
File "", line 0, in std::filesystem::__cxx11::path::~path()
File "", line 0, in mlc::llm::Tokenizer::DetectTokenizerInfo(tvm::ffi::String const&)
File "", line 0, in mlc::llm::Tokenizer::FromPath(tvm::ffi::String const&, std::optional<mlc::llm::TokenizerInfo>)
File "", line 0, in PyObject_Vectorcall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in _PyObject_Call_Prepend
File "", line 0, in _PyObject_MakeTpCall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in _PyObject_Call_Prepend
File "", line 0, in _PyObject_MakeTpCall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in PyEval_EvalCode
File "", line 0, in PyObject_Vectorcall
File "", line 0, in _PyEval_EvalFrameDefault
File "", line 0, in Py_RunMain
File "", line 0, in Py_BytesMain
File "", line 0, in _start
File "", line 0, in 0xffffffffffffffff
Segmentation fault (core dumped)
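The backtrace points at `mlc::llm::Tokenizer::DetectTokenizerInfo`, which reads tokenizer files from the model directory during startup. As a first isolation step, here is a stdlib-only sketch that checks the cached tokenizer files exist and parse as JSON (the file names checked are an assumption about what the detector reads):

```python
import json
from pathlib import Path

def check_tokenizer_files(model_dir, names=("tokenizer.json", "tokenizer_config.json")):
    """Report 'missing', 'corrupt', or 'ok' for each tokenizer file in model_dir."""
    status = {}
    for name in names:
        p = Path(model_dir) / name
        if not p.is_file():
            status[name] = "missing"
            continue
        try:
            json.loads(p.read_text(encoding="utf-8"))
            status[name] = "ok"
        except (json.JSONDecodeError, UnicodeDecodeError):
            status[name] = "corrupt"
    return status

if __name__ == "__main__":
    # Cache path taken from the download_cache.py log line above.
    cache = Path.home() / ".cache/mlc_llm/model_weights/hf/mlc-ai/gemma-3-27b-it-q4f16_1-MLC"
    print(check_tokenizer_files(cache))
```

A "corrupt" or "missing" result would suggest a damaged download cache (worth deleting and re-downloading); all "ok" would point at a regression in the tokenizer-detection code path introduced by the update.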
Expected behavior
The server starts and serves inference requests, as it did before the update.
Environment
ROCM 6.3.4
AMD Instinct MI50 16 GB
Additional context
I recommend not deleting previous-version wheels from the repo, so that users can roll back.