@Anri-Lombard

Summary

  • Fixed exception propagation from CPU scheduler worker threads to Python
  • Added std::exception_ptr storage in StreamThread to capture exceptions thrown during task execution
  • Exceptions are now re-thrown during synchronize() so they propagate as RuntimeError to Python
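
The capture-and-rethrow pattern described above can be sketched in Python as an analogy. This is not the C++ code from this PR: the `StreamThread` class below is a toy stand-in built on the standard `threading` and `queue` modules, and its method names (`enqueue`, `synchronize`, `stop`) and the failing task are assumptions chosen for illustration. A stored Python exception object plays the role that `std::exception_ptr` plays in C++.

```python
import queue
import threading


class StreamThread:
    """Toy worker thread (hypothetical; not MLX's actual StreamThread)."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._error = None  # first exception raised by a task, if any
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            try:
                if task is None:  # sentinel: shut the worker down
                    return
                task()
            except Exception as exc:
                # Capture instead of letting the exception kill the thread.
                with self._lock:
                    if self._error is None:
                        self._error = exc
            finally:
                self._tasks.task_done()

    def enqueue(self, task):
        self._tasks.put(task)

    def synchronize(self):
        # Wait for all queued tasks, then surface any captured error
        # on the calling thread.
        self._tasks.join()
        with self._lock:
            error, self._error = self._error, None
        if error is not None:
            raise RuntimeError(str(error)) from error

    def stop(self):
        self._tasks.put(None)
        self._thread.join()


def singular_solve():
    # Hypothetical task standing in for a failing LU factorization.
    raise ValueError("matrix is singular")


stream = StreamThread()
stream.enqueue(singular_solve)
try:
    stream.synchronize()
except RuntimeError as err:
    print(f"caught: {err}")
stream.stop()
```

In this sketch the error only becomes visible at `synchronize()`, and nothing is done to clean up work that was queued behind the failing task.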

Problem

When mx.linalg.solve is called with a singular matrix, the LU factorization throws a std::runtime_error inside the scheduler worker thread. Previously, this exception was not caught, causing the process to crash with SIGABRT instead of raising a catchable Python exception.

Test plan

  • Added regression test in test_linalg.py::test_solve that verifies singular matrix raises RuntimeError
  • All existing linalg tests pass (16 tests, 181 subtests)
  • Manual verification:
import mlx.core as mx
result = mx.linalg.solve(mx.ones((2, 2)), mx.ones((2,)), stream=mx.cpu)
result.tolist()  # Now raises RuntimeError instead of crashing

Fixes #2888

Exceptions thrown in CPU scheduler worker threads (e.g., when LAPACK
detects a singular matrix) were not being caught, causing process
crashes instead of raising Python exceptions.

Added exception capture in StreamThread and re-throwing during
synchronize() so errors propagate as RuntimeError to Python.

Fixes ml-explore#2888
@awni
Member

awni commented Jan 5, 2026

The challenge with this is not so much propagating the exception as making sure MLX is in a sane state when the exception is raised. Otherwise we don't want to give the client the opportunity to catch the exception, because they may assume it's safe to continue.

Also, I think catching the exception only on synchronization is sub-optimal. We should check for exceptions more frequently (like maybe after a wait or something like that).

I'm going to close this for now as I don't think it's at the point that we can consider it. But if you want to send a new PR with the above feedback in mind feel free to do so.

@awni awni closed this Jan 5, 2026

Development

Successfully merging this pull request may close these issues.

[BUG] Singular matrix