hook pthread_mutex_lock/pthread_mutex_unlock on Linux/non-macOS Unix #444
Copilot wants to merge 11 commits into
Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/a4199b83-4d02-43f6-aa5a-f79a5303135a Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##           master     #444      +/-   ##
==========================================
+ Coverage   74.53%   74.68%   +0.14%
==========================================
  Files          45       45
  Lines        4744     4744
==========================================
+ Hits         3536     3543       +7
+ Misses       1208     1201       -7
loongs-zhang left a comment:
Locate the root cause of the CI failures and fix CI.
…ss hooks, cfg-gate test Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/66bb6c33-ec19-440e-82f5-08d9566b66a5 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Root causes identified and fixed in commits
Linux (
macOS (SIGBUS/SIGILL crash):
Windows (stack overflow / test timeout on all variants): The
loongs-zhang left a comment:
Locate the root cause of the CI failures and fix CI.
Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/541adcfb-08ea-4748-b20e-89587e89b562 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Two remaining CI failures fixed in commit
Linux aarch64/riscv64 preemptive deadlock (
Windows nightly MSVC
loongs-zhang left a comment:
Locate the root cause of the failure in https://github.com/acl-dev/open-coroutine/actions/runs/25895183498?pr=444, then fix it.
…O overhead and recursion Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/78752aa5-9ec8-47c6-ac7c-b34468cc19ce Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Root causes identified and fixed in commit
Windows (all variants) - socket test timeouts:
Fix:
macOS (unused import
Linux x86_64 / loongarch64 SIGSEGV in
loongs-zhang left a comment:
We got another deadlock on macOS. Check CI, locate the root cause, and fix it.
…to eliminate deadlocks and timeouts Agent-Logs-Url: https://github.com/acl-dev/open-coroutine/sessions/9322e3e6-da22-49aa-82dd-ac102a8c0f37 Co-authored-by: loongs-zhang <38336731+loongs-zhang@users.noreply.github.com>
Root cause and fix in commit
macOS deadlock (
Windows timeouts / STATUS_GUARD_PAGE_VIOLATION (
Continues and finishes #443.
Changes
- `hook/src/syscall/unix.rs`: Enable hooks for `pthread_mutex_lock` and `pthread_mutex_unlock` on Linux and other non-macOS Unix platforms only (`#[cfg(not(target_os = "macos"))]`). On these platforms, `once_cell::sync::Lazy` initialization uses futex (not `pthread_mutex_t`), so the standard `impl_hook!` macro is safe. On macOS the hooks are omitted entirely: `dlsym` internally acquires a dyld lock backed by `pthread_mutex_t`, which would cause `impl_hook!`'s `Lazy` initializer to recurse. The per-thread re-entrancy flag that was previously used to break that cycle causes a separate cross-coroutine deadlock under preemptive scheduling: a coroutine preempted mid-NIO-loop leaves the flag set, so the next coroutine on the same thread uses the real blocking function and can deadlock the event-loop thread. Because the NIO path is just a `trylock` poll loop with no genuine async benefit, the macOS hooks are dropped. The core `open_coroutine_core::syscall::pthread_mutex_{lock,unlock}` functions remain available for direct use.
- `core/src/syscall/windows/WaitOnAddress.rs`: `NioWaitOnAddressSyscall::WaitOnAddress` is a simple pass-through that delegates directly to the real function. A NIO polling loop was tried but caused two problems: (1) recursion: `EventLoops::wait_event` accesses DashMap/parking_lot internals which call `WaitOnAddress`, creating an infinite chain; (2) excessive overhead: on nightly Windows, `std` uses `WaitOnAddress` for many internal mutex operations, and each call from within a coroutine incurred significant overhead, causing `socket_co_server` and similar tests to exceed their timeout. Passing through directly avoids both issues.
- `hook/src/syscall/windows.rs`: The `WaitOnAddress` hook is not installed. On nightly Windows, `std::sync::Mutex` uses `WaitOnAddress` for every lock operation; intercepting those calls through the facade (state-machine transitions + MonitorListener submit/remove per call) added enough overhead to cause `socket_co_server` to time out and produced `STATUS_GUARD_PAGE_VIOLATION` (stack overflow) on i686. Since `NioWaitOnAddressSyscall` was already a pass-through, removing the hook has identical semantics but eliminates the overhead.
- `core/Cargo.toml`: Add `libc` to `[target.'cfg(unix)'.dev-dependencies]` to allow integration tests to use libc types directly.
- `core/tests/scheduler.rs`: Add `scheduler_pthread_mutex_lock` test (gated on `#[cfg(all(unix, feature = "syscall"))]`) that verifies `pthread_mutex_lock` and `pthread_mutex_unlock` work correctly within coroutines using a shared static mutex.

Make sure that:
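For readers unfamiliar with the "trylock poll loop" that the `hook/src/syscall/unix.rs` entry in the changes list refers to, here is a rough sketch of the general shape of such a loop. It is not the project's code: it uses `std::sync::Mutex` instead of `pthread_mutex_t`, `std::thread::yield_now` stands in for handing control back to a coroutine scheduler, and the names `poll_lock` and `run_workers` are hypothetical, chosen only for this illustration.

```rust
use std::sync::{Arc, Mutex, MutexGuard, TryLockError};
use std::thread;

/// Hypothetical helper: acquire `m` by polling `try_lock` instead of blocking.
/// `thread::yield_now` is an assumed stand-in for yielding to a scheduler.
fn poll_lock<T>(m: &Mutex<T>) -> MutexGuard<'_, T> {
    loop {
        match m.try_lock() {
            Ok(guard) => return guard,
            // A previous holder panicked; take the guard anyway.
            Err(TryLockError::Poisoned(p)) => return p.into_inner(),
            // Lock is held elsewhere: give up the time slice and retry.
            Err(TryLockError::WouldBlock) => thread::yield_now(),
        }
    }
}

/// Hypothetical driver: `workers` threads each increment a shared counter
/// `iters` times, acquiring the mutex only through `poll_lock`.
fn run_workers(workers: usize, iters: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    *poll_lock(&counter) += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Bind the value first so the guard is dropped before `counter`.
    let total = *poll_lock(&counter);
    total
}
```

Calling `run_workers(4, 1000)` yields a counter of 4000, since every increment happens under the lock. Each failed `try_lock` only yields and retries; nothing wakes the waiter when the lock is released. That is consistent with the description's point that a poll loop of this kind brings no genuine async benefit.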