refactor: simplify TLS key init with pthread_once and fix fork leak#523
Conversation
563f2e1 to
2aad051
Compare
| AllocationTracker *AllocationTracker::_instance; | ||
| std::atomic<pthread_key_t> AllocationTracker::_tl_state_key{ | ||
| AllocationTracker::kInvalidKey}; | ||
| pthread_once_t AllocationTracker::_tl_key_once = PTHREAD_ONCE_INIT; |
There was a problem hiding this comment.
A user pointed to a crash that could involve pthread once.
I was not able to reproduce it locally.
I also do not fully trust the differences between musl and glibc for pthread once.
There was a problem hiding this comment.
As far as I can see (with Claude assist), neither glibc nor musl's pthread_once can crash from correct usage.
Both implementations:
- Use a CAS on an int (the pthread_once_t control variable)
- The only memory accesses are to the control variable itself (atomically) and a futex wake/wait
- Both always return 0 (musl never returns an error, glibc always returns 0)
- Both handle thread cancellation during init_routine via cleanup handlers
Could it be a sequence of double initialization maybe ?
There was a problem hiding this comment.
Note: The only scenario where musl could deadlock is if we fork-during-init: If fork() happens while pthread_once is executing create_key(), the child process deadlocks on musl. We could mitigate it by calling ensure_key_initialized() from the library constructor if this is a concern.
There was a problem hiding this comment.
Humm, scenario added:
Fork safety
ddprof is statically linked with musl and loaded into glibc processes. musl's pthread_once has no fork-generation mechanism: if fork() happens while pthread_once is in-progress, the child deadlocks because musl's futex wait will never be woken (the initializing thread doesn't exist in the child, and musl's fork handlers aren't registered when musl isn't the host libc).
To eliminate this race, ensure_key_initialized() is called from the library constructor (ProfilerAutoStart), before any thread can call fork(). This guarantees pthread_once has completed by the time threads exist.
There was a problem hiding this comment.
Thanks for thinking carefully about this. At the moment, I still think keeping the atomic is simpler.
Just to fix slightly what is above: we try not to rely on libc implementation for the library part (what is injected into the process). We build against musl, but deploy and inject into binaries that depend on glibc. So the library does not pull libc as an explicit dependency.
We looked at the ABI for pthread_once and in theory it should be OK, but I still think relying on the atomic is better.
There was a problem hiding this comment.
ddprof is statically linked with musl and loaded into glibc processes. musl's pthread_once has no fork-generation mechanism: if fork() happens while pthread_once is in-progress, the child deadlocks because musl's futex wait will never be woken (the initializing thread doesn't exist in the child, and musl's fork handlers aren't registered when musl isn't the host libc).
libddprof is not linked statically with musl (I don't think that's possible for a shared library to statically link libc): libddprof does not depend directly on a libc but makes the assumption that one will be loaded before libddprof is itself loaded. Hence when libddprof is loaded on a glibc host it will use glibc functions, and no musl code will ever be executed.
There was a problem hiding this comment.
Why not call pthread_key_create in allocation_tracking_init ?
At this point symbol interposition is not yet active, so there should be no concurrent calls to init_tl_state / get_tl_state. No need for pthread_once / atomic.
There was a problem hiding this comment.
Note that the tl_state of existing threads (other than the one executing allocation_tracking_init) will never get initialized, since init_tl_state is called upon thread creation (we had many issues with lazily creating the thread local state).
2aad051 to
a149fb4
Compare
Replace the hand-rolled CAS loop and kInvalidKey sentinel with pthread_once. This removes the std::atomic<pthread_key_t>, the non-portable assumption that key value -1 is unused, the CAS retry + rollback logic, and all memory ordering concerns. get_tl_state() becomes a single pthread_getspecific call with no sentinel check. Before ensure_key_initialized() runs, _tl_state_key is zero-initialized; pthread_getspecific on an uncreated key returns NULL on both glibc and musl. init_tl_state() now deletes any previously set state before allocating, fixing a memory leak when re-initializing after fork. All pthread syscalls are checked via a PTHREAD_CHECK macro that aborts with the stringified call expression and strerror() diagnostic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a149fb4 to
b4341b5
Compare
| if (key == kInvalidKey) { | ||
| return nullptr; | ||
| } | ||
| // Free any previously set state (e.g. inherited across fork). |
There was a problem hiding this comment.
I think there is a fork leak, but we can not fully address it here...
TLS destructors are not called on fork when you have several threads.
I could still add this to my PR, it does not do any harm (though I doubt we should be hitting this path):
- for the main thread that forked, we should keep existing state & not call the init again.
- for other threads, we would lose the state (and leak it)
There was a problem hiding this comment.
I think this could be a real issue with the pthread approach, thanks for raising
Summary
Simplify the pthread_key_t initialization added in #522.
Changes
kInvalidKeysentinel withpthread_once, removingstd::atomic<pthread_key_t>, the non-portable sentinel assumption, and all memory ordering.PTHREAD_CHECKmacro that aborts with a diagnostic (strerror()on the returned error code) forpthread_key_create,pthread_once, andpthread_setspecific.init_tl_state()when re-initializing afterfork()(the old code allocated without freeing the inherited pointer).SetItemsProcessedinBM_GetTlState: it was called inside the benchmark loop instead of after it.Fork safety
ddprof is statically linked with musl and loaded into glibc processes. musl's
pthread_oncehas no fork-generation mechanism: iffork()happens whilepthread_onceis in-progress, the child deadlocks because musl's futex wait will never be woken (the initializing thread doesn't exist in the child, and musl's fork handlers aren't registered when musl isn't the host libc).To eliminate this race,
ensure_key_initialized()is called from the library constructor (ProfilerAutoStart), before any thread can callfork(). This guaranteespthread_oncehas completed by the time threads exist.