fix: Remove initial-exec TLS storage to enable dlopen#522
Open
fix: Remove initial-exec TLS storage to enable dlopen#522
Conversation
This was referenced Mar 23, 2026
Collaborator
Author
|
WIP: Still some alpine / glibc differences to consider |
Collaborator
Author
|
There is an issue with the initial-exec + musl dlopen that needs a bigger change |
Benchmark results for collatzParameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics. See unchanged results
|
Benchmark results for BadBoggleSolver_runParameters
SummaryFound 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics. See unchanged results
|
…er state The embedded .so referenced ddprof_lib_state (defined in the loader) using the initial-exec TLS model. On musl, initial-exec TLS cross-library references are rejected when the defining library is loaded via dlopen, because musl freezes static TLS slot allocation at program startup (ldso/dynlink.c:456). Switch TrackerThreadLocalState storage to pthread_key_t + pthread_getspecific: - Eliminates the cross-library initial-exec TLS reference entirely - get_tl_state() returns nullptr when key is uninitialized (no initialized flag) - init_tl_state() heap-allocates TrackerThreadLocalState via new/pthread_setspecific - ensure_key_initialized() uses CAS for thread-safe one-time key creation - Remove ddprof_lib_state from loader.c (no longer referenced by embedded .so) - Remove tls_state_storage.h include (constants no longer needed) - loader_rtld_global test now passes on both glibc and musl/Alpine Co-Authored-By: Xavier Roche <xroche@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
27d1c7a to
e16987f
Compare
- Update loader_rtld_global_test.c comment to accurately describe both failure modes (glibc RTLD_GLOBAL timing and musl initial-exec TLS rejection) and the pthread_key_t fix - Add comment on kInvalidKey sentinel explaining the POSIX assumption - Add comment in free() explaining why pthread_key_delete is skipped (race with concurrent get_tl_state() callers) - Document why ensure_key_initialized() is public (fork test needs it without a full ring-buffer init) - Add BM_GetTlState microbenchmark to measure raw TLS access overhead (pthread_getspecific vs initial-exec __thread) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
r1viollet
commented
Mar 24, 2026
…initialized() release relaxed load could transiently return kInvalidKey on a thread that races with ensure_key_initialized(), causing init_tl_state() to return nullptr and silently drop all initial allocations for that thread. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
This is a revert of the following PR in favour of changes proposed here.
Fixes the failure reported in #520 / #521: libdd_profiling.so fails to load its embedded .so when opened via dlopen.
Musl hard-rejects initial-exec TLS cross-library references when the defining library is dynamically loaded (ldso/dynlink.c:456: tls_id > static_tls_cnt), regardless of symbol visibility.
Motivation
Allowing dlopen use case.
How
Replace direct initial-exec TLS storage of TrackerThreadLocalState with pthread_key_t + pthread_getspecific/pthread_setspecific:
This eliminates the cross-library TLS reference entirely, so the embedded .so no longer requires any symbol from the loader at load time. The RTLD_NOLOAD self-promotion mechanism (which also used dladdr, unavailable in the -nolibc universal build) is removed as dead code.
How to test the change?
A new test loader_rtld_global loads libdd_profiling.so via dlopen(..., RTLD_GLOBAL) and calls ddprof_start_profiling(). Without the fix it returns -1 (embedded library not loaded). With the fix it passes on both glibc (Ubuntu 24.04) and musl (Alpine 3.22).
Performance
Claude measured before and after accessing TLS state
Benchmark results: pthread_getspecific (branch) vs __thread initial-exec (main)
So the TLS access is slower. Considering this is still O(nanoseconds) in the hot path, I hope this does not impact performance significantly.