Skip to content

Fix of pmproxy crashes#2525

Open
kurik wants to merge 2 commits intoperformancecopilot:mainfrom
kurik:pmproxy-crash
Open

Fix of pmproxy crashes#2525
kurik wants to merge 2 commits intoperformancecopilot:mainfrom
kurik:pmproxy-crash

Conversation

@kurik
Copy link
Contributor

@kurik kurik commented Mar 10, 2026

Fix concurrent call of uv_timer's causing pmproxy crashes

When multiple worker threads simultaneously call uv_timer_start/stop on
timers belonging to the same event loop, they concurrently modify the
shared heap data structure without synchronization. This causes segmentation
faults in pmproxy.
This fix:

  • removes the unsafe calls uv_timer_start() and uv_update_time() from worker threads
  • uses thread-safe time tracking uv_hrtime()
  • moves timer logic to the event loop thread

Fix of expansion of delta indom causing pmproxy crashes

When expanding a delta indom into a full indom, the function copies name
pointers by reference (shallow copy) from the previous full indom into
the new namelist. This creates a mismatch when two different memory
allocation schemes coexist in the same hashindom chain leading to free()
of invalid pointers causing SIGABRT.
This fix: is replacing shallow copies of pointers with strdup() and sets
PMLID_NAMES to ensure the expanded namelist is an independent allocation
that can be safely freed.

kurik added 2 commits March 10, 2026 11:28
When multiple worker threads simultaneously call uv_timer_start/stop on
timers belonging to the same event loop, they concurrently modify the
shared heap data structure without synchronization. This causes segmentation
faults in pmproxy.

This fix
- removes the unsafe calls uv_timer_start() and uv_update_time() from worker threads
- uses thread-safe time tracking uv_hrtime()
- moves timer logic to the event loop thread
When expanding a delta indom into a full indom, the function copies name
pointers by reference (shallow copy) from the previous full indom into
the new namelist. This creates a mismatch when two different memory
allocation schemes coexist in the same hashindom chain leading to free()
of invalid pointers causing SIGABRT.

This fix is replacing shallow copies of pointers with strdup() and sets
PMLID_NAMES to ensure the expanded namelist is an independent allocation
that can be safely freed.
@kurik
Copy link
Contributor Author

kurik commented Mar 10, 2026

fedora-rawhide and ubuntu2204 pipelines failed due to issues starting the container - not related to the proposed fix.

However the changes here are quite significant in the way how context timers are handled, so I would appreciate some review from @kmcdonell or @natoscott . Thanks guys in advance.

Copy link
Member

@natoscott natoscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @kmcdonell comment on the libpcp namelist change, but the libpcp_web changes make sense. We may need a similar hrtimer transition in pmproxy/src too?

Copy link
Member

@kmcdonell kmcdonell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kurik Thanks for the triage here. I'll defer to @natoscott for the libpcp_web changes, but with one small improvement I think the libpcp ones are OK.

Do you have a reproducer for this? I'm guessing not, it is just the corpses from pmproxy that are scattered on the ground.

I've managed to build a new QA app and test (qa/src/undelta.c et al, commit incoming) that dies spectacularly with the unmodified logmeta.c, but passes with (my version) of the changes in place.

I'm a little surprised that this has not bitten before, but I think the reason is that the common code path below pmNewContext() that most archive consuming tools are using ends up allocating the components of all the __pmLogInDom structs in a way that avoids the problem you've identified, pmproxy on the other hand is reading metadata records directly from disk for the "fail -f" mode of ingestion and is probably building the __pmLogInDom structs in a slightly different way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the proposed changes are fixing one problem and introducing a potential memory leak.

On review of the code, I believe __pmLogUndeltaInDom() should honour the PMLID_NAMES bit in the alloc field of the delta __pmLogInDom.

So a slightly different change for the last strdup() would be along the lines:

if (tidp->alloc & PMLID_NAMES)
    namelist[k] = didp->namelist[j];
else
    namelist[k] = strdup(didp->namelist[j]);

kmcdonell added a commit to kmcdonell/pcp that referenced this pull request Mar 15, 2026
Hand craft some __pmLogInDom structures and then let __pmLogUndeltaInDom()
chew on 'em.

qa/1664 runs undelta, and qa/1655 is the valgrind dual.

Also related to https://github.com/performancecopilot/pcp/pull/2525/changes.

Fails spectacularly without the fix from PR performancecopilot#2525.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants