Ensure workers get killed on unregister call #942
maheshambule wants to merge 4 commits into awslabs:master from
@maheshambule, can you please write up a short summary of the testing that was done for this PR? Please add some context, the behavior before the fix, and the behavior after the fix, with steps and logs for both cases.
ayushsengupta1991 left a comment:
Tested that the worker processes actually get destroyed on an unregister call.
I'm fairly sure I'm seeing this behavior as well: memory used during invocation/handling doesn't seem to be garbage collected, and I quickly hit 100% memory usage even on relatively large instances (3 x 18xlarge) serving SageMaker multi-model endpoints. The changes from @maheshambule were approved but never merged. Is that because the PR needed documentation? @ayushsengupta1991 @dhanainme I'm happy to write something up if it would get the code accepted upstream.
(The CI failure isn't accessible anymore, or I'd go in and comment.) Hoping to get this bumped or helped along. Thanks,
Issue #, if available:
Orphan worker processes are created when multiple register and unregister calls are fired on the same model one after another.
These orphan worker processes hog system memory.
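To illustrate what "killing workers on unregister" needs to guarantee, here is a minimal Python sketch. It assumes POSIX process groups and uses a hypothetical `WorkerRegistry` class with stand-in sleeping workers; it is not the model server's actual worker manager or the code in this PR. On unregister, each worker's whole process group is signalled (SIGTERM, then SIGKILL if it does not exit within a timeout) and the handle is waited on, so repeated register/unregister cycles on the same model leave no orphaned or unreaped processes behind.

```python
import os
import signal
import subprocess
import sys


class WorkerRegistry:
    """Hypothetical registry mapping model names to worker processes (illustration only)."""

    def __init__(self):
        self._workers = {}  # model name -> list of subprocess.Popen handles

    def register(self, model, num_workers=1):
        procs = self._workers.setdefault(model, [])
        for _ in range(num_workers):
            # Stand-in worker: a Python child that just sleeps. start_new_session=True
            # makes it the leader of its own process group, so the whole tree can be signalled.
            procs.append(subprocess.Popen(
                [sys.executable, "-c", "import time; time.sleep(60)"],
                start_new_session=True,
            ))
        return procs

    def unregister(self, model, timeout=5.0):
        # Drop the registry entry and terminate every worker it referenced,
        # instead of only forgetting the handles (which is how orphans appear).
        procs = self._workers.pop(model, [])
        for proc in procs:
            self._kill(proc, timeout)
        return procs

    @staticmethod
    def _signal_group(proc, sig):
        try:
            os.killpg(proc.pid, sig)  # pgid == pid because of start_new_session=True
        except ProcessLookupError:
            pass  # the process group has already gone away

    def _kill(self, proc, timeout):
        if proc.poll() is not None:
            return  # already exited and reaped
        self._signal_group(proc, signal.SIGTERM)  # ask the worker group to shut down
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self._signal_group(proc, signal.SIGKILL)  # force it if SIGTERM was ignored
            proc.wait()  # reap so no zombie is left behind


if __name__ == "__main__":
    registry = WorkerRegistry()
    stopped = []
    # Register and unregister the same model several times in a row, as in the bug report.
    for _ in range(3):
        registry.register("model-a", num_workers=2)
        stopped.extend(registry.unregister("model-a"))
    print(all(p.poll() is not None for p in stopped))  # prints True
```

The key step is waiting on each handle after signalling it: the worker is confirmed dead (and reaped) before the model is treated as unregistered, rather than the registry simply dropping its reference while the process keeps holding memory.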
Description of changes:
Testing done:
To run CI tests on your changes, refer to README.md.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.